Docs

Documentation for the ARK Platform / Introduction.

Ark API is compatible with OpenAI

Currently you can use the /chat/completions, /embeddings and /audio/transcriptions endpoints in the same way you would use OpenAI's endpoints. You can even use OpenAI's client libraries by customizing the base_url parameter. There are, however, some limitations and extensions when comparing the Ark API to OpenAI's.

Limitations

Model names aliasing

Ark runs inference on open-weight models such as Meta's Llama instead of closed models such as OpenAI's GPT-4o.

Because the openai library (and possibly other similar ones) validates model names against a predefined enum, the Ark API configuration can assign aliases to model names. Thus, for instance, you can have gpt-3.5-turbo automatically replaced with meta-llama/Llama-3.1-8B-Instruct, gpt-4o with meta-llama/Llama-3.1-70B-Instruct, text-embedding-ada-002 with BAAI/bge-m3, and whisper-1 with whisper-1 (an identity mapping). This makes it easier to use libraries that expect OpenAI model names. Please refer to the information received from our Deployment Team for the table of available models along with their aliases.
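
Conceptually, the aliasing behaves like a server-side lookup table. The sketch below mirrors the example aliases above; the actual table is defined by your deployment configuration:

```python
# Hypothetical alias table mirroring the examples above; the real one
# is defined in the Ark API configuration, not in client code.
ALIASES = {
    "gpt-3.5-turbo": "meta-llama/Llama-3.1-8B-Instruct",
    "gpt-4o": "meta-llama/Llama-3.1-70B-Instruct",
    "text-embedding-ada-002": "BAAI/bge-m3",
    "whisper-1": "whisper-1",  # identity mapping
}

def resolve_model(name: str) -> str:
    # Names without an alias pass through unchanged.
    return ALIASES.get(name, name)
```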

Unsupported or not fully supported OpenAI parameters

/chat/completions

  • frequency_penalty - Penalizing repeated tokens is not supported.
  • function_call - Explicit function calls are not available.
  • logit_bias - Biasing token probabilities is not implemented.
  • logprobs - Token log probabilities are not available.
  • presence_penalty - Adjusting the likelihood of introducing new tokens is not available.
  • response_format - Only text output is supported; JSON and other formats are not available.
  • seed - Random seed control for reproducibility is not supported.
  • stop - Instead of arbitrary string-based stop sequences, this implementation relies on eos_token_id.
  • temperature - Currently, setting temperature=0 is not truly deterministic (but close). Instead, the value is internally set to 0.0001 to avoid numerical issues.
  • tools & tool_choice - Function calling and tool integration are not implemented.
  • top_p - Nucleus sampling is not implemented.
  • user – The user parameter, which allows tracking requests per user, is not supported.
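
In practice this means a /chat/completions request should stick to the supported parameters. A minimal payload might look like this (a sketch; the model name is taken from the alias examples above):

```python
import json

# Minimal payload using only parameters supported by Ark's /chat/completions.
payload = {
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "messages": [{"role": "user", "content": "Explain SSE in one sentence."}],
    "max_tokens": 128,
    "temperature": 0.2,  # temperature=0 would be internally raised to 0.0001
}
body = json.dumps(payload)
```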

/embeddings

  • dimensions – While dimensions can be specified, they must be within the model's predefined limits, and arbitrary dimension settings are not supported.
  • encoding_format – Only float encoding is supported; base64 encoding is not available.
  • user – The user parameter, which allows tracking requests per user, is not supported.
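
An /embeddings request is similarly constrained; for instance (a sketch, with the embedding model name taken from the alias examples above):

```python
import json

# /embeddings payload; only "float" is accepted for encoding_format.
payload = {
    "model": "BAAI/bge-m3",
    "input": ["first document", "second document"],
    "encoding_format": "float",  # "base64" is not available
}
body = json.dumps(payload)
```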

/audio/transcriptions

  • prompt - Custom prompting is currently unsupported.

Extensions

Custom parameters

/chat/completions

  • ark_simplified - When using streaming, set this to true to disable wrapping every single token in a full JSON object. Each SSE event payload will then contain only the token itself. Please note that at the end of inference you will still receive a token-usage JSON and [Done].
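
A simplified stream can therefore be consumed with very little parsing. The sketch below assumes each data: payload is the raw token text, followed at the end by the usage JSON and the [Done] marker, as described above:

```python
import json

def read_simplified_stream(sse_lines):
    """Collect tokens from an ark_simplified SSE stream (illustrative sketch)."""
    tokens, usage = [], None
    for line in sse_lines:
        if not line.startswith("data:"):
            continue
        # Per the SSE spec, a single space after the colon is not part of the value.
        payload = line[5:]
        if payload.startswith(" "):
            payload = payload[1:]
        if payload == "[Done]":
            break
        if payload.startswith("{"):
            usage = json.loads(payload)  # trailing token-usage object
        else:
            tokens.append(payload)
    return "".join(tokens), usage

# Fabricated example stream:
sample = [
    "data: Hel",
    "data: lo!",
    'data: {"prompt_tokens": 5, "completion_tokens": 2}',
    "data: [Done]",
]
```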

/embeddings

  • This endpoint currently has no Ark extensions.

Stateful processing

During inference, a rich internal state is built in GPU memory, representing the current prompt, the history of messages, and the reasoning done by the model. OpenAI optimizes throughput by processing every single request on randomly selected GPUs, but in the process most of this state is lost, because only the final assistant reply is kept. Ark allows users to open a session during which all requests are processed on the same set of GPUs and the full internal state is maintained between requests. Depending on the use case, this approach can improve both the model's response quality and performance.

Please note that this mechanism can be globally enabled or disabled on your setup - consult the information from our Deployment Team to find out whether this feature is available to you.

To use this mechanism, simply enable cookie support in your client. The API responds with set-cookie: ark_session_id=${SESSION_UUID}; Max-Age=86400; Path=/; SameSite=lax and if you send a cookie: ark_session_id=${SESSION_UUID} with subsequent requests, the session will be reused.
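
With a cookie-aware client (e.g. requests.Session in Python) this happens automatically. The sketch below shows the manual equivalent using only the standard library; the session UUID is a placeholder:

```python
from http.cookies import SimpleCookie

# A Set-Cookie header as returned by the API (placeholder session UUID).
set_cookie = (
    "ark_session_id=123e4567-e89b-12d3-a456-426614174000; "
    "Max-Age=86400; Path=/; SameSite=lax"
)

jar = SimpleCookie()
jar.load(set_cookie)
session_id = jar["ark_session_id"].value

# Echo the cookie back on subsequent requests to stay on the same GPUs.
headers = {"Cookie": f"ark_session_id={session_id}"}
```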

Please note that timeouts are configured to destroy inactive sessions after some time, to prevent GPUs from being blocked indefinitely. Refer to the information from our Deployment Team about these timeouts.

Prerequisites

  1. Make sure you've obtained the API URL and API Key from our Deployment Team.
  2. Install Python 3. Most Linux distributions come with Python preinstalled.
  3. Create a working directory, create a virtual Python environment, and install dependencies:
    mkdir ark
    cd ark
    python -m venv .venv
    source .venv/bin/activate
    pip install openai # all examples
    pip install numpy # some examples
    pip install requests # some examples