Documentation for the ARK Platform / Introduction.
The Ark API is compatible with OpenAI's. Currently you can use the `/chat/completions`, `/embeddings`, and `/audio/transcriptions` endpoints in the same way you would use OpenAI's endpoints. You can even use their client libraries by customizing the `base_url` parameter. There are some limitations and extensions when comparing the Ark API to OpenAI's.
Limitations
Model name aliasing
Ark runs inference on open-weight models such as Meta's Llama instead of closed models such as GPT-4o.
Because the `openai` library (and possibly similar ones) validates model names against a predefined enum, the Ark API configuration implements the ability to assign aliases to model names. Thus, for instance, you can have `gpt-3.5-turbo` automatically replaced with `meta-llama/Llama-3.1-8B-Instruct`, `gpt-4o` with `meta-llama/Llama-3.1-70B-Instruct`, `text-embedding-ada-002` with `BAAI/bge-m3`, and `whisper-1` with `whisper-1`. This makes it easier to use libraries that expect OpenAI model names. Please refer to the information received from our Deployment Team for the table of available models along with their aliases.
Unsupported or not fully supported OpenAI parameters
/chat/completions

- `frequency_penalty` - Penalizing repeated tokens is not supported.
- `function_call` - Explicit function calls are not available.
- `logit_bias` - Biasing token probabilities is not implemented.
- `logprobs` - Token log probabilities are not available.
- `presence_penalty` - Adjusting the likelihood of introducing new tokens is not available.
- `response_format` - Only text output is supported; JSON and other formats are not available.
- `seed` - Random seed control for reproducibility is not supported.
- `stop` - Instead of arbitrary string-based stop sequences, this implementation relies on `eos_token_id`.
- `temperature` - Currently, setting `temperature=0` is not truly deterministic (but close); the value is internally set to 0.0001 to avoid numerical issues.
- `tools` & `tool_choice` - Function calling and tool integration are not implemented.
- `top_p` - Nucleus sampling is not implemented.
- `user` - The `user` parameter, which allows tracking requests per user, is not supported.
/embeddings

- `dimensions` - While dimensions can be specified, they must be within the model's predefined limits; arbitrary dimension settings are not supported.
- `encoding_format` - Only float encoding is supported; base64 encoding is not available.
- `user` - The `user` parameter, which allows tracking requests per user, is not supported.
/audio/transcriptions

- `prompt` - Custom prompting is currently unsupported.
Extensions
Custom parameters
/chat/completions
- `ark_simplified` - When using streaming, set this to `true` to disable wrapping every single token in a full JSON. The SSE event payload will then contain only the token itself. Please note that at the end of inference you will still receive the token usage JSON and `[Done]`.
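A minimal sketch of consuming such a simplified stream. The sample events below are illustrative, not captured from a real deployment; real payloads come from the API:

```python
# Parse a simplified SSE stream: each "data:" payload is a bare token,
# except the trailing token-usage JSON and the end marker.
# The sample events below are illustrative, not captured from a deployment.
sample_events = [
    "data: Hello",
    "data: ,",
    "data:  world",
    'data: {"usage": {"prompt_tokens": 3, "completion_tokens": 3}}',
    "data: [Done]",
]

tokens = []
for event in sample_events:
    payload = event[len("data: "):]
    if payload == "[Done]":       # end-of-stream marker
        break
    if payload.startswith("{"):   # token-usage JSON sent at the end
        continue
    tokens.append(payload)

print("".join(tokens))  # Hello, world
```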
/embeddings
- This endpoint currently has no Ark extensions.
Stateful processing
During inference, a rich internal state is built in GPU memory, representing the current prompt and the history of messages, but also the reasoning done by the model. OpenAI optimizes by processing every request on randomly selected GPUs, but in the process most of this state is lost, because only the final assistant reply is kept. Ark allows users to have a session during which all requests are processed on the same set of GPUs and the full internal state is maintained between requests. Depending on the use case, this approach can improve both the model's response quality and performance.
Please note that this mechanism can be globally enabled or disabled on your setup - consult information from our Deployment Team to know if you have this feature available.
To use this mechanism, simply enable cookie support in your client. The API responds with `set-cookie: ark_session_id=${SESSION_UUID}; Max-Age=86400; Path=/; SameSite=lax`, and if you send `cookie: ark_session_id=${SESSION_UUID}` with subsequent requests, the session will be reused.
Please note that there are timeouts configured which destroy inactive sessions after some time, to prevent blocking GPUs indefinitely. Refer to information from our Deployment Team about these timeouts.
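With the `requests` library, cookie handling comes for free via `requests.Session`. The sketch below sets the cookie manually with an illustrative UUID instead of obtaining it from a live server response:

```python
import requests

# A requests.Session stores cookies from set-cookie response headers and
# replays them on subsequent requests, which is all that is needed to keep
# reusing one Ark session. The UUID here is illustrative; a real one is
# assigned by the API in its first response.
session = requests.Session()
session.cookies.set("ark_session_id", "123e4567-e89b-12d3-a456-426614174000")

# Every further call made through this session, e.g.
#   session.post("https://ark.example.com/v1/chat/completions", json=...),
# would automatically send "cookie: ark_session_id=...".
print(session.cookies.get("ark_session_id"))
```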
Prerequisites
- Make sure you've obtained the API URL and API key from our Deployment Team.
- Install Python 3. Most Linux distributions come with Python preinstalled.
- Create a working directory, virtual Python environment and install dependencies:
```shell
mkdir ark
cd ark
python -m venv .venv
source .venv/bin/activate
pip install openai    # all examples
pip install numpy     # some examples
pip install requests  # some examples
```