Create a model response for a given conversation. Accepts a list of messages and returns the model's next reply. Use this endpoint to power chat interfaces, AI assistants, content generation, and any text workflow.
This page documents the OpenAI-compatible Chat Completions endpoint. Use the openai Python package or any OpenAI-compatible SDK with OPENAI_BASE_URL set to https://api.linkharbor.ai/v1. For the Anthropic Messages API, use /anthropic/v1/messages instead.
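As a minimal sketch of what an OpenAI-compatible SDK would send, the snippet below builds the HTTP request with only the Python standard library. The base URL comes from this page. The /chat/completions path suffix, the bearer-token Authorization header, and the helper name build_chat_request are assumptions based on the usual OpenAI-compatible layout, not details confirmed here.

```python
import json
import os
import urllib.request

# Base URL taken from this page; overridable through OPENAI_BASE_URL.
BASE_URL = os.environ.get("OPENAI_BASE_URL", "https://api.linkharbor.ai/v1")


def build_chat_request(api_key, model, messages, base_url=BASE_URL):
    """Build (but do not send) a POST request for a chat completion.

    Hypothetical helper: the path suffix and auth header are assumed
    from the common OpenAI-compatible convention.
    """
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",  # assumed path
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send the request, pass the returned object to urllib.request.urlopen and decode the body with json.load. Replace your-model-name with a real ID from GET /v1/models before calling.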
The ID of the model to use. Retrieve available models from GET /v1/models, and replace the your-model-name placeholder used in examples with a real ID from the live catalog.
A list of messages that make up the conversation. The model uses this history to generate the next reply.
The role of the message author. One of: system (sets assistant behavior), user (human input), or assistant (prior model replies).
The text content of the message.
If true, the response is streamed back as Server-Sent Events (SSE) instead of a single JSON object. Each chunk contains a delta with the incremental content. The stream ends with data: [DONE].
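The streaming behavior described above can be sketched as a small parser over SSE lines. This is an illustration rather than the service's reference client: the chunk shape choices[i].delta.content is assumed from the OpenAI-compatible convention, and iter_stream_text is a hypothetical helper name.

```python
import json


def iter_stream_text(lines):
    """Yield incremental text pieces from an iterable of SSE lines (str)."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            # Assumed chunk shape: each choice carries a "delta" object.
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text
```

Joining the yielded pieces in order reconstructs the full reply. In a real client, the lines would come from the decoded HTTP response body, read line by line.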
Sampling temperature. Higher values (e.g. 0.9) produce more creative, varied output. Lower values (e.g. 0.2) make responses more focused and deterministic. Adjust this or top_p, not both.
Maximum number of tokens to generate. The total of input tokens and this value cannot exceed the model's context window. Omit to use the model's default maximum.
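The context-window constraint above is simple arithmetic: input tokens plus the requested maximum must not exceed the window. The sketch below clamps a requested value to the space that remains. The function name and the window size used in the usage note are hypothetical; real limits depend on the model.

```python
def clamp_max_tokens(input_tokens, requested, context_window):
    """Return the largest max_tokens value that still fits the window."""
    available = context_window - input_tokens
    if available <= 0:
        raise ValueError("input leaves no room for generated tokens")
    return min(requested, available)
```

For example, with an assumed 8192-token window and 7000 input tokens, a request for 2000 tokens would be clamped to 1192.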
Returns a chat completion object. On success, the HTTP status is 200. On error, a JSON object with error type and message is returned instead.
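One way to separate success from failure is to check the status code before reading the body, as sketched below. The exact error layout is not specified on this page, so the code assumes either an OpenAI-style nested object ({"error": {"type": ..., "message": ...}}) or a flat one with the same keys. ChatAPIError and parse_response are hypothetical names.

```python
import json


class ChatAPIError(Exception):
    """Raised when the endpoint answers with a non-200 status."""

    def __init__(self, status, err_type, message):
        super().__init__(f"HTTP {status} [{err_type}]: {message}")
        self.status = status
        self.err_type = err_type
        self.message = message


def parse_response(status, body_text):
    """Return the completion dict on 200, otherwise raise ChatAPIError."""
    data = json.loads(body_text)
    if status == 200:
        return data
    # Assumed layouts: nested under "error", or flat at the top level.
    err = data.get("error", data)
    if not isinstance(err, dict):
        err = {"message": str(err)}
    raise ChatAPIError(status, err.get("type", "unknown"), err.get("message", ""))
```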
Unique identifier for this completion, prefixed with chatcmpl-.
The model that generated this completion.
Array of generated choices. Each contains a message with role and content, and a finish_reason (e.g. stop when the model completes naturally, length when max_tokens is reached).
Token usage statistics for this request.
Number of tokens in the input messages.
Number of tokens in the generated response.
Total tokens used (prompt + completion).
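The response fields above can be read as in the following sketch. The key names (id, choices, message, finish_reason, usage, prompt_tokens, completion_tokens, total_tokens) are assumed from the standard OpenAI-compatible schema, since this page describes the fields without naming them all. summarize_completion is a hypothetical helper.

```python
def summarize_completion(resp):
    """Pull the reply text, truncation flag, and token totals from a response."""
    choice = resp["choices"][0]
    usage = resp.get("usage") or {}
    prompt = usage.get("prompt_tokens", 0)
    completion = usage.get("completion_tokens", 0)
    return {
        "id": resp["id"],
        "text": choice["message"]["content"],
        # "length" means generation stopped because max_tokens was reached.
        "truncated": choice.get("finish_reason") == "length",
        "total_tokens": usage.get("total_tokens", prompt + completion),
    }
```

Checking the truncated flag is useful before displaying a reply: a truncated response may end mid-sentence, and the caller can retry with a larger max_tokens.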