POST /v1/chat/completions

Chat Completions

Create a model response for a given conversation. Accepts a list of messages and returns the model's next reply. Use this endpoint to power chat interfaces, AI assistants, content generation, and any text workflow.

This page documents the OpenAI-compatible Chat Completions endpoint. Use the openai Python package or any OpenAI-compatible SDK with OPENAI_BASE_URL set to https://api.linkharbor.ai/v1. For the Anthropic Messages API, use /anthropic/v1/messages instead.

curl https://api.linkharbor.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "your-model-name",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "temperature": 0.7
  }'
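The same request can be sketched in Python with only the standard library. The helper name build_chat_request is hypothetical and not part of any SDK. The example builds the request without sending it, since sending requires a valid key and network access.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.linkharbor.ai/v1"  # base URL given on this page


def build_chat_request(model, messages, api_key, **params):
    """Build (but do not send) a POST request for /chat/completions."""
    payload = {"model": model, "messages": messages, **params}
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_chat_request(
    model="your-model-name",  # placeholder: use a real ID from GET /v1/models
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    api_key=os.environ.get("OPENAI_API_KEY", ""),
    temperature=0.7,
)

# Sending is left commented out because it needs a real key and network access:
# with urllib.request.urlopen(request) as response:
#     completion = json.load(response)
```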

Request Body

model · string · required

The ID of the model to use. Retrieve available models from GET /v1/models and replace your-model-name with a real ID from the live catalog.

messages · array · required

A list of messages that make up the conversation. The model uses this history to generate the next reply.

messages[].role · string · required

The role of the message author. One of: system (sets assistant behavior), user (human input), or assistant (prior model replies).

messages[].content · string · required

The text content of the message.
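Because the model only sees the messages included in each request, a multi-turn conversation means resending the growing history every time. A minimal sketch follows. The helper append_turn and the assistant reply text are made up for illustration.

```python
def append_turn(messages, role, content):
    """Return a new history with one message added, validating the role."""
    if role not in {"system", "user", "assistant"}:
        raise ValueError(f"unsupported role: {role!r}")
    return messages + [{"role": role, "content": content}]


history = [{"role": "system", "content": "You are a helpful assistant."}]
history = append_turn(history, "user", "What is the capital of France?")
# After the model replies, store that reply as an assistant message
# (the text here is an illustrative stand-in for a real reply)...
history = append_turn(history, "assistant", "The capital of France is Paris.")
# ...then add the follow-up question and send the whole list again.
history = append_turn(history, "user", "What is its population?")
```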

stream · boolean · optional · Default: false

If true, the response is streamed back as Server-Sent Events (SSE) instead of a single JSON object. Each chunk contains a delta with the incremental content. The stream ends with data: [DONE].
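A rough sketch of consuming such a stream is shown here. It assumes each data: line carries a JSON chunk with the text at choices[0].delta.content, as in OpenAI's streaming format. This page only states that a chunk contains a delta, so treat that exact layout as an assumption. The function name and the sample lines are hypothetical.

```python
import json


def iter_content_deltas(lines):
    """Yield incremental text from SSE lines until the [DONE] sentinel."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        # Assumed chunk layout: choices[0].delta.content (may be absent).
        choices = chunk.get("choices") or [{}]
        text = choices[0].get("delta", {}).get("content")
        if text:
            yield text


# Hand-written sample lines standing in for a real HTTP response body.
sample_stream = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "Par"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "is"}}]}',
    "",
    "data: [DONE]",
]
reply = "".join(iter_content_deltas(sample_stream))
```

Joining the deltas in arrival order reconstructs the full reply, which is what a non-streaming call would return in one message.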

temperature · number · optional · Default: 1 · Range: 0–2

Sampling temperature. Higher values (e.g. 0.9) produce more creative, varied output. Lower values (e.g. 0.2) make responses more focused and deterministic. Adjust this or top_p, not both.
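One way to apply the "this or top_p, not both" advice is a small client-side check before building the request body. This is a sketch of a local convention, not something the endpoint is documented to enforce. The helper name is hypothetical, and top_p is left unvalidated because its range is not stated on this page.

```python
def sampling_params(temperature=None, top_p=None):
    """Return sampling fields for the request body, setting at most one."""
    if temperature is not None and top_p is not None:
        raise ValueError("set temperature or top_p, not both")
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    params = {}
    if temperature is not None:
        params["temperature"] = temperature
    if top_p is not None:
        params["top_p"] = top_p
    return params


focused = sampling_params(temperature=0.2)   # more deterministic output
creative = sampling_params(temperature=0.9)  # more varied output
```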

max_tokens · integer · optional

Maximum number of tokens to generate. The total of input tokens and this value cannot exceed the model's context window. Omit to use the model's default maximum.
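The constraint amounts to input tokens + max_tokens ≤ context window. A small sketch of clamping the requested value is below. The helper name and the 8,192-token window are hypothetical, and the prompt token count is assumed to be known or estimated by the caller.

```python
def fit_max_tokens(prompt_tokens, context_window, requested):
    """Clamp max_tokens so prompt_tokens + max_tokens <= context_window."""
    available = context_window - prompt_tokens
    if available <= 0:
        raise ValueError("prompt already fills the context window")
    return min(requested, available)


# Hypothetical 8,192-token window with a 7,900-token prompt:
# only 292 tokens remain, so a request for 512 is reduced to 292.
max_tokens = fit_max_tokens(prompt_tokens=7900, context_window=8192, requested=512)
```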

Request body

{
  "model": "your-model-name",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain async/await in Python."}
  ],
  "temperature": 0.7,
  "max_tokens": 512
}

Response

Returns a chat completion object. On success, the HTTP status is 200. On error, a JSON object with error type and message is returned instead.
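A minimal sketch of branching on the status code follows. The exact error layout is not shown on this page, so the code assumes the OpenAI-style shape {"error": {"type": ..., "message": ...}} and falls back to the top-level object. The exception class and function name are hypothetical.

```python
import json


class ChatCompletionError(Exception):
    """Raised when the endpoint answers with a non-200 status."""


def parse_response(status, body):
    """Return the completion dict on 200, otherwise raise with error details."""
    data = json.loads(body)
    if status == 200:
        return data
    # Assumed error layout: {"error": {"type": ..., "message": ...}}.
    error = data.get("error", data)
    if not isinstance(error, dict):
        error = {"message": str(error)}
    raise ChatCompletionError(
        f"HTTP {status}: {error.get('type', 'unknown')}: {error.get('message', '')}"
    )
```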

id · string

Unique identifier for this completion, prefixed with chatcmpl-.

model · string

The model that generated this completion.

choices · array

Array of generated choices. Each contains a message with role and content, and a finish_reason (e.g. stop when the model completes naturally, length when max_tokens is reached).
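Reading the reply and checking whether it was cut off might look like the sketch below. It operates on an already-parsed completion dict. The helper name extract_reply is hypothetical, and only the first choice is inspected.

```python
def extract_reply(completion):
    """Return (text, truncated) for the first choice of a completion dict."""
    choice = completion["choices"][0]
    text = choice["message"]["content"]
    # "length" means generation stopped because max_tokens was reached.
    truncated = choice["finish_reason"] == "length"
    return text, truncated


completion = {
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "async/await lets you write asynchronous code...",
            },
            "finish_reason": "stop",
        }
    ]
}
text, truncated = extract_reply(completion)
```

When truncated is true, a caller could raise max_tokens or ask the model to continue from where it stopped.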

usage · object

Token usage statistics for this request.

usage.prompt_tokens · integer

Number of tokens in the input messages.

usage.completion_tokens · integer

Number of tokens in the generated response.

usage.total_tokens · integer

Total tokens used (prompt + completion).
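To track consumption across several requests, the usage objects can be summed client-side. The sketch below uses a hypothetical UsageTotals class. The second usage object is invented sample data.

```python
from dataclasses import dataclass


@dataclass
class UsageTotals:
    prompt_tokens: int = 0
    completion_tokens: int = 0

    @property
    def total_tokens(self):
        # Mirrors usage.total_tokens: prompt + completion.
        return self.prompt_tokens + self.completion_tokens

    def add(self, usage):
        """Add one response's usage object to the running totals."""
        self.prompt_tokens += usage["prompt_tokens"]
        self.completion_tokens += usage["completion_tokens"]


totals = UsageTotals()
totals.add({"prompt_tokens": 28, "completion_tokens": 94, "total_tokens": 122})
totals.add({"prompt_tokens": 40, "completion_tokens": 10, "total_tokens": 50})
```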

Response object

{
  "id": "chatcmpl-A9f3k2mX8y",
  "object": "chat.completion",
  "created": 1715284800,
  "model": "your-model-name",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "async/await lets you write asynchronous code..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 28,
    "completion_tokens": 94,
    "total_tokens": 122
  }
}
