OpenAI Chat Completion

Creates a model response for the given chat conversation. This endpoint follows the OpenAI Chat Completion specification and forwards requests to the Azure OpenAI endpoint.

Endpoint: POST https://api.langdock.com/openai/{region}/v1/chat/completions

In dedicated deployments, api.langdock.com maps to /api/public

Authentication

  • Header: Authorization

  • Value: Bearer YOUR_API_KEY

Supported models

Currently supported models include: gpt-5, gpt-5-mini, gpt-5-nano, gpt-5-chat, gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, o4-mini, o3, o3-mini, o1, o1-mini, o1-preview, gpt-4o, gpt-4o-mini

Note: If you use your own API keys in Langdock (BYOK), available models may differ — contact your admin.

Limits and unsupported parameters

  • Not supported: n, service_tier, parallel_tool_calls, stream_options

  • Each model has its own rate limit (workspace-level)

  • Default rate limit for this Chat Completion endpoint: 500 RPM (requests per minute) and 60,000 TPM (tokens per minute)

  • Exceeding limits returns 429 Too Many Requests

  • For higher limits contact [email protected]
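
When a request is rejected with 429, a common client-side pattern is to back off and retry. Below is a minimal sketch in Python; the requests library is this example's choice, and honoring a Retry-After header is an assumption (the endpoint is not documented to send one):

import time
import requests

URL = "https://api.langdock.com/openai/eu/v1/chat/completions"  # region: eu or us
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

def post_with_backoff(payload, max_retries=5):
    """POST the payload, backing off exponentially on HTTP 429."""
    for attempt in range(max_retries):
        resp = requests.post(URL, headers=HEADERS, json=payload, timeout=60)
        if resp.status_code != 429:
            resp.raise_for_status()  # surface errors other than rate limiting
            return resp.json()
        # Use Retry-After if the server sends it (an assumption here),
        # otherwise fall back to exponential backoff: 1s, 2s, 4s, ...
        time.sleep(float(resp.headers.get("Retry-After", 2 ** attempt)))
    raise RuntimeError("still rate-limited after retries")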

Example request

curl --request POST \
  --url https://api.langdock.com/openai/{region}/v1/chat/completions \
  --header 'Authorization: Bearer YOUR_API_KEY' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "gpt-4o-mini",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "Write a short poem about cats."
    }
  ]
}
'

Response example (200)
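
An illustrative response body for the request above. The values are placeholders, but the shape follows the response fields documented further down:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1719000000,
  "model": "gpt-4o-mini",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Soft paws patrol the windowsill,\nmorning sun, and all is still."
      },
      "finish_reason": "stop",
      "logprobs": null
    }
  ],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 18,
    "total_tokens": 42
  },
  "system_fingerprint": "fp_abc123"
}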

Parameters

Headers

  • Authorization (string, required): API key as Bearer token. Format "Bearer YOUR_API_KEY"

Path parameters

  • region (string, required): The region of the API to use. Available: eu, us

Body (application/json)

  • model (string, required): ID of the model to use.

  • messages (array, required): A list of messages comprising the conversation so far. Minimum length: 1. Message roles: system, user, assistant, tool, function.

    • message fields:

      • role (enum, required): e.g., system, user, assistant

      • content (string, required for messages that contain text)

      • name (string, optional): an optional name for the participant

  • max_tokens (integer, optional): Maximum number of tokens to generate.

  • temperature (number, optional, default 1): 0–2

  • top_p (number, optional, default 1): 0–1

  • frequency_penalty (number, default 0): -2.0 to 2.0

  • presence_penalty (number, default 0): -2.0 to 2.0

  • logit_bias (object): Map of token IDs to bias values (-100 to 100)

  • stop (string or array, optional): Up to 4 sequences where generation will stop

  • stream (boolean, optional, default false): If true, partial tokens are sent as server-sent events terminated by data: [DONE] (a streaming sketch appears in the library example near the end of this page)

  • response_format (object, optional): { "type": "text" } or { "type": "json_object" } — JSON mode requires you to instruct the model to output JSON

  • seed (integer, optional, Beta): For best-effort deterministic sampling

  • user (string, optional): Unique identifier representing your end-user

  • tools (array of objects, optional): List of tools (functions) the model may call (max 128). Each tool: { type: "function", function: { ... } } (see the sketch after this parameter list)

  • tool_choice (enum or object, optional): Controls tool-calling behavior. Options: none, auto, required, or specify a particular tool. Default: none when no tools present; auto if tools are present.
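
A minimal tools request, sketched in Python with the requests library. The get_weather function and its JSON Schema are hypothetical examples, not part of the API:

import requests

payload = {
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "What is the weather in Berlin?"}],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical example function
                "description": "Get the current weather for a city",
                "parameters": {  # JSON Schema describing the arguments
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    "tool_choice": "auto",
}

resp = requests.post(
    "https://api.langdock.com/openai/eu/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json=payload,
    timeout=60,
)
message = resp.json()["choices"][0]["message"]
# If the model decided to call the tool, finish_reason is "tool_calls" and
# message["tool_calls"][0]["function"]["arguments"] holds the argument JSON.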

Deprecated / replaced fields

  • function_call (deprecated): Replaced by tool_choice

  • functions (deprecated): Replaced by tools

Details and notable behaviors

  • response_format.type: "text" or "json_object". When using "json_object", you must instruct the model to produce JSON in the conversation messages to avoid problematic behavior (e.g., streaming whitespace); see the sketch after this list.

  • logprobs and top_logprobs: If logprobs is true, you can request top_logprobs (0–20) to get token probability info.

  • Tools/functions: Provide a JSON Schema in parameters for functions. Omitting parameters defines an empty parameter list.
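
A JSON-mode sketch in Python. Note that the messages themselves ask for JSON output, as required above; the prompt and the expected output key are illustrative:

import json
import requests

payload = {
    "model": "gpt-4o-mini",
    "response_format": {"type": "json_object"},
    "messages": [
        # JSON mode requires the conversation itself to ask for JSON output.
        {"role": "system", "content": "Reply only with a JSON object."},
        {"role": "user", "content": "List three cat breeds under the key \"breeds\"."},
    ],
}

resp = requests.post(
    "https://api.langdock.com/openai/eu/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json=payload,
    timeout=60,
)
breeds = json.loads(resp.json()["choices"][0]["message"]["content"])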

Full schema and field details

(Fields described above, plus nested attributes such as logit_bias.{key}, functions[].parameters as JSON Schema, usage.* fields in the response, system_fingerprint, finish_reason values, etc.)

finish_reason possible values:

  • stop

  • length

  • tool_calls

  • content_filter

  • function_call (deprecated)
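
A small sketch of how a client might branch on these values (the returned strings are illustrative):

def handle_choice(choice: dict) -> str:
    """Inspect finish_reason on one element of choices (illustrative handling)."""
    reason = choice["finish_reason"]
    if reason == "length":
        return "truncated: raise max_tokens or continue the conversation"
    if reason == "tool_calls":
        return "model requested a tool call (see the tools sketch above)"
    if reason == "content_filter":
        return "content was removed by a filter"
    return choice["message"]["content"]  # "stop": normal completion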

Response fields:

  • id (string): Unique identifier for the chat completion

  • object (string): "chat.completion"

  • created (integer): Unix timestamp (seconds)

  • model (string)

  • choices (array): One or more choice objects, each with index, message, finish_reason, logprobs

  • usage (object): { completion_tokens, prompt_tokens, total_tokens }

  • system_fingerprint (string): Fingerprint of backend config (use with seed to monitor determinism)

Using OpenAI-compatible libraries

Because the request and response formats match OpenAI's API, you can use OpenAI-compatible libraries such as:

  • OpenAI Python library (openai-python)

  • Vercel AI SDK
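
For example, a sketch with the official OpenAI Python client, assuming its base_url can simply be pointed at the Langdock endpoint (region eu shown; use us if that is where your workspace runs). Setting stream=True also demonstrates the stream parameter from the body section above:

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.langdock.com/openai/eu/v1",
)

# stream=True delivers partial tokens as server-sent events; drop it (or set
# it to False) to get a single chat.completion object instead.
stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Write a short poem about cats."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)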

Notes

  • Admins can create API keys in workspace settings.

Relevant links

  • OpenAI Chat Completion spec: https://platform.openai.com/docs/api-reference/chat/create

  • OpenAI models compatibility table: https://platform.openai.com/docs/models/model-endpoint-compatibility

  • Function calling guide: https://platform.openai.com/docs/guides/function-calling

  • Token counting example: https://cookbook.openai.com/examples/how_to_count_tokens_with_tiktoken
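
Because the TPM limit counts tokens, it can help to estimate usage client-side. A quick sketch with tiktoken, assuming the o200k_base encoding used by the gpt-4o family (see the cookbook link above for per-model encodings):

import tiktoken

enc = tiktoken.get_encoding("o200k_base")  # encoding used by gpt-4o models

prompt = "Write a short poem about cats."
print(len(enc.encode(prompt)))  # tokens this text consumes from the TPM budget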