Codestral

Creates a code completion using the Codestral model from Mistral. All parameters from the Mistral fill-in-the-middle Completion endpoint are supported according to the Mistral specifications.

Example — cURL

curl --request POST \
  --url https://api.langdock.com/mistral/{region}/v1/fim/completions \
  --header 'Authorization: <authorization>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "codestral-2501",
  "prompt": "function removeSpecialCharactersWithRegex(str: string) {",
  "max_tokens": 64
}
'

Example response (200)

{
  "data": "asd",
  "id": "245c52bc936f53ba90327800c73d1c3e",
  "object": "chat.completion",
  "model": "codestral",
  "usage": {
    "prompt_tokens": 16,
    "completion_tokens": 102,
    "total_tokens": 118
  },
  "created": 1732902806,
  "choices": [
    {
      "index": 0,
      "message": {
        "content": "\n  // Use a regular expression to match any non-alphanumeric character and replace it with an empty string\n  return str.replace(/[^a-zA-Z0-9]/g, '');\n}\n\n// Test the function\nconst inputString = \"Hello, World! 123\";\nconst outputString = removeSpecialCharactersWithRegex(inputString);\nconsole.log(outputString); // Output: \"HelloWorld123\"",
        "prefix": false,
        "role": "assistant"
      },
      "finish_reason": "stop"
    }
  ]
}

Rate limits

The rate limit for the FIM Completion endpoint is 500 RPM (requests per minute) and 60,000 TPM (tokens per minute). Rate limits are defined at the workspace level — not at an API key level. Each model has its own rate limit. If you exceed your rate limit, you will receive a 429 Too Many Requests response.
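
If you want your client to back off automatically, a simple retry loop around the cURL call is enough. The sketch below is illustrative only: the response file name, the backoff intervals, and the request body are arbitrary choices, not part of the API.

# Illustrative retry loop: back off and retry while the endpoint returns 429.
for attempt in 1 2 3 4 5; do
  status=$(curl -s -o response.json -w "%{http_code}" \
    --request POST \
    --url https://api.langdock.com/mistral/eu/v1/fim/completions \
    --header "Authorization: Bearer YOUR_API_KEY" \
    --header 'Content-Type: application/json' \
    --data '{"model": "codestral-2501", "prompt": "def add(a, b):", "max_tokens": 32}')
  [ "$status" != "429" ] && break
  sleep $((attempt * 2))  # simple linear backoff before the next attempt
done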

Please note that the rate limits are subject to change; refer to this documentation for the most up-to-date information. In case you need a higher rate limit, please contact us at [email protected].


Using the Continue AI Code Assistant

Using the Codestral model, combined with chat completion models from the Langdock API, makes it possible to use the open-source AI code assistant Continue (continue.dev) fully via the Langdock API. Continue is available as a VS Code extension and as a JetBrains extension.

To customize the models used by Continue, edit the configuration file at ~/.continue/config.json (macOS / Linux) or %USERPROFILE%\.continue\config.json (Windows). Example setup using Codestral for autocomplete and other models for chats/edits:
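
The snippet below is a sketch rather than an official configuration: the fields follow Continue's config.json schema ("models", "tabAutocompleteModel", "provider", "apiBase", "apiKey"), while the apiBase values (a Langdock Mistral base URL for autocomplete and an OpenAI-compatible Langdock base URL for chat) and the chat model ID are assumptions you should adapt to your workspace and API key.

{
  "models": [
    {
      "title": "Langdock chat model",
      "provider": "openai",
      "model": "gpt-4o",
      "apiBase": "https://api.langdock.com/openai/eu/v1",
      "apiKey": "YOUR_API_KEY"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Codestral (Langdock)",
    "provider": "mistral",
    "model": "codestral-2501",
    "apiBase": "https://api.langdock.com/mistral/eu/v1",
    "apiKey": "YOUR_API_KEY"
  }
}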


Endpoint

POST /mistral/{region}/v1/fim/completions

Try it with the example cURL shown above.


Headers

Authorization (string) — required. API key as a Bearer token. Format: "Bearer YOUR_API_KEY"


Path parameters

region (string) — required The region of the API to use.

Available options:

  • eu


Body (application/json)

model (string) — required, default: codestral-2501 ID of the model to use. Currently only compatible with:

  • codestral-2501

prompt (string) — required The text/code to complete.

temperature (number) What sampling temperature to use; recommended between 0.0 and 0.7. Higher values (e.g., 0.7) make output more random; lower values (e.g., 0.2) make it more focused/deterministic. We generally recommend altering this or top_p, but not both. The default value varies by model. Call the /models endpoint to retrieve the appropriate default.

Required range: 0 <= x <= 1.5

top_p (number) — default: 1 Nucleus sampling: the model considers tokens comprising the top top_p probability mass. We generally recommend altering this or temperature, but not both.

Required range: 0 <= x <= 1

max_tokens (integer) Maximum number of tokens to generate in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.

Required range: x >= 0

stream (boolean) — default: false Whether to stream back partial progress. If set, tokens are sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message. Otherwise, the server returns the full result as JSON when complete.
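
For example, a streamed variant of the request shown at the top of this page could look like the sketch below; the -N flag simply disables cURL's output buffering so events print as they arrive.

curl -N --request POST \
  --url https://api.langdock.com/mistral/eu/v1/fim/completions \
  --header "Authorization: Bearer YOUR_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "codestral-2501",
  "prompt": "function removeSpecialCharactersWithRegex(str: string) {",
  "max_tokens": 64,
  "stream": true
}
'
# Partial completions arrive as "data: {...}" lines; the stream ends with "data: [DONE]".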

stop (string | string[]) Stop generation if this token is detected, or provide an array of tokens to stop on.

random_seed (integer) The seed to use for random sampling. If set, repeated calls with the same parameters will produce deterministic results.

Required range: x >= 0

suffix (string) — default: "" Optional text/code that adds more context for the model. When given both a prompt and a suffix, the model will fill what is between them. When suffix is not provided, the model will simply execute completion starting with prompt.
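
For example, a fill-in-the-middle request with both prompt and suffix might look like the sketch below; the code fragments used for prompt and suffix are arbitrary placeholders.

curl --request POST \
  --url https://api.langdock.com/mistral/eu/v1/fim/completions \
  --header "Authorization: Bearer YOUR_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "codestral-2501",
  "prompt": "def is_even(n: int) -> bool:",
  "suffix": "print(is_even(4))",
  "max_tokens": 64
}
'

Given this request, the model should generate the code that belongs between the two fragments, in this case the body of is_even.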

min_tokens (integer) The minimum number of tokens to generate in the completion.

Required range: x >= 0


Response (200 — application/json)

Successful response fields:

  • model (string) — Example: "mistral-small-latest"

  • id (string) — Example: "cmpl-e5cc70bb28c444948073e77776eb30ef"

  • object (string) — Example: "chat.completion"

  • usage (object) — required

    • usage.prompt_tokens (integer) — Example: 16

    • usage.completion_tokens (integer) — Example: 34

    • usage.total_tokens (integer) — Example: 50

  • choices (array of ChatCompletionChoice objects)

    • index (integer) — Example: 0

    • message (object) — contains the assistant's generated content

    • finish_reason (string enum) — Available: stop, length, model_length, error, tool_calls. Example: "stop"

  • created (integer) — Example: 1702256327
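
To extract only the generated code from such a response, read the message content of the first choice (choices[0].message.content). A sketch, assuming jq is installed:

curl -s --request POST \
  --url https://api.langdock.com/mistral/eu/v1/fim/completions \
  --header "Authorization: Bearer YOUR_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{"model": "codestral-2501", "prompt": "function removeSpecialCharactersWithRegex(str: string) {", "max_tokens": 64}' \
  | jq -r '.choices[0].message.content'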

