Prompting
To prompt Continuum, follow the OpenAI Chat API specification. Note that we don't use any OpenAI services; we only adhere to their interface definitions. To send prompts, simply use the Continuum proxy as your endpoint. It takes care of end-to-end encryption with our GenAI services for you.
Don’t send prompts directly to https://api.ai.confidential.cloud/v1/chat/completions! It won’t work anyway. Always send your prompts to your proxy, which handles encryption and communicates with the actual GenAI endpoint. For you, the continuum proxy effectively acts as your GenAI endpoint.
Simple examples of a default and a streaming prompt, together with their respective responses, are given below. We assume your proxy is running at localhost:8080; of course, you can run it wherever you want (see proxy configuration).
Example prompting
To prompt Continuum, use:
POST /v1/chat/completions
This proxy endpoint generates a response to a chat prompt.
Request body
- model (string): The model to use; one of the offered LLMs as listed here.
- messages (list): The prompts for which a response is generated.
- Additional parameters: These mirror the OpenAI API and are supported based on the model server's capabilities. However, options requiring internet access, such as image_url, aren't supported due to the sandboxed environment.
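As a sketch, the request body described above can be assembled in Python and sent to the proxy with any HTTP client. The helper name `build_chat_request` is ours; the endpoint URL and model name follow the examples below, assuming a proxy at localhost:8080.

```python
import json

PROXY_URL = "http://localhost:8080/v1/chat/completions"  # assumed local proxy
MODEL = "hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4"

def build_chat_request(prompt: str, stream: bool = False) -> dict:
    """Build a chat completion request body following the OpenAI Chat API shape."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }

body = json.dumps(build_chat_request("Tell me a joke!"))
# POST `body` to PROXY_URL with the header "Content-Type: application/json",
# e.g. via urllib.request, requests, or httpx.
```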
Returns
The response returns a chat completion or chat completion chunk object containing:
- choices (list): The responses generated by the model.
- Other parameters: Other fields are consistent with the OpenAI API specifications.
Default

Example request
curl localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4",
"messages": [
{
"role": "user",
"content": "Tell me a joke!"
}
]
}'
Example response
{
"id": "chat-6e8dc369b0614e2488df6a336c24c349",
"object": "chat.completion",
"created": 1727968175,
"model": "hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "What do you call a fake noodle?\n\nAn impasta.",
"tool_calls": []
},
"logprobs": null,
"finish_reason": "stop",
"stop_reason": null
}
],
"usage": {
"prompt_tokens": 40,
"total_tokens": 54,
"completion_tokens": 14
},
"prompt_logprobs": null
}
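To read the reply from a default (non-streaming) completion, take the message of the first choice. A minimal Python sketch against the response shape above (abridged):

```python
# Abridged chat completion response, as returned by the proxy above
response = {
    "object": "chat.completion",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "What do you call a fake noodle?\n\nAn impasta.",
            },
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 40, "total_tokens": 54, "completion_tokens": 14},
}

# The generated text lives in the first choice's message
answer = response["choices"][0]["message"]["content"]
```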
Streaming

Example request
curl localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4",
"messages": [
{
"role": "user",
"content": "Hi there!"
}
],
"stream" : true
}'
Example response
{"id":"chat-4f0bb41857044f52b5fa03fd3c752c8e","object":"chat.completion.chunk","created":1727968591,"model":"hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4","choices":[{"index":0,"delta":{"role":"assistant","content":""},"logprobs":null,"finish_reason":null}]}
{"id":"chat-4f0bb41857044f52b5fa03fd3c752c8e","object":"chat.completion.chunk","created":1727968591,"model":"hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4","choices":[{"index":0,"delta":{"content":"It"},"logprobs":null,"finish_reason":null}]}
{"id":"chat-4f0bb41857044f52b5fa03fd3c752c8e","object":"chat.completion.chunk","created":1727968591,"model":"hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4","choices":[{"index":0,"delta":{"content":"'s"},"logprobs":null,"finish_reason":null}]}
...
{"id":"chat-4f0bb41857044f52b5fa03fd3c752c8e","object":"chat.completion.chunk","created":1727968591,"model":"hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4","choices":[{"index":0,"delta":{"content":""},"logprobs":null,"finish_reason":"stop","stop_reason":null}]}
Available models
We currently support the hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4 model. Model support will be extended in the near future.
List models
GET /v1/models
This endpoint lists all currently available models.
Returns
The response is a list of model objects.
For detailed information, refer to the OpenAI API documentation.
Example request
curl localhost:8080/v1/models
Example response
{
"id": "hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4",
"object": "model",
"created": 1727968847,
"owned_by": "vllm",
"root": "hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4",
"parent": null,
"max_model_len": 131072,
"permission": [
{
"id": "modelperm-763c1f8144b745efa4e7dd984faf9517",
"object": "model_permission",
"created": 1727968847,
"allow_create_engine": false,
"allow_sampling": true,
"allow_logprobs": true,
"allow_search_indices": false,
"allow_view": true,
"allow_fine_tuning": false,
"organization": "*",
"group": null,
"is_blocking": false
}
]
}
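Fields such as max_model_len (the model's context window) can be read straight from a returned model object. A sketch against the abridged response above:

```python
# Abridged model object from GET /v1/models
model = {
    "id": "hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4",
    "object": "model",
    "owned_by": "vllm",
    "max_model_len": 131072,
}

model_id = model["id"]                   # pass this as "model" in chat requests
context_window = model["max_model_len"]  # maximum tokens per request
```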
System prompts
The offered model supports setting a system prompt as part of the request's messages
field (see example below). This can be used to tailor the model's behavior to your specific needs.
Improving language accuracy
The model may occasionally make minor language mistakes, especially in languages other than English. To optimize language accuracy, you can set a system prompt. The following example significantly improves accuracy for the German language:
{
"role": "system",
"content": "Ensure every response is free from grammar and spelling errors. Use only valid words. Apply correct article usage, especially for languages with gender-specific articles like German. Follow standard grammar and syntax rules, and check spelling against standard dictionaries. Maintain consistency in style and terminology throughout."
}
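Such a system prompt goes first in the messages list, followed by the user prompt. A minimal Python sketch of the combined messages (the helper name is ours; the German user prompt is just an illustration):

```python
SYSTEM_PROMPT = (
    "Ensure every response is free from grammar and spelling errors. "
    "Use only valid words. Apply correct article usage, especially for "
    "languages with gender-specific articles like German. Follow standard "
    "grammar and syntax rules, and check spelling against standard "
    "dictionaries. Maintain consistency in style and terminology throughout."
)

def with_system_prompt(user_prompt: str) -> list[dict]:
    """Prepend the system prompt to the user's message."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_prompt},
    ]

# These messages form the "messages" field of a chat completion request
messages = with_system_prompt("Erzähl mir einen Witz!")  # "Tell me a joke!"
```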