Prompting
To prompt Continuum, follow the OpenAI Chat API specification. Note that we don't use any OpenAI services; we only adhere to their interface definitions. To send prompts, simply use the Continuum proxy as your endpoint. It takes care of end-to-end encryption with our GenAI services for you.
Don’t send prompts directly to https://api.ai.confidential.cloud/v1/chat/completions! It won’t work anyway. Always send your prompts to your proxy, which handles encryption and communicates with the actual GenAI endpoint. For you, the continuum proxy effectively acts as your GenAI endpoint.
Simple examples of a default and a streaming prompt, together with their respective responses, are given below. We assume your proxy is running at localhost:8080; of course, you can run it wherever you want (see proxy configuration).
Example prompting
To prompt Continuum, use:
POST /v1/chat/completions
This proxy endpoint generates a response to a chat prompt.
Request body
- model (string): The model to use; one of the offered LLMs as listed here.
- messages (list): The prompts for which a response is generated.
- Additional parameters: These mirror the OpenAI API and are supported based on the model server's capabilities. However, options requiring internet access, such as image_url, aren't supported due to the sandboxed environment.
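As a sketch, the request body described above can be assembled in Python and sent to the proxy with any HTTP client. The helper name `build_chat_request` is ours; the endpoint URL and model name follow the examples below, assuming a proxy at localhost:8080.

```python
import json

PROXY_URL = "http://localhost:8080/v1/chat/completions"  # assumed local proxy
MODEL = "hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4"

def build_chat_request(prompt: str, stream: bool = False) -> dict:
    """Build a chat completion request body following the OpenAI Chat API shape."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }

body = json.dumps(build_chat_request("Tell me a joke!"))
# POST `body` to PROXY_URL with the header "Content-Type: application/json",
# e.g. via urllib.request, requests, or httpx.
```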
Returns
The response returns a chat completion or chat completion chunk object containing:
- choices (list): The responses generated by the model.
- Other parameters: Other fields are consistent with the OpenAI API specifications.
Default

Example request
curl localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4",
"messages": [
{
"role": "user",
"content": "Tell me a joke!"
}
]
}'
Example response
{
"id": "chat-6e8dc369b0614e2488df6a336c24c349",
"object": "chat.completion",
"created": 1727968175,
"model": "hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "What do you call a fake noodle?\n\nAn impasta.",
"tool_calls": []
},
"logprobs": null,
"finish_reason": "stop",
"stop_reason": null
}
],
"usage": {
"prompt_tokens": 40,
"total_tokens": 54,
"completion_tokens": 14
},
"prompt_logprobs": null
}
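To read the reply from a default (non-streaming) completion, take the message of the first choice. A minimal Python sketch against the response shape above (abridged):

```python
# Abridged chat completion response, as returned by the proxy above
response = {
    "object": "chat.completion",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "What do you call a fake noodle?\n\nAn impasta.",
            },
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 40, "total_tokens": 54, "completion_tokens": 14},
}

# The generated text lives in the first choice's message
answer = response["choices"][0]["message"]["content"]
```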
Streaming

Example request
curl localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4",
"messages": [
{
"role": "user",
"content": "Hi there!"
}
],
"stream" : true
}'
Example response
{"id":"chat-4f0bb41857044f52b5fa03fd3c752c8e","object":"chat.completion.chunk","created":1727968591,"model":"hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4","choices":[{"index":0,"delta":{"role":"assistant","content":""},"logprobs":null,"finish_reason":null}]}
{"id":"chat-4f0bb41857044f52b5fa03fd3c752c8e","object":"chat.completion.chunk","created":1727968591,"model":"hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4","choices":[{"index":0,"delta":{"content":"It"},"logprobs":null,"finish_reason":null}]}
{"id":"chat-4f0bb41857044f52b5fa03fd3c752c8e","object":"chat.completion.chunk","created":1727968591,"model":"hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4","choices":[{"index":0,"delta":{"content":"'s"},"logprobs":null,"finish_reason":null}]}
...
{"id":"chat-4f0bb41857044f52b5fa03fd3c752c8e","object":"chat.completion.chunk","created":1727968591,"model":"hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4","choices":[{"index":0,"delta":{"content":""},"logprobs":null,"finish_reason":"stop","stop_reason":null}]}
Available models
We currently support the hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4 model. Model support will be extended in the near future.
List models
GET /v1/models
This endpoint lists all currently available models.
Returns
The response is a list of model objects.
For detailed information, refer to the OpenAI API documentation.
Example request
curl localhost:8080/v1/models
Example response
{
"id": "hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4",
"object": "model",
"created": 1727968847,
"owned_by": "vllm",
"root": "hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4",
"parent": null,
"max_model_len": 131072,
"permission": [
{
"id": "modelperm-763c1f8144b745efa4e7dd984faf9517",
"object": "model_permission",
"created": 1727968847,
"allow_create_engine": false,
"allow_sampling": true,
"allow_logprobs": true,
"allow_search_indices": false,
"allow_view": true,
"allow_fine_tuning": false,
"organization": "*",
"group": null,
"is_blocking": false
}
]
}
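Fields such as max_model_len (the model's context window) can be read straight from a returned model object. A sketch against the abridged response above:

```python
# Abridged model object from GET /v1/models
model = {
    "id": "hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4",
    "object": "model",
    "owned_by": "vllm",
    "max_model_len": 131072,
}

model_id = model["id"]                   # pass this as "model" in chat requests
context_window = model["max_model_len"]  # maximum tokens per request
```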
System prompts
The offered model supports setting a system prompt as part of the request's messages
field (see example below). This can be used to tailor the model's behavior to your specific needs.
Improving language accuracy
The model may occasionally make minor language mistakes, especially in languages other than English. To optimize language accuracy, you can set a system prompt. The following example significantly improves accuracy for the German language:
{
"role": "system",
"content": "Ensure every response is free from grammar and spelling errors. Use only valid words. Apply correct article usage, especially for languages with gender-specific articles like German. Follow standard grammar and syntax rules, and check spelling against standard dictionaries. Maintain consistency in style and terminology throughout."
}
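Such a system prompt goes first in the messages list, followed by the user prompt. A minimal Python sketch of the combined messages (the helper name is ours; the German user prompt is just an illustration):

```python
SYSTEM_PROMPT = (
    "Ensure every response is free from grammar and spelling errors. "
    "Use only valid words. Apply correct article usage, especially for "
    "languages with gender-specific articles like German. Follow standard "
    "grammar and syntax rules, and check spelling against standard "
    "dictionaries. Maintain consistency in style and terminology throughout."
)

def with_system_prompt(user_prompt: str) -> list[dict]:
    """Prepend the system prompt to the user's message."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_prompt},
    ]

# These messages form the "messages" field of a chat completion request
messages = with_system_prompt("Erzähl mir einen Witz!")  # "Tell me a joke!"
```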