Example

This example shows how to deploy the Mistral-7B LLM for inference. It uses vLLM as an inference server.

# VM setup
terraform apply
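
terraform apply assumes an initialized working directory. A minimal sketch of the surrounding steps, assuming the example's Terraform files are in the current directory:

# One-time setup: fetch the providers and modules the example uses.
terraform init

# Optional: preview the resources that will be created.
terraform plan

# Tear the VMs down again once you are done with the example.
terraform destroy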

# Initialization
continuum init -m manifest.toml -e <attestation_service_ip:port>

continuum secret set -s secrets.toml -e <attestation_service_ip:port> -k key.pem
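
If you don't have a key file yet, one way to create one is with OpenSSL. This is a sketch only; it assumes continuum secret set accepts a PEM-encoded RSA private key, so check the Configuration page for the exact key format before relying on it.

# Generate a PEM-encoded 2048-bit RSA private key (assumption: verify on the
# Configuration page that this is the key format `continuum secret set` expects).
openssl genrsa -out key.pem 2048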

# Deployment: spawns a protected workload.
curl -X POST -H "Content-Type: application/json" -d '{"workload_port": 8000, "exposed_port": 8008, "gpu_count": 1, "pull_options": {"image_url": "ghcr.io/mistralai/mistral-src/vllm:latest"}}' http://<worker_ip>:8080/run
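
Here, workload_port is the port vLLM listens on inside the container, and exposed_port is where the worker makes it reachable, which is why the inference request below targets port 8008. The same request is easier to audit with the body kept in a file; this is equivalent to the one-liner above:

# Equivalent deployment request with the JSON body in a file.
cat > run.json <<'EOF'
{
  "workload_port": 8000,
  "exposed_port": 8008,
  "gpu_count": 1,
  "pull_options": {
    "image_url": "ghcr.io/mistralai/mistral-src/vllm:latest"
  }
}
EOF

curl -X POST -H "Content-Type: application/json" -d @run.json http://<worker_ip>:8080/run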

# Execution with encryption.
curl <worker_ip>:8008/v1/chat/completions -H "Content-Type: application/json" -d '{"model": "mistralai/Mistral-7B-Instruct-v0.2","messages":"demo-app:3a71ea7448791716e325146b:a03acc195834a9822d676e797381c035418dde3539cf46ae61d0ef2ff59b81f1d7d05cc5d8b79cf2ec08c9ce147c90c"}'
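
For reference, against an unprotected vLLM server the same endpoint takes the standard OpenAI-compatible request schema; in this deployment, the value of the messages field is replaced by the Continuum-encrypted ciphertext shown above. The server address and prompt below are placeholders:

# Plaintext equivalent against a vanilla (unprotected) vLLM server, shown only
# to illustrate what the encrypted messages value replaces.
curl <vllm_ip>:8000/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "mistralai/Mistral-7B-Instruct-v0.2",
  "messages": [{"role": "user", "content": "Hello!"}]
}'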

For details on the format of the manifest and secrets files, refer to the Configuration page.