Usage

This example shows how to deploy the Mistral-7B LLM for inference. It uses vLLM as an inference server.

# VM setup
terraform apply

# Initialization
continuum init -a as-manifest.json -m worker-manifest.json -s worker-secrets.json

# Deployment. Spawns a protected workload.
curl -X POST -H "Content-Type: application/json" -d '{"workload_port": 8000, "exposed_port": 8008, "gpu_count": 1, "pull_options": {"image_url": "ghcr.io/mistralai/mistral-src/vllm:latest"}}' http://<worker_ip>:8080/run

# Execution with encryption.
curl <worker_ip>:8008/v1/chat/completions -H "Content-Type: application/json" -d '{"model": "mistralai/Mistral-7B-Instruct-v0.2","messages":"demo-app:3a71ea7448791716e325146b:a03acc195834a9822d676e797381c035418dde3539cf46ae61d0ef2ff59b81f1d7d05cc5d8b79cf2ec08c9ce147c90c"}'
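
The deployment request above hard-codes its JSON body. As a minimal sketch, the same body can be assembled from variables so the ports, GPU count, and image are easier to change. The function name `build_run_payload` and the variable `payload` are hypothetical and not part of Continuum; the field names and values are copied from the deployment command above. The snippet only prints the body and does not contact a worker.

```shell
# Hypothetical helper (not part of the Continuum CLI): assemble the JSON body
# for the /run request shown in the deployment step.
# Arguments: $1 workload port, $2 exposed port, $3 GPU count, $4 image URL.
build_run_payload() {
  printf '{"workload_port": %d, "exposed_port": %d, "gpu_count": %d, "pull_options": {"image_url": "%s"}}' \
    "$1" "$2" "$3" "$4"
}

# Same values as in the deployment command above.
payload=$(build_run_payload 8000 8008 1 "ghcr.io/mistralai/mistral-src/vllm:latest")

# Print the body for inspection before sending it.
echo "$payload"
```

To send the body, pass it to the same endpoint used in the deployment step: `curl -X POST -H "Content-Type: application/json" -d "$payload" http://<worker_ip>:8080/run`. Here `workload_port` matches the port in the deployment command (8000) and `exposed_port` matches the port that the execution request targets (8008).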