Architecture
Continuum consists of two parts: the server side and the client side. The server side hosts the AI service and processes prompts securely. The client side verifies the server, encrypts the prompts, and sends inference requests. This page explains how these components interact and details their respective roles.
Server side
The server side of Continuum hosts the inference service. Its architecture includes two main components: the workers and the attestation service.
Worker
The worker node is central to the backend. It hosts an AI model and serves inference requests. The inference code and model are provided externally by the inference owner and the model owner. The containerized inference code, referred to as AI code, runs in a secure environment.
Each worker is a confidential VM (CVM) running Continuum's bespoke Linux OS, Continuum OS. This OS is minimal, immutable, and verifiable through remote attestation. Continuum OS hosts workloads in a sandbox and mediates network traffic through an encryption proxy.
Worker API
The worker provides an HTTPS API to manage (start and stop) AI code containers. For more information on AI code configuration, refer to the Manifest page.
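The sketch below shows what managing AI code through this API could look like from an operator's tooling. The endpoint paths, payload fields, and worker address are assumptions for illustration; the actual API surface is defined by the Continuum CLI and manifest.

```python
import requests

WORKER_URL = "https://worker.example.com:8443"  # hypothetical worker address

def start_ai_code(ai_code_config: dict) -> None:
    """Start an AI code container on the worker (endpoint path is assumed)."""
    resp = requests.post(f"{WORKER_URL}/containers/start", json=ai_code_config, timeout=30)
    resp.raise_for_status()

def stop_ai_code(container_id: str) -> None:
    """Stop a running AI code container (endpoint path is assumed)."""
    resp = requests.post(f"{WORKER_URL}/containers/{container_id}/stop", timeout=30)
    resp.raise_for_status()

start_ai_code({"image": "registry.example.com/ai-code:latest"})
```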
AI code sandbox
The AI code, provided by the inference owner, runs in a gVisor sandbox. This sandbox isolates the AI code from the host, handling system calls in a user-space kernel and blocking network traffic to prevent data leaks.
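To make the isolation model concrete, the following sketch launches a container under gVisor's runsc runtime with networking disabled, using the Docker SDK for Python. Continuum OS manages this internally; the snippet only illustrates the two properties described above, user-space syscall handling and blocked network traffic. The image name is a placeholder.

```python
import docker

client = docker.from_env()

# Placeholder image name; in Continuum, the sandboxed workload is the AI code.
container = client.containers.run(
    "registry.example.com/ai-code:latest",
    runtime="runsc",       # gVisor handles syscalls in a user-space kernel
    network_mode="none",   # no network access, preventing data leaks
    detach=True,
)
```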
Encryption proxy
Each AI code container has an attached proxy container, which is its only connection to the outside world. The proxy manages prompt encryption with the client side. It decrypts incoming requests and forwards them to the sandbox. In the opposite direction, it encrypts responses and sends them back to the user. The proxy supports various API adapters, such as OpenAI or Triton Generate.
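Conceptually, the proxy's request path looks like the sketch below. The cipher, wire format, and sandbox interface are assumptions chosen for illustration (AES-GCM via the Python cryptography package), not Continuum's actual protocol.

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def forward_to_sandbox(prompt: bytes) -> bytes:
    # Stand-in for the call into the sandboxed AI code.
    return b"model response for: " + prompt

def handle_request(key: bytes, nonce: bytes, encrypted_prompt: bytes) -> bytes:
    """Decrypt an incoming prompt, run inference, and encrypt the response."""
    aead = AESGCM(key)
    prompt = aead.decrypt(nonce, encrypted_prompt, None)  # incoming direction
    response = forward_to_sandbox(prompt)                 # plaintext stays inside the CVM
    response_nonce = os.urandom(12)                       # fresh nonce per message
    return response_nonce + aead.encrypt(response_nonce, response, None)

# Example round trip with a shared key:
key = AESGCM.generate_key(bit_length=256)
nonce = os.urandom(12)
ciphertext = AESGCM(key).encrypt(nonce, b"What is confidential computing?", None)
reply = handle_request(key, nonce, ciphertext)
```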
Attestation service
The attestation feature of CVMs ensures the integrity and authenticity of Continuum workers. It allows both the service provider and clients to verify that workers run the expected software stack and that they're interacting with a benign Continuum deployment.
Because workers can be dynamically scaled and handle concurrent requests, individual verification is impractical. Instead, the attestation service (AS) handles attestation centrally. On the server side, the AS verifies each worker based on its attestation statement. On the client side, the AS provides a system-wide attestation endpoint and handles key exchanges for prompt encryption.
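The key exchange could, for example, be built on an ECDH handshake like the sketch below, where the AS's public key would be taken from its verified attestation statement. This is a hypothetical construction to illustrate the idea; Continuum's actual protocol may differ.

```python
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

client_key = X25519PrivateKey.generate()
as_key = X25519PrivateKey.generate()  # stand-in: in practice, the AS's public key
                                      # is bound to its attestation statement

shared = client_key.exchange(as_key.public_key())
prompt_secret = HKDF(
    algorithm=hashes.SHA256(),
    length=32,
    salt=None,
    info=b"continuum prompt encryption",  # illustrative context string
).derive(shared)
```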
The AS runs in a CVM. During initialization, the service provider uses the Continuum CLI to establish trust by verifying the AS's attestation report. The reference values required for this verification are included in the AS attestation policy.
Initialization steps
- Verify the AS's attestation report against the AS policy defined in the manifest.
- Configure the AS with the manifest, defining reference values for worker attestation (both steps are sketched below).
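Expressed as pseudocode, the two steps could look as follows. Endpoint names, report fields, and policy fields are assumptions; in practice, the Continuum CLI performs this flow.

```python
import requests

AS_URL = "https://as.example.com"  # hypothetical AS address

def initialize(as_policy: dict, manifest: dict) -> None:
    # Step 1: verify the AS's attestation report against the policy's
    # reference values.
    report = requests.get(f"{AS_URL}/attestation", timeout=30).json()
    if report["measurement"] != as_policy["reference_measurement"]:
        raise RuntimeError("AS attestation report doesn't match the policy")

    # Step 2: configure the AS with the manifest, which carries the
    # reference values for worker attestation.
    requests.post(f"{AS_URL}/manifest", json=manifest, timeout=30).raise_for_status()
```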
After initialization, workers register with the AS, providing their attestation statements. Only verified workers can serve inference requests. The AS also provides an API for clients to verify the AS and upload encryption secrets. Verified workers synchronize with the AS to retrieve these secrets.
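From a worker's perspective, registration and secret retrieval could look like the following sketch. Endpoints, payloads, and the token scheme are assumptions for illustration.

```python
import requests

AS_URL = "https://as.example.com"  # hypothetical AS address

def register_worker(attestation_statement: bytes) -> str:
    """Register with the AS; succeeds only if the statement matches the
    manifest's reference values."""
    resp = requests.post(f"{AS_URL}/workers/register",
                         data=attestation_statement, timeout=30)
    resp.raise_for_status()
    return resp.json()["worker_token"]

def fetch_inference_secrets(worker_token: str) -> dict:
    """Synchronize with the AS to retrieve client-uploaded encryption secrets."""
    resp = requests.get(f"{AS_URL}/secrets",
                        headers={"Authorization": f"Bearer {worker_token}"},
                        timeout=30)
    resp.raise_for_status()
    return resp.json()
```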
Client side
Clients are categorized into:
- Operators who set up the server side and configure the inference service and model.
- Inference clients who interact with the model.
Operators
Operators set up a Continuum deployment and use the Continuum CLI to verify the AS and configure it with a manifest. They also use the CLI to configure the workers with the AI code and model via the worker API.
Inference clients
Inference clients connect to the service through the continuum-proxy, which exchanges prompt encryption secrets with the AS. The continuum-proxy is managed by the user because it establishes trust in the Continuum deployment. Requests from the inference client are then encrypted by the continuum-proxy and forwarded to the Continuum AI worker.
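Because the continuum-proxy exposes familiar API adapters, an inference client can talk to it like any OpenAI-compatible endpoint while the proxy transparently handles encryption. The local address, port, and model name below are assumptions for this sketch.

```python
import requests

PROXY_URL = "http://localhost:8080"  # hypothetical local continuum-proxy address

resp = requests.post(
    f"{PROXY_URL}/v1/chat/completions",  # OpenAI-style adapter
    json={
        "model": "my-model",  # placeholder model name
        "messages": [{"role": "user", "content": "What is confidential computing?"}],
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```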
Workflow
The diagram below illustrates the interactions between the different components of Continuum and the user.
Initially, operators verify the AS's integrity through the CLI. Upon successful verification, they set the manifest via the CLI. Interacting directly with the workers, they configure the AI code via the worker API.
Workers register with the AS, which verifies their attestation reports. Verified workers receive inference secrets and can then serve inference requests. End users interact with the continuum-proxy, which handles the interaction with the AS and the workers: it verifies the deployment via the AS and sets inference secrets. Users can then send their prompt requests as usual and let the proxy handle prompt encryption and secret management. The encrypted requests are received by the worker's encryption proxy, which decrypts the prompts, forwards them to the sandbox, re-encrypts the responses, and sends them back to the user.