Introduction

Continuum is a framework for securely deploying LLMs and other AI models. It enables the creation of ChatGPT-style services in which both user prompts and model weights are shielded throughout. With Continuum, no involved entity can ever access user prompts in plaintext (read more on the entities in the threat model). Continuum lets you run any publicly available LLM and integrates with well-known inference servers like NVIDIA Triton, vLLM, and Hugging Face TGI.

How does it work?

For end users, Continuum ensures end-to-end encryption of prompts. In Continuum, the AI model (inference server) runs inside a confidential computing environment that keeps all data encrypted in memory. Within that encrypted environment, the inference code gets access to the user's prompt. To prevent the inference code from leaking user data, it runs in a sandbox inside the confidential computing environment.

This architecture prevents (1) the infrastructure from accessing the user data and the inference code and (2) the inference code from leaking user data to unintended third parties via, e.g., unprotected memory, the disk, or the network.

Continuum establishes an encrypted channel between the client and the sandboxed inference code. Prompts are encrypted on the client side, decrypted only within the confidential computing environment, and responses are re-encrypted before they're returned to the client. This encryption keeps prompts inaccessible to the service provider. To maintain processing efficiency, only the prompt text (and not the entire request body) is encrypted, using authenticated encryption; other request details, like token length, remain accessible for the service provider's processing. The sketch below illustrates this field-level scheme.
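
The following Python sketch shows the idea under stated assumptions: it uses AES-GCM from the `cryptography` package as the authenticated encryption scheme, assumes a shared session key has already been negotiated (in Continuum this involves remote attestation of the confidential computing environment, omitted here), and the field names `prompt_ciphertext`, `nonce`, and `max_tokens` are illustrative, not Continuum's actual wire format.

```python
import json
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)  # stands in for the attested session key
aead = AESGCM(key)

def encrypt_request(prompt: str, max_tokens: int) -> dict:
    """Client side: encrypt only the prompt; leave metadata readable."""
    nonce = os.urandom(12)  # 96-bit nonce, unique per message
    # Bind the readable metadata to the ciphertext as associated data,
    # so tampering with it is detected on decryption.
    aad = json.dumps({"max_tokens": max_tokens}).encode()
    ciphertext = aead.encrypt(nonce, prompt.encode(), aad)
    return {
        "prompt_ciphertext": ciphertext.hex(),
        "nonce": nonce.hex(),
        "max_tokens": max_tokens,  # readable by the service provider
    }

def decrypt_request(request: dict) -> str:
    """Inside the confidential computing environment: recover the prompt."""
    aad = json.dumps({"max_tokens": request["max_tokens"]}).encode()
    plaintext = aead.decrypt(
        bytes.fromhex(request["nonce"]),
        bytes.fromhex(request["prompt_ciphertext"]),
        aad,
    )
    return plaintext.decode()

request = encrypt_request("Summarize this contract.", max_tokens=256)
assert decrypt_request(request) == "Summarize this contract."
```

Passing the plaintext metadata as associated data is what makes the scheme "authenticated": the service provider can read fields like `max_tokens` for routing and billing, but any modification of them invalidates decryption inside the confidential environment.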

Who is it for?

  • For AI users: Continuum-based AI services ensure that your data and intellectual property always stay private. Your data can't be leaked, can't be used for retraining, and won't cause compliance issues.
  • For service providers: By deploying LLMs with Continuum, user requests and responses are kept inaccessible to the inference service provider and the infrastructure. With Continuum, you can exclude yourself from access to your customers' data.
  • For model owners: Protect your weights when deploying to untrusted environments. You provide the model weights as encrypted files, and only the sandboxed inference code gains access to the decrypted weights (see the sketch after this list).
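
To illustrate the model-owner workflow, here is a minimal sketch, again assuming AES-GCM. The file layout (nonce prepended to ciphertext) and function names are hypothetical, and the mechanism that releases the key only to the attested sandbox is omitted:

```python
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_weights(path: str, key: bytes) -> None:
    """Model owner: encrypt the weight file before shipping it."""
    aead = AESGCM(key)
    nonce = os.urandom(12)
    with open(path, "rb") as f:
        plaintext = f.read()
    with open(path + ".enc", "wb") as f:
        f.write(nonce + aead.encrypt(nonce, plaintext, None))

def decrypt_weights(path: str, key: bytes) -> bytes:
    """Sandboxed inference code: decrypt into memory only, never to disk."""
    aead = AESGCM(key)
    with open(path, "rb") as f:
        blob = f.read()
    return aead.decrypt(blob[:12], blob[12:], None)
```

The design point is that the decrypted weights exist only in the encrypted memory of the confidential computing environment; the hosting infrastructure only ever handles the `.enc` file.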