Sandboxing

The inference code is provided by external parties; examples are HuggingFace TGI, vLLM, and NVIDIA Triton. It is updated frequently, so including it in remote attestation and reviewing every release would be impractical.

This code runs inside a confidential computing environment that encrypts all data in memory. Within this environment, the inference code can access user data. To prevent it from leaking that data, the inference code runs in a sandbox inside the confidential computing environment, and gVisor isolates the sandbox from the host.
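
As a rough illustration (not Continuum's actual launch mechanism), the sketch below starts an inference container under gVisor's runsc runtime with no network access. The image name, mount path, and port are assumptions made for the example; GPU access would additionally require gVisor's GPU support (nvproxy), which is omitted here.

```python
import subprocess

# Illustrative only: run an inference container under gVisor instead of the
# default runc runtime. All names and paths below are placeholders.
subprocess.run(
    [
        "docker", "run",
        "--runtime=runsc",   # gVisor userspace kernel isolates the container
        "--network=none",    # no network access from inside the sandbox
        "--read-only",       # no persistent writes to the container filesystem
        # Shared directory where the server could expose a Unix socket for the
        # encryption proxy, its only I/O channel besides the GPU (hypothetical):
        "--volume=/run/continuum:/run/proxy",
        "inference-server:latest",  # placeholder image name
    ],
    check=True,
)
```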

This architecture ensures that (1) the infrastructure can't access user data or the inference code, and (2) the inference code can't leak user data to unprotected memory, the disk, or the network.

A key principle is that the inference code can only communicate with the GPU and the encryption proxy, ensuring all communication is encrypted and preventing plaintext data leaks.
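
To make that data flow concrete, below is a minimal sketch of the proxy's inner loop, assuming AES-GCM with a per-session key established during attested key exchange. The key handling, message framing, and addresses are placeholders, not Continuum's actual protocol.

```python
import os
import socket

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

SESSION_KEY = os.urandom(32)          # stand-in; really negotiated with the client
INFERENCE_ADDR = ("127.0.0.1", 8000)  # assumed sandbox-local listener

def handle(ciphertext: bytes) -> bytes:
    """Decrypt one request, forward it to the inference code, re-encrypt the reply."""
    aesgcm = AESGCM(SESSION_KEY)
    nonce, body = ciphertext[:12], ciphertext[12:]
    prompt = aesgcm.decrypt(nonce, body, None)  # plaintext exists only here
    with socket.create_connection(INFERENCE_ADDR) as conn:
        conn.sendall(prompt)
        reply = conn.recv(65536)
    out_nonce = os.urandom(12)  # fresh nonce per message
    return out_nonce + aesgcm.encrypt(out_nonce, reply, None)
```

The important property is the boundary: every external interface carries ciphertext, and plaintext exists only between the proxy and the sandboxed inference code.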

Future enhancements

Currently, potential data leaks between user sessions remain a concern. In an enterprise scenario where all users trust each other, this isn't an issue.

To address situations where users may not trust each other (multitenancy), future versions of Continuum might implement transparency logs for the inference code to detect and attribute any backdoors after the fact.
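
As an illustration of the idea rather than a committed design, a transparency log can be as simple as a hash chain over released image digests; production logs (for example, Merkle-tree-based ones) additionally offer efficient inclusion and consistency proofs.

```python
import hashlib
import json
import time

def append_entry(log: list[dict], image_digest: str) -> dict:
    """Append a release to the log, chaining it to the previous entry."""
    prev = log[-1]["entry_hash"] if log else "0" * 64
    entry = {
        "image_digest": image_digest,  # e.g., sha256 of the container image
        "timestamp": int(time.time()),
        "prev_hash": prev,
    }
    entry["entry_hash"] = hashlib.sha256(
        json.dumps(entry, sort_keys=True).encode()
    ).hexdigest()
    log.append(entry)
    return entry

log: list[dict] = []
append_entry(log, "sha256:3a7c...")  # placeholder digest
```

Because each entry commits to its predecessor, a backdoored release can't be silently removed from the log; it can be detected and attributed to a specific logged digest after the fact.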

Alternatively, future versions could make the inference code stateless and free of side effects to prevent data flow between sessions.
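
Sketched in code, "stateless and free of side effects" means the handler owns all per-request state and discards it before returning. The names below (`model.generate` and its `kv_cache` parameter) are hypothetical stand-ins for an inference engine's API.

```python
# Illustrative shape of a stateless handler: all per-request state is created
# fresh and discarded, so nothing can flow from one session to the next.

def handle_request(model, prompt: str) -> str:
    kv_cache: dict = {}                                  # fresh cache per request
    output = model.generate(prompt, kv_cache=kv_cache)   # hypothetical API
    kv_cache.clear()                                     # drop intermediate state
    return output
```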