Sandboxing

The inference code is provided by an external party, such as Hugging Face TGI, vLLM, or NVIDIA Triton, and is updated frequently. Including it in remote attestation and reviewing every release would therefore be impractical.

This code operates within a confidential computing environment that encrypts all data in memory. Within this secure environment, the inference code can access user data. To prevent the inference code from leaking user data, it runs in a sandbox inside the confidential computing environment. The sandbox is isolated from the host through gVisor.

This architecture ensures that (1) the infrastructure can't access user data or the inference code, and (2) the inference code can't leak user data to unprotected memory, the disk, or the network.

A key principle is that the inference code can communicate only with the GPU and the encryption proxy. Because all communication over these two channels is encrypted, no user data can leave the sandbox in plaintext.
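
The communication rule amounts to a default-deny allowlist, which the following minimal Python sketch illustrates. It is not the actual enforcement mechanism, which lives in the sandbox itself. The peer labels and the function name `is_allowed` are hypothetical and chosen only for this example.

```python
# Illustrative sketch only: peer labels and function names are assumptions,
# not identifiers from the real system.

# The only two peers the inference code may talk to, per the principle above.
ALLOWED_PEERS = frozenset({"gpu", "encryption-proxy"})


def is_allowed(peer: str) -> bool:
    """Default-deny check: a peer is reachable only if explicitly allowlisted."""
    return peer in ALLOWED_PEERS


if __name__ == "__main__":
    # Anything not on the allowlist (disk, network, host memory) is denied.
    for peer in ("gpu", "encryption-proxy", "disk", "network", "host-memory"):
        verdict = "allowed" if is_allowed(peer) else "denied"
        print(f"{peer}: {verdict}")
```

The default-deny shape matters here: a new output path is blocked unless it is deliberately added to the list, so frequently updated inference code can't gain a new channel just by shipping one.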