Version: 1.5

Prepare a bare-metal instance

Hardware and firmware setup

  1. Update your BIOS to a version that supports AMD SEV-SNP. Updating to the latest available version is recommended as newer versions will likely contain security patches for AMD SEV-SNP.
  2. Enter BIOS setup to enable SMEE, IOMMU, RMP coverage, and SEV-SNP. Set the SEV-ES ASID Space Limit to a non-zero number (higher is better).
  3. Download the latest firmware version for your processor from AMD, unpack it, and place it in /lib/firmware/amd.
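The firmware step can be sketched as a small helper. The archive and file names below are placeholders (the real names depend on your processor generation; take them from AMD's download page), and installing to /lib/firmware/amd requires root:

```shell
# install_sev_firmware ARCHIVE DESTDIR: unpack a downloaded firmware archive
# and copy the contained .sbin firmware file(s) into DESTDIR.
install_sev_firmware() {
  workdir=$(mktemp -d) || return 1
  tar -xf "$1" -C "$workdir" || return 1
  mkdir -p "$2"
  cp "$workdir"/*.sbin "$2"/
  rm -rf "$workdir"
}

# On the real host (the archive name here is a placeholder):
# install_sev_firmware sev_firmware.tar.gz /lib/firmware/amd
```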

Consult AMD's Using SEV with AMD EPYC Processors user guide for more information.

Kernel setup

Install a kernel with version 6.11 or later. If you're following this guide before 6.11 has been released, use 6.11-rc3. Don't use 6.11-rc4 through 6.11-rc6, as they contain a regression. 6.11-rc7 and later release candidates might work.
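A quick way to compare the running kernel against the requirement is a `sort -V` check. This is a generic sketch, not a Contrast tool; note that release candidates such as 6.11-rc3 compare as 6.11 here because the suffix after the first dash is stripped:

```shell
# version_ge A B: succeeds if version A is greater than or equal to version B.
version_ge() {
  [ "$(printf '%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Compare the running kernel (numeric part only) against the required 6.11.
if version_ge "$(uname -r | cut -d- -f1)" 6.11; then
  echo "kernel version OK"
else
  echo "kernel too old, install 6.11 or later"
fi
```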

Increase the user.max_inotify_instances sysctl limit by adding user.max_inotify_instances=8192 to /etc/sysctl.d/99-sysctl.conf and running sysctl --system.

K3s setup

  1. Follow the K3s setup instructions to create a cluster.
  2. Install a block storage provider such as Longhorn and mark it as the default storage class.

Preparing a cluster for GPU usage

To enable GPU usage on a Contrast cluster, each cluster node that should host GPU workloads must meet the following conditions:

  1. Ensure that GPUs supporting confidential computing (CC) are available on the machine.

    lspci -nnk | grep '3D controller' -A3

    This should show a CC-capable GPU like the NVIDIA H100:

    41:00.0 3D controller [0302]: NVIDIA Corporation GH100 [H100 PCIe] [10de:2331] (rev a1)
    Subsystem: NVIDIA Corporation GH100 [H100 PCIe] [10de:1626]
    Kernel driver in use: vfio-pci
    Kernel modules: nvidiafb, nouveau
    Note: Contrast doesn't support non-CC GPUs.

  2. The IOMMU must be active. Check whether it is by running:

    ls /sys/kernel/iommu_groups

    If the output contains the group indices (0, 1, ...), the IOMMU is supported on the host. Otherwise, add intel_iommu=on to the kernel command line.

  3. Additionally, the host kernel needs to have the following kernel configuration options enabled:

    • CONFIG_VFIO
    • CONFIG_VFIO_IOMMU_TYPE1
    • CONFIG_VFIO_MDEV
    • CONFIG_VFIO_MDEV_DEVICE
    • CONFIG_VFIO_PCI
  4. A CDI configuration needs to be present on the node. To generate it, you can use the NVIDIA Container Toolkit. Refer to the official instructions on how to generate a CDI configuration with it.
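The per-node checks above (kernel configuration options, IOMMU, and CDI configuration) can be bundled into a small preflight sketch. This script is illustrative and not part of Contrast; the kernel config path and the CDI spec directories (/etc/cdi, /var/run/cdi) are common defaults that may differ on your distribution.

```shell
# Preflight sketch for a prospective GPU node.

# config_has OPTION FILE: succeeds if OPTION is built in (y) or a module (m).
config_has() {
  grep -qE "^$1=(y|m)$" "$2" 2>/dev/null
}

kconfig="/boot/config-$(uname -r)"
for opt in CONFIG_VFIO CONFIG_VFIO_IOMMU_TYPE1 CONFIG_VFIO_MDEV \
           CONFIG_VFIO_MDEV_DEVICE CONFIG_VFIO_PCI; do
  if config_has "$opt" "$kconfig"; then echo "$opt: ok"; else echo "$opt: MISSING"; fi
done

# IOMMU groups should be populated if the IOMMU is active.
if [ -n "$(ls -A /sys/kernel/iommu_groups 2>/dev/null)" ]; then
  echo "IOMMU: active"
else
  echo "IOMMU: inactive"
fi

# A CDI spec should exist in one of the default CDI directories.
if ls /etc/cdi/* /var/run/cdi/* >/dev/null 2>&1; then
  echo "CDI: spec found"
else
  echo "CDI: no spec found"
fi
```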

If the per-node requirements are fulfilled, deploy the NVIDIA GPU Operator to the cluster. It provisions pod-VMs with GPUs via VFIO.

Initially, label all nodes that should run GPU workloads:

kubectl label node <node-name> nvidia.com/gpu.workload.config=vm-passthrough

For a GPU-enabled Contrast cluster, you can then deploy the operator. Add the NVIDIA Helm repository first if it isn't configured yet, then install:

helm repo add nvidia https://helm.ngc.nvidia.com/nvidia && helm repo update
helm install --wait --generate-name \
  -n gpu-operator --create-namespace \
  nvidia/gpu-operator \
  --version=v24.9.1 \
  --set sandboxWorkloads.enabled=true \
  --set sandboxWorkloads.defaultWorkload='vm-passthrough' \
  --set nfd.nodefeaturerules=true \
  --set vfioManager.enabled=true \
  --set ccManager.enabled=true

Refer to the official installation instructions for details and further options.

Once the operator is deployed, check the available GPUs in the cluster:

kubectl get nodes -l nvidia.com/gpu.present -o json | \
  jq '.items[0].status.allocatable |
    with_entries(select(.key | startswith("nvidia.com/"))) |
    with_entries(select(.value != "0"))'

The above command should yield an output similar to the following, depending on what GPUs are available:

{
  "nvidia.com/GH100_H100_PCIE": "1"
}
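You can try the jq filter on canned data to see how it works. The sample JSON below is made up and only mimics a node's allocatable map; the filter keeps the nvidia.com/ entries and drops those with a count of zero:

```shell
# Fabricated sample of a node status, not taken from a real cluster.
sample='{"items":[{"status":{"allocatable":{"cpu":"32","nvidia.com/GH100_H100_PCIE":"1","nvidia.com/gpu":"0"}}}]}'
echo "$sample" | jq '.items[0].status.allocatable
  | with_entries(select(.key | startswith("nvidia.com/")))
  | with_entries(select(.value != "0"))'
```

This prints only the nonzero nvidia.com entry, matching the shape of the output shown above.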

These identifiers are then used to run GPU workloads on the cluster.