# Prepare a bare-metal instance

## Hardware and firmware setup
### AMD SEV-SNP

- Update your BIOS to a version that supports AMD SEV-SNP. Updating to the latest available version is recommended, as newer versions are likely to contain security patches for AMD SEV-SNP.
- Enter BIOS setup to enable SMEE, IOMMU, RMP coverage, and SEV-SNP. Set the SEV-ES ASID Space Limit to a non-zero number (higher is better).
- Download the latest firmware version for your processor from AMD, unpack it, and place it in `/lib/firmware/amd`.

Consult AMD's *Using SEV with AMD EPYC Processors* user guide for more information.

### Intel TDX

Follow Canonical's instructions on setting up Intel TDX in the host's BIOS.
## Kernel setup

### AMD SEV-SNP

Install Linux kernel 6.11 or greater.

### Intel TDX

Follow Canonical's instructions on setting up Intel TDX on Ubuntu 24.04. Note that Contrast currently only supports Intel TDX with Ubuntu 24.04.

Increase the `user.max_inotify_instances` sysctl limit by adding `user.max_inotify_instances=8192` to `/etc/sysctl.d/99-sysctl.conf` and running `sysctl --system`.
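As root, the sysctl change amounts to the following (the file path is the one named above):

```sh
# Persist the raised inotify limit and reload all sysctl configuration files.
echo 'user.max_inotify_instances=8192' >> /etc/sysctl.d/99-sysctl.conf
sysctl --system
```

You can verify the new value afterwards with `sysctl user.max_inotify_instances`.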
## K3s setup
- Follow the K3s setup instructions to create a cluster.
- Install a block storage provider such as Longhorn and mark it as the default storage class.
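If you use Longhorn, marking its storage class as the cluster default can be sketched as follows (`longhorn` is the storage class name created by a default Longhorn installation; adjust it if yours differs):

```sh
# Annotate the Longhorn storage class as the default for the cluster.
kubectl patch storageclass longhorn \
  -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
```

The `storageclass.kubernetes.io/is-default-class` annotation is how Kubernetes identifies the default storage class for PVCs that don't name one explicitly.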
## Preparing a cluster for GPU usage
To enable GPU usage on a Contrast cluster, some conditions need to be fulfilled for each cluster node that should host GPU workloads:
- Ensure that GPUs supporting confidential computing (CC) are available on the machine:

  ```sh
  lspci -nnk | grep '3D controller' -A3
  ```

  This should show a CC-capable GPU like the NVIDIA H100:

  ```
  41:00.0 3D controller [0302]: NVIDIA Corporation GH100 [H100 PCIe] [10de:2331] (rev a1)
          Subsystem: NVIDIA Corporation GH100 [H100 PCIe] [10de:1626]
          Kernel driver in use: vfio-pci
          Kernel modules: nvidiafb, nouveau
  ```

  Note that Contrast doesn't support non-CC GPUs.
- The IOMMU must be active. You can check by running:

  ```sh
  ls /sys/kernel/iommu_groups
  ```

  If the output contains group indices (`0`, `1`, ...), the IOMMU is supported on the host. Otherwise, add `intel_iommu=on` to the kernel command line.

- Additionally, the host kernel needs to have the following kernel configuration options enabled:
  - `CONFIG_VFIO`
  - `CONFIG_VFIO_IOMMU_TYPE1`
  - `CONFIG_VFIO_MDEV`
  - `CONFIG_VFIO_MDEV_DEVICE`
  - `CONFIG_VFIO_PCI`
- A CDI configuration needs to be present on the node. To generate it, you can use the NVIDIA Container Toolkit. Refer to the official instructions on how to generate a CDI configuration with it.
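The per-node checks above can be sketched as a small script. It assumes the kernel configuration is readable at `/boot/config-$(uname -r)` (some distributions expose it as `/proc/config.gz` instead) and that CDI specs live in the standard CDI directories:

```sh
# Check the per-node GPU prerequisites: IOMMU groups, VFIO kernel options, CDI spec.

# IOMMU: an active IOMMU shows up as one directory per group in sysfs.
iommu_groups=$(ls /sys/kernel/iommu_groups 2>/dev/null | wc -l)
if [ "$iommu_groups" -gt 0 ]; then
  echo "IOMMU active: $iommu_groups groups"
else
  echo "IOMMU inactive: add intel_iommu=on to the kernel command line"
fi

# VFIO: every required option must be built in (=y) or available as a module (=m).
cfg="/boot/config-$(uname -r)"
missing=0
for opt in CONFIG_VFIO CONFIG_VFIO_IOMMU_TYPE1 CONFIG_VFIO_MDEV \
           CONFIG_VFIO_MDEV_DEVICE CONFIG_VFIO_PCI; do
  grep -q "^${opt}=[ym]" "$cfg" 2>/dev/null || { echo "missing: $opt"; missing=$((missing + 1)); }
done

# CDI: the NVIDIA Container Toolkit writes generated specs to these directories.
ls /etc/cdi/*.yaml /var/run/cdi/*.yaml 2>/dev/null || echo "no CDI configuration found"
```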
If the per-node requirements are fulfilled, deploy the NVIDIA GPU Operator to the cluster. It provisions pod-VMs with GPUs via VFIO.
Initially, label all nodes that should run GPU workloads:
```sh
kubectl label node <node-name> nvidia.com/gpu.workload.config=vm-passthrough
```
For a GPU-enabled Contrast cluster, you can then deploy the operator with the following command:
```sh
helm install --wait --generate-name \
  -n gpu-operator --create-namespace \
  nvidia/gpu-operator \
  --version=v24.9.1 \
  --set sandboxWorkloads.enabled=true \
  --set sandboxWorkloads.defaultWorkload='vm-passthrough' \
  --set nfd.nodefeaturerules=true \
  --set vfioManager.enabled=true \
  --set ccManager.enabled=true
```
Refer to the official installation instructions for details and further options.
Once the operator is deployed, check the available GPUs in the cluster:
```sh
kubectl get nodes -l nvidia.com/gpu.present -o json | \
  jq '.items[0].status.allocatable |
      with_entries(select(.key | startswith("nvidia.com/"))) |
      with_entries(select(.value != "0"))'
```
The above command should yield an output similar to the following, depending on what GPUs are available:
```json
{
  "nvidia.com/GH100_H100_PCIE": "1"
}
```
These identifiers are then used to run GPU workloads on the cluster.
Currently, Contrast only supports GPU workloads on SEV-SNP-based clusters.
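As a sketch, a reported resource identifier could be requested in a pod spec like this. The pod name, image, runtime class name, and the `nvidia.com/GH100_H100_PCIE` identifier are illustrative assumptions; substitute the values from your own deployment and the output above:

```sh
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test   # hypothetical example pod
spec:
  runtimeClassName: contrast-cc   # assumption: your Contrast runtime class name
  containers:
  - name: cuda
    image: nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04
    command: ["sleep", "infinity"]
    resources:
      limits:
        "nvidia.com/GH100_H100_PCIE": 1   # identifier reported by your cluster
EOF
```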