# Prepare a bare-metal instance
## Hardware and firmware setup
### AMD SEV-SNP
- Update your BIOS to a version that supports AMD SEV-SNP. Updating to the latest available version is recommended, as newer versions typically contain security patches for AMD SEV-SNP.
- Enter BIOS setup to enable SMEE, IOMMU, RMP coverage, and SEV-SNP. Set the SEV-ES ASID Space Limit to a non-zero number (higher is better).
- Download the latest firmware version for your processor from AMD, unpack it, and place it in `/lib/firmware/amd`.
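A minimal sketch of the firmware installation, assuming the archive has already been downloaded from AMD's website; the archive and file names are placeholders and depend on your EPYC generation:

```sh
# Placeholder names: the actual archive and .sbin file names depend on your CPU family.
unzip amd_sev_fam19h_model1xh_*.zip
sudo mkdir -p /lib/firmware/amd
sudo cp amd_sev_fam19h_model1xh_*.sbin /lib/firmware/amd/
```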
Consult AMD's Using SEV with AMD EPYC Processors user guide for more information.
### Intel TDX

Follow Canonical's instructions on setting up Intel TDX in the host's BIOS.
## Kernel setup
### AMD SEV-SNP
Install a kernel of version 6.11 or later. If you're following this guide before 6.11 has been released, use 6.11-rc3. Don't use 6.11-rc4 through 6.11-rc6, as they contain a regression; 6.11-rc7 and later release candidates might work.
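As a quick sanity check, you can verify the running kernel version and look for SEV-SNP initialization messages in the kernel log; the exact log lines vary between kernel versions:

```sh
uname -r                              # should report 6.11 or a suitable release candidate
sudo dmesg | grep -i -e sev -e rmp    # look for SEV-SNP/RMP initialization messages
```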
### Intel TDX

Follow Canonical's instructions on setting up Intel TDX on Ubuntu 24.04. Note that Contrast currently only supports Intel TDX with Ubuntu 24.04.
Increase the `user.max_inotify_instances` sysctl limit by adding `user.max_inotify_instances=8192` to `/etc/sysctl.d/99-sysctl.conf` and running `sysctl --system`.
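For example:

```sh
# Persist the raised inotify limit and reload all sysctl settings.
echo 'user.max_inotify_instances=8192' | sudo tee -a /etc/sysctl.d/99-sysctl.conf
sudo sysctl --system
```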
## K3s setup
- Follow the K3s setup instructions to create a cluster.
- Install a block storage provider such as Longhorn and mark it as the default storage class.
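A hedged sketch of these two steps, assuming the upstream K3s quick-start script and a manifest-based Longhorn install; the Longhorn version below is only an example, so consult the respective documentation for current instructions:

```sh
# Install K3s (single-node quick start).
curl -sfL https://get.k3s.io | sh -

# Install Longhorn and mark it as the default storage class.
kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/v1.7.2/deploy/longhorn.yaml
kubectl patch storageclass longhorn \
  -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}}'
# K3s ships local-path as its default class; you may need to remove that annotation from it.
```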
## Preparing a cluster for GPU usage
### AMD SEV-SNP
To enable GPU usage on a Contrast cluster, the following conditions must be fulfilled on each cluster node that should host GPU workloads:
- Ensure that GPUs supporting confidential computing (CC) are available on the machine:

  ```sh
  lspci -nnk | grep '3D controller' -A3
  ```

  This should show a CC-capable GPU like the NVIDIA H100:

  ```
  41:00.0 3D controller [0302]: NVIDIA Corporation GH100 [H100 PCIe] [10de:2331] (rev a1)
          Subsystem: NVIDIA Corporation GH100 [H100 PCIe] [10de:1626]
          Kernel driver in use: vfio-pci
          Kernel modules: nvidiafb, nouveau
  ```

  Contrast doesn't support non-CC GPUs.
- You must activate the IOMMU. You can check this by running:

  ```sh
  ls /sys/kernel/iommu_groups
  ```

  If the output contains the group indices (`0`, `1`, ...), the IOMMU is supported on the host. Otherwise, add `intel_iommu=on` to the kernel command line.
- Additionally, the host kernel needs to have the following kernel configuration options enabled:
  - `CONFIG_VFIO`
  - `CONFIG_VFIO_IOMMU_TYPE1`
  - `CONFIG_VFIO_MDEV`
  - `CONFIG_VFIO_MDEV_DEVICE`
  - `CONFIG_VFIO_PCI`
- A CDI configuration needs to be present on the node. To generate it, you can use the NVIDIA Container Toolkit. Refer to the official instructions on how to generate a CDI configuration with it.
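As a sketch, assuming the NVIDIA Container Toolkit is installed, a typical invocation looks like this; check the official documentation for the options that match your setup:

```sh
# Generate a CDI specification describing the node's NVIDIA devices.
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
```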
If the per-node requirements are fulfilled, deploy the NVIDIA GPU Operator to the cluster. It provisions pod-VMs with GPUs via VFIO.
First, label all nodes that should run GPU workloads:
```sh
kubectl label node <node-name> nvidia.com/gpu.workload.config=vm-passthrough
```
For a GPU-enabled Contrast cluster, you can then deploy the operator with the following command:
```sh
helm install --wait --generate-name \
  -n gpu-operator --create-namespace \
  nvidia/gpu-operator \
  --version=v24.9.1 \
  --set sandboxWorkloads.enabled=true \
  --set sandboxWorkloads.defaultWorkload='vm-passthrough' \
  --set nfd.nodefeaturerules=true \
  --set vfioManager.enabled=true \
  --set ccManager.enabled=true
```
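The command above assumes the `nvidia` Helm repository is already configured; if it isn't, add it first:

```sh
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update
```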
Refer to the official installation instructions for details and further options.
Once the operator is deployed, check the available GPUs in the cluster:
```sh
kubectl get nodes -l nvidia.com/gpu.present -o json | \
  jq '.items[0].status.allocatable |
    with_entries(select(.key | startswith("nvidia.com/"))) |
    with_entries(select(.value != "0"))'
```
The above command should yield output similar to the following, depending on which GPUs are available:
```json
{
  "nvidia.com/GH100_H100_PCIE": "1"
}
```
These identifiers are then used to run GPU workloads on the cluster.
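As an illustration only, such an identifier appears as a resource limit in the pod spec. The pod below is a hypothetical sketch: the pod name and image are placeholders, and Contrast-specific settings such as the runtime class are omitted:

```sh
# Hypothetical example: pod name and image are placeholders; Contrast-specific
# fields (for example the runtime class) are omitted.
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  containers:
  - name: cuda
    image: nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/GH100_H100_PCIE: 1
EOF
```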
### Intel TDX

Currently, Contrast only supports GPU workloads on SEV-SNP-based clusters.