Skip to main content
Version: Next

Coordinator High-Availability

This guide shows how to scale the Contrast Coordinator to more than one instance and make it highly available.

Workflow

Deploy the Coordinator according to the basic workflow. By default, there is only one Coordinator instance. Verify that this Coordinator is in state Ready:

kubectl get pods -l app.kubernetes.io/name=coordinator
NAME            READY   STATUS    RESTARTS   AGE
coordinator-0 1/1 Running 0 11m

Next, we increase the number of instances to 3. This will create additional Coordinator instances, one after another, as described in the Kubernetes documentation for StatefulSet. After some time, the additional Coordinators should enter the Ready state, too.

kubectl scale statefulset/coordinator --replicas 3
kubectl wait statefulset/coordinator --for=jsonpath='{.status.readyReplicas}=3' --timeout=2m
kubectl get pods -l app.kubernetes.io/name=coordinator
statefulset.apps/coordinator scaled
statefulset.apps/coordinator condition met
NAME READY STATUS RESTARTS AGE
coordinator-0 1/1 Running 0 14m
coordinator-1 1/1 Running 0 107s
coordinator-2 1/1 Running 0 73s

You can now try deleting individual Coordinator pods, draining a node in a multi-node cluster, or scheduling a full rollout of the Coordinator. Availability of the Coordinator endpoint shouldn't be affected, and all Coordinator pods should automatically recover.

kubectl rollout restart statefulset/coordinator
kubectl rollout status statefulset/coordinator -w
kubectl get pods -l app.kubernetes.io/name=coordinator

statefulset.apps/coordinator restarted
Waiting for 1 pods to be ready...
Waiting for partitioned roll out to finish: 1 out of 3 new pods have been updated...
Waiting for 1 pods to be ready...
Waiting for 1 pods to be ready...
Waiting for 1 pods to be ready...
Waiting for partitioned roll out to finish: 2 out of 3 new pods have been updated...
Waiting for 1 pods to be ready...
Waiting for 1 pods to be ready...
Waiting for 1 pods to be ready...
partitioned roll out complete: 3 new pods have been updated...
NAME READY STATUS RESTARTS AGE
coordinator-0 1/1 Running 0 41s
coordinator-1 1/1 Running 0 71s
coordinator-2 1/1 Running 0 99s

How it works

Newly started (or restarted) Coordinator instances try to recover from other Coordinator instances in the cluster. As long as a single Coordinator is initialized, the other instances eventually recover from it. StatefulSet semantics guarantee that Coordinator pods are started predictably, and only after all existing Coordinators are recovered.