StatefulSet Canary Deployments with a Custom Kubernetes Operator in Go
The Inherent Challenge: Why StatefulSet Canaries Defy Standard Tooling
Stateless Deployments in Kubernetes benefit from a rich ecosystem of canary deployment strategies facilitated by service meshes and ingress controllers. These tools manipulate traffic routing to direct a percentage of users to a new version. This model breaks down for StatefulSets. The core value of a StatefulSet—stable network identifiers (e.g., my-app-0, my-app-1) and persistent storage (PersistentVolumeClaim per Pod)—is also its greatest challenge for canary rollouts. You cannot simply route traffic away from my-app-0 to a canary version without breaking intra-cluster communication and risking state corruption.
The default OnDelete and RollingUpdate update strategies for StatefulSets are insufficient. A RollingUpdate replaces all pods sequentially, and even its partition field only yields a coarse, ordinal-based split with no automated analysis, promotion, or rollback. An operator-driven approach is required to orchestrate the lifecycle of stateful pods with surgical precision.
This article details the implementation of a custom Go-based Kubernetes Operator that manages a CanaryStatefulSet custom resource. The core pattern involves orchestrating two underlying StatefulSet resources: one for the stable version and one for the canary. This allows us to introduce a new version to a subset of pods while maintaining the integrity of the stable cluster.
1. Designing the `CanaryStatefulSet` Custom Resource Definition (CRD)
The foundation of any operator is its API. Our CanaryStatefulSet CRD must capture the desired state for a complete canary lifecycle. It will abstract away the complexity of managing two separate StatefulSets.
Key fields in the spec:
* template: A StatefulSetSpec that defines the base configuration for our application. This is the source of truth for pod templates, volume claims, etc.
* replicas: The total number of desired replicas for the application when fully scaled.
* canaryReplicas: The number of replicas to be deployed as part of the canary version.
* image: The container image for the new (canary) version. The stable version's image will be inferred from the last successful spec.
* promotion: A strategy block, e.g., { type: "auto" } or { type: "manual", annotation: "canary.example.com/promote" }.
Key fields in the status:
* stableRevision: The controller-revision-hash of the stable StatefulSet.
* canaryRevision: The controller-revision-hash of the canary StatefulSet.
* stableReplicas: Current replica count of the stable set.
* canaryReplicas: Current replica count of the canary set.
* phase: The current state of the rollout (e.g., Initializing, ReconcilingStable, DeployingCanary, CanaryActive, Promoting, Stable).
Here is the complete CRD definition:
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: canarystatefulsets.app.example.com
spec:
  group: app.example.com
  names:
    kind: CanaryStatefulSet
    listKind: CanaryStatefulSetList
    plural: canarystatefulsets
    singular: canarystatefulset
  scope: Namespaced
  versions:
    - name: v1alpha1
      served: true
      storage: true
      subresources:
        # The status subresource keeps status writes from bumping metadata.generation,
        # which the event-filter predicate in section 3.2 relies on.
        status: {}
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                replicas:
                  type: integer
                  minimum: 0
                canaryReplicas:
                  type: integer
                  minimum: 0
                image:
                  type: string
                template:
                  type: object
                  # This uses the preserve-unknown-fields marker to allow any valid StatefulSetSpec.
                  # In a real implementation, you'd generate a more specific schema.
                  x-kubernetes-preserve-unknown-fields: true
              required:
                - replicas
                - canaryReplicas
                - image
                - template
            status:
              type: object
              properties:
                phase:
                  type: string
                stableRevision:
                  type: string
                canaryRevision:
                  type: string
                stableReplicas:
                  type: integer
                canaryReplicas:
                  type: integer
                readyReplicas:
                  type: integer
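If the API is scaffolded with Kubebuilder, the Go types backing this CRD would look roughly like the sketch below; the field names mirror the schema above, and the generated List type and DeepCopy methods are omitted:

package v1alpha1

import (
    appsv1 "k8s.io/api/apps/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// CanaryStatefulSetSpec mirrors the spec section of the CRD schema.
type CanaryStatefulSetSpec struct {
    // Replicas is the total desired replica count across the stable and canary sets.
    Replicas int32 `json:"replicas"`
    // CanaryReplicas is how many of those replicas should run the canary image.
    CanaryReplicas int32 `json:"canaryReplicas"`
    // Image is the container image for the canary (target) version.
    Image string `json:"image"`
    // Template is the base StatefulSetSpec used for both underlying sets.
    Template appsv1.StatefulSetSpec `json:"template"`
}

// CanaryStatefulSetStatus mirrors the status section of the CRD schema.
type CanaryStatefulSetStatus struct {
    Phase          string `json:"phase,omitempty"`
    StableRevision string `json:"stableRevision,omitempty"`
    CanaryRevision string `json:"canaryRevision,omitempty"`
    StableReplicas int32  `json:"stableReplicas,omitempty"`
    CanaryReplicas int32  `json:"canaryReplicas,omitempty"`
    ReadyReplicas  int32  `json:"readyReplicas,omitempty"`
}

// +kubebuilder:object:root=true
// +kubebuilder:subresource:status

// CanaryStatefulSet is the Schema for the canarystatefulsets API.
type CanaryStatefulSet struct {
    metav1.TypeMeta   `json:",inline"`
    metav1.ObjectMeta `json:"metadata,omitempty"`

    Spec   CanaryStatefulSetSpec   `json:"spec,omitempty"`
    Status CanaryStatefulSetStatus `json:"status,omitempty"`
}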
2. The Reconciliation Loop: Core Implementation in Go
We'll use the Kubebuilder framework to scaffold our operator. The heart of the operator is the Reconcile function in the controller. It's invoked whenever the CanaryStatefulSet resource or any of its owned resources change.
Our reconciliation logic follows a state machine based on the status.phase.
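Since these phase strings are plain data to the API server, it helps to centralize them as constants and dispatch on them inside Reconcile; the names below are assumptions that simply mirror the status schema:

// Phases of the canary rollout, stored in status.phase.
const (
    PhaseInitializing      = "Initializing"
    PhaseReconcilingStable = "ReconcilingStable"
    PhaseDeployingCanary   = "DeployingCanary"
    PhaseCanaryActive      = "CanaryActive"
    PhasePromoting         = "Promoting"
    PhaseStable            = "Stable"
)

// Inside Reconcile, after the CanaryStatefulSet has been fetched:
switch cr.Status.Phase {
case PhaseDeployingCanary, PhaseCanaryActive:
    // Verify canary health and watch for a promotion signal.
case PhasePromoting:
    // Roll the new image into the stable set, then retire the canary.
default:
    // Initializing / ReconcilingStable / Stable: converge the stable set on the spec.
}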
2.1. Initial Setup and Fetching Resources
The Reconcile function starts by fetching the CanaryStatefulSet instance. We also need to fetch the two underlying StatefulSets, which we'll name `<name>-stable` and `<name>-canary`.
package controllers

import (
    "context"
    "time" // used by the RequeueAfter polling later in the reconcile logic

    appsv1 "k8s.io/api/apps/v1"
    "k8s.io/apimachinery/pkg/api/errors"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/runtime"
    "k8s.io/apimachinery/pkg/types"
    ctrl "sigs.k8s.io/controller-runtime"
    "sigs.k8s.io/controller-runtime/pkg/client"
    "sigs.k8s.io/controller-runtime/pkg/controller/controllerutil" // used by the finalizer logic in section 3.1
    "sigs.k8s.io/controller-runtime/pkg/log"

    appv1alpha1 "github.com/your-repo/canary-operator/api/v1alpha1"
)

// CanaryStatefulSetReconciler reconciles a CanaryStatefulSet object
type CanaryStatefulSetReconciler struct {
    client.Client
    Scheme *runtime.Scheme
}

func (r *CanaryStatefulSetReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    logger := log.FromContext(ctx)

    // 1. Fetch the CanaryStatefulSet instance
    cr := &appv1alpha1.CanaryStatefulSet{}
    if err := r.Get(ctx, req.NamespacedName, cr); err != nil {
        if errors.IsNotFound(err) {
            logger.Info("CanaryStatefulSet resource not found. Ignoring since object must be deleted")
            return ctrl.Result{}, nil
        }
        logger.Error(err, "Failed to get CanaryStatefulSet")
        return ctrl.Result{}, err
    }

    // 2. Fetch the stable and canary StatefulSets
    stableSts := &appsv1.StatefulSet{}
    stableStsName := cr.Name + "-stable"
    stableExists := true
    if err := r.Get(ctx, types.NamespacedName{Name: stableStsName, Namespace: cr.Namespace}, stableSts); err != nil {
        if !errors.IsNotFound(err) {
            logger.Error(err, "Failed to get stable StatefulSet")
            return ctrl.Result{}, err
        }
        stableExists = false
    }

    canarySts := &appsv1.StatefulSet{}
    canaryStsName := cr.Name + "-canary"
    canaryExists := true
    if err := r.Get(ctx, types.NamespacedName{Name: canaryStsName, Namespace: cr.Namespace}, canarySts); err != nil {
        if !errors.IsNotFound(err) {
            logger.Error(err, "Failed to get canary StatefulSet")
            return ctrl.Result{}, err
        }
        canaryExists = false
    }

    // ... Reconciliation logic continues here ...

    return ctrl.Result{}, nil
}

// Helper to construct a new StatefulSet
func (r *CanaryStatefulSetReconciler) newStsForCR(cr *appv1alpha1.CanaryStatefulSet, name, image string, replicas int32) *appsv1.StatefulSet {
    // Deep copy the template to avoid modifying the CR's spec
    spec := cr.Spec.Template.DeepCopy()
    spec.Replicas = &replicas

    // IMPORTANT: Ensure the selector is set correctly
    labels := map[string]string{
        "app":        cr.Name,
        "controller": cr.Name,
        // A per-set label keeps the stable and canary StatefulSets' selectors from
        // overlapping, while the shared "app" label lets one Service cover both.
        "set": name,
    }
    spec.Selector = &metav1.LabelSelector{MatchLabels: labels}
    spec.Template.ObjectMeta.Labels = labels

    // Override the image
    spec.Template.Spec.Containers[0].Image = image

    sts := &appsv1.StatefulSet{
        ObjectMeta: metav1.ObjectMeta{
            Name:      name,
            Namespace: cr.Namespace,
        },
        Spec: *spec,
    }

    // Set the owner reference so that the Kubernetes garbage collector cleans up the StatefulSets
    // when the CanaryStatefulSet is deleted. SetControllerReference only errors if the object
    // already has a different controller owner, so the error is ignored here for brevity.
    _ = ctrl.SetControllerReference(cr, sts, r.Scheme)

    return sts
}
2.2. The Core Logic: Managing Two `StatefulSets`
The central idea is that the stable StatefulSet runs the old version, and the canary StatefulSet runs the new one. The total number of replicas is split between them.
The reconciliation proceeds in three stages:
* Initialization: if no stable StatefulSet exists, we create the stable one with the image defined in the CR's spec.template and scale it to spec.replicas.
* Canary rollout: when spec.image in the CR differs from the image running in the stable StatefulSet, that is our signal to start the canary process:
  * Create the canary StatefulSet using the new spec.image. Its initial replica count is 0.
  * Scale the stable StatefulSet down from spec.replicas to spec.replicas - spec.canaryReplicas.
  * Once the stable set has scaled down, scale the canary StatefulSet up to spec.canaryReplicas.
  * This two-step scaling is critical. Scaling down first keeps the total pod count at spec.replicas and retires the highest ordinals (e.g., my-app-2) before the canary pods come up. Note that the canary pods get their own PVCs (named after the canary StatefulSet), so any state handover from the retired stable pods must be handled by the application or by explicit PVC management in the operator.
* Promotion: once the canary has been validated:
  * Update the stable StatefulSet's pod template to use the new image.
  * Scale the stable StatefulSet back up to spec.replicas.
  * Scale the canary StatefulSet down to 0 and eventually delete it.
Here's a snippet illustrating the version detection and initial canary creation:
// Inside Reconcile function

// Determine current and target images
stableImage := ""
if stableExists {
    stableImage = stableSts.Spec.Template.Spec.Containers[0].Image
}
targetImage := cr.Spec.Image

// --- Reconciliation Logic --- //

// Case 1: Initial creation
if !stableExists {
    logger.Info("Creating a new stable StatefulSet", "name", stableStsName)
    // Use the targetImage for the very first deployment
    newStableSts := r.newStsForCR(cr, stableStsName, targetImage, cr.Spec.Replicas)
    if err := r.Create(ctx, newStableSts); err != nil {
        logger.Error(err, "Failed to create new stable StatefulSet")
        return ctrl.Result{}, err
    }
    // Update status and requeue
    cr.Status.Phase = "Initializing"
    //... update status logic ...
    return ctrl.Result{Requeue: true}, nil
}

// Case 2: No change in version, ensure stability
if stableImage == targetImage {
    // Logic to ensure stable replicas match spec.replicas
    // And ensure canary replicas are 0
    if *stableSts.Spec.Replicas != cr.Spec.Replicas {
        // Scale stable to correct size
    }
    if canaryExists && *canarySts.Spec.Replicas != 0 {
        // Scale down and delete canary
    }
    cr.Status.Phase = "Stable"
    //... update status logic ...
    return ctrl.Result{}, nil
}

// Case 3: New version detected - start canary process
if stableImage != targetImage && !canaryExists {
    logger.Info("New image detected. Starting canary deployment.", "current", stableImage, "target", targetImage)

    // Step 1: Scale down stable StatefulSet
    desiredStableReplicas := cr.Spec.Replicas - cr.Spec.CanaryReplicas
    if *stableSts.Spec.Replicas != desiredStableReplicas {
        logger.Info("Scaling down stable set", "replicas", desiredStableReplicas)
        stableSts.Spec.Replicas = &desiredStableReplicas
        if err := r.Update(ctx, stableSts); err != nil {
            // handle error
            return ctrl.Result{}, err
        }
        cr.Status.Phase = "ScalingDownStable"
        //... update status logic ...
        return ctrl.Result{Requeue: true}, nil
    }

    // Wait for stable set to be ready at the new replica count
    if stableSts.Status.ReadyReplicas != desiredStableReplicas {
        logger.Info("Waiting for stable set to scale down", "current", stableSts.Status.ReadyReplicas, "desired", desiredStableReplicas)
        return ctrl.Result{RequeueAfter: 5 * time.Second}, nil
    }

    // Step 2: Create and scale up canary StatefulSet
    logger.Info("Creating canary StatefulSet", "name", canaryStsName)
    newCanarySts := r.newStsForCR(cr, canaryStsName, targetImage, cr.Spec.CanaryReplicas)
    if err := r.Create(ctx, newCanarySts); err != nil {
        // handle error
        return ctrl.Result{}, err
    }
    cr.Status.Phase = "DeployingCanary"
    //... update status logic ...
    return ctrl.Result{Requeue: true}, nil
}

// ... logic for monitoring canary, promoting, and rolling back ...
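The promotion branch elided above might look like the following sketch. It is not the full implementation: it assumes the manual promotion annotation from the promotion strategy design (canary.example.com/promote) and a small updateStatus helper that writes through the status subresource.

// Inside Reconcile: the canary exists and promotion has been requested,
// e.g. via the manual annotation from the promotion strategy block.
if canaryExists && cr.GetAnnotations()["canary.example.com/promote"] == "true" {
    // Step 1: Roll the new image into the stable set; its RollingUpdate
    // strategy replaces the remaining old-version pods in place.
    stableSts.Spec.Template.Spec.Containers[0].Image = targetImage
    stableSts.Spec.Replicas = &cr.Spec.Replicas
    if err := r.Update(ctx, stableSts); err != nil {
        return ctrl.Result{}, err
    }

    // Step 2: Retire the canary once the stable set is fully ready again.
    if stableSts.Status.ReadyReplicas == cr.Spec.Replicas {
        if err := r.Delete(ctx, canarySts); err != nil && !errors.IsNotFound(err) {
            return ctrl.Result{}, err
        }
        return r.updateStatus(ctx, cr, "Stable")
    }
    return r.updateStatus(ctx, cr, "Promoting")
}

// At package level: a helper (an assumption, not shown earlier) that persists
// the phase through the status subresource and polls again shortly after.
func (r *CanaryStatefulSetReconciler) updateStatus(ctx context.Context, cr *appv1alpha1.CanaryStatefulSet, phase string) (ctrl.Result, error) {
    cr.Status.Phase = phase
    if err := r.Status().Update(ctx, cr); err != nil {
        return ctrl.Result{}, err
    }
    return ctrl.Result{RequeueAfter: 5 * time.Second}, nil
}

A production version would also clear the promote annotation and record the new controller-revision-hash in status.stableRevision once the rollout completes.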
3. Advanced Considerations and Edge Case Handling
A production-ready operator must be resilient. Here are critical patterns to implement.
3.1. Finalizers for Graceful Deletion
What happens if a user runs kubectl delete canarystatefulset my-app? Without a finalizer, the custom resource is deleted immediately, and Kubernetes garbage collection will delete the owned StatefulSets. This is an abrupt, uncontrolled termination.
A finalizer allows our operator to intercept the deletion request and perform cleanup logic.
Implementation:
1. When a CanaryStatefulSet is created, add a finalizer string (e.g., app.example.com/finalizer) to its metadata.finalizers list.
2. In the Reconcile loop, check whether cr.ObjectMeta.DeletionTimestamp is non-nil. This indicates a deletion has been requested.
3. If it is, run the cleanup logic: scale the owned StatefulSets to 0, wait for them to terminate, and then delete the StatefulSet objects themselves.
4. Once cleanup is complete, remove the finalizer string from the list. This signals to Kubernetes that the operator is finished, and the CR can now be deleted.
// Inside Reconcile function
const canaryFinalizer = "app.example.com/finalizer"

// Check if the object is being deleted
isMarkedForDeletion := cr.GetDeletionTimestamp() != nil
if isMarkedForDeletion {
    if controllerutil.ContainsFinalizer(cr, canaryFinalizer) {
        // Run our finalization logic
        if err := r.finalizeCanaryStatefulSet(ctx, cr); err != nil {
            // Don't remove the finalizer if cleanup fails, so we can retry
            return ctrl.Result{}, err
        }
        // Remove finalizer. Once all finalizers are removed, the object will be deleted.
        controllerutil.RemoveFinalizer(cr, canaryFinalizer)
        if err := r.Update(ctx, cr); err != nil {
            return ctrl.Result{}, err
        }
    }
    return ctrl.Result{}, nil
}

// Add finalizer for new objects
if !controllerutil.ContainsFinalizer(cr, canaryFinalizer) {
    controllerutil.AddFinalizer(cr, canaryFinalizer)
    if err := r.Update(ctx, cr); err != nil {
        return ctrl.Result{}, err
    }
}

// The finalize function
func (r *CanaryStatefulSetReconciler) finalizeCanaryStatefulSet(ctx context.Context, cr *appv1alpha1.CanaryStatefulSet) error {
    logger := log.FromContext(ctx)
    logger.Info("Finalizing CanaryStatefulSet")

    // Your cleanup logic here: delete owned StatefulSets, etc.
    // This is a simplified example. A real implementation would need to be more robust,
    // potentially scaling to 0 before deleting.
    namespace := cr.Namespace
    stableStsName := cr.Name + "-stable"
    canaryStsName := cr.Name + "-canary"

    // Delete both StatefulSets
    stsToDelete := []string{stableStsName, canaryStsName}
    for _, name := range stsToDelete {
        sts := &appsv1.StatefulSet{
            ObjectMeta: metav1.ObjectMeta{
                Name:      name,
                Namespace: namespace,
            },
        }
        err := r.Delete(ctx, sts, client.PropagationPolicy(metav1.DeletePropagationBackground))
        if err != nil && !errors.IsNotFound(err) {
            logger.Error(err, "Failed to delete underlying StatefulSet during finalization", "StatefulSet", name)
            return err
        }
    }

    logger.Info("Successfully finalized CanaryStatefulSet")
    return nil
}
3.2. Preventing Unnecessary Reconciliations with Predicates
The controller-runtime framework will trigger a reconciliation for any change to the watched resources. This includes status updates we make ourselves, leading to potential reconciliation loops. We can use predicate.Funcs to filter events.
For example, we don't need to reconcile if only the status subresource of a StatefulSet changes, unless that change affects the replica count. A more impactful optimization is to ignore changes to our own CR's status, as we are the only writer.
// In main.go, during controller setup
import (
    "sigs.k8s.io/controller-runtime/pkg/event"
    "sigs.k8s.io/controller-runtime/pkg/predicate"
)

// ...
err = ctrl.NewControllerManagedBy(mgr).
    For(&appv1alpha1.CanaryStatefulSet{}).
    Owns(&appsv1.StatefulSet{}).
    WithEventFilter(predicate.Funcs{
        UpdateFunc: func(e event.UpdateEvent) bool {
            // Reconcile only when the spec changes; status-only updates do not
            // bump metadata.generation because status is a subresource.
            if e.ObjectOld.GetGeneration() != e.ObjectNew.GetGeneration() {
                return true
            }
            // Additionally, you could compare specific fields if generation is not enough
            return false
        },
    }).
    Complete(&r)
// ...
This simple predicate prevents reconciliation when only the status is updated by our own controller, reducing load on the API server and avoiding unnecessary work. Note that WithEventFilter applies to every watched type, including the owned StatefulSets, whose readiness changes are status-only; the RequeueAfter-based polling in the reconcile logic covers that gap.
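controller-runtime also ships a built-in predicate with the same generation-based semantics, so the hand-rolled filter above could be replaced with:

// Equivalent setup using the built-in predicate from
// sigs.k8s.io/controller-runtime/pkg/predicate.
err = ctrl.NewControllerManagedBy(mgr).
    For(&appv1alpha1.CanaryStatefulSet{}).
    Owns(&appsv1.StatefulSet{}).
    WithEventFilter(predicate.GenerationChangedPredicate{}).
    Complete(&r)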
3.3. Handling Data Schema Migrations
This is the most complex real-world problem. If the new application version requires a database schema migration, simply rolling out the pods can lead to catastrophic data corruption. The operator must provide hooks to manage this.
A Production Pattern:
1. Before starting the canary, the operator checks a paused field in the spec or looks for a specific annotation (e.g., canary.example.com/pause-reconciliation: "true").
2. The operator (or an external pipeline) creates a Job that runs the schema migration script. This Job would target the database used by the stateful application.
3. The operator watches the Job for successful completion (a sketch follows below).
4. If the Job succeeds, the operator can resume the promotion, updating the stable StatefulSet to the new version that is compatible with the migrated schema.
5. If the Job fails, the operator must initiate a rollback, deleting the canary StatefulSet and restoring the stable set to its original state.

This logic requires a more sophisticated state machine within the operator, with phases like AwaitingMigration, MigratingData, and MigrationFailed.
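A minimal sketch of steps 2 and 3, assuming a hypothetical canary.example.com/migration-image annotation supplies the migration container (a real operator would model this in the CRD spec):

// Additional imports needed alongside those shown earlier:
import (
    "fmt"

    batchv1 "k8s.io/api/batch/v1"
    corev1 "k8s.io/api/core/v1"
)

// ensureMigrationJob creates the schema-migration Job if it does not exist yet
// and reports whether it has completed successfully.
func (r *CanaryStatefulSetReconciler) ensureMigrationJob(ctx context.Context, cr *appv1alpha1.CanaryStatefulSet) (done bool, err error) {
    job := &batchv1.Job{}
    name := types.NamespacedName{Name: cr.Name + "-migrate", Namespace: cr.Namespace}

    if err := r.Get(ctx, name, job); err != nil {
        if !errors.IsNotFound(err) {
            return false, err
        }
        // Job does not exist yet: create it.
        job = &batchv1.Job{
            ObjectMeta: metav1.ObjectMeta{Name: name.Name, Namespace: name.Namespace},
            Spec: batchv1.JobSpec{
                Template: corev1.PodTemplateSpec{
                    Spec: corev1.PodSpec{
                        RestartPolicy: corev1.RestartPolicyNever,
                        Containers: []corev1.Container{{
                            Name:  "migrate",
                            Image: cr.GetAnnotations()["canary.example.com/migration-image"],
                        }},
                    },
                },
            },
        }
        _ = ctrl.SetControllerReference(cr, job, r.Scheme)
        return false, r.Create(ctx, job)
    }

    // Job exists: check its terminal conditions.
    for _, c := range job.Status.Conditions {
        if c.Type == batchv1.JobComplete && c.Status == corev1.ConditionTrue {
            return true, nil
        }
        if c.Type == batchv1.JobFailed && c.Status == corev1.ConditionTrue {
            return false, fmt.Errorf("migration job failed: %s", c.Message)
        }
    }
    return false, nil // still running
}

On success the reconciler can advance the phase toward Promoting; on failure it can set MigrationFailed and begin the rollback path described above.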
4. Full Deployment Example
Let's walk through a deployment using our CanaryStatefulSet.
1. Apply the CRD and deploy the operator. (Assume this is done)
2. Create a CanaryStatefulSet resource:
# my-redis-canary.yaml
apiVersion: app.example.com/v1alpha1
kind: CanaryStatefulSet
metadata:
  name: redis
  namespace: default
spec:
  replicas: 3
  canaryReplicas: 1
  image: redis:6.2.6 # Initial stable version
  template:
    # This is a standard StatefulSetSpec
    serviceName: "redis-headless"
    podManagementPolicy: Parallel
    updateStrategy:
      type: RollingUpdate
    template:
      spec:
        terminationGracePeriodSeconds: 10
        containers:
          - name: redis
            image: redis:6.2.6 # This will be managed by the operator
            ports:
              - containerPort: 6379
                name: redis
            volumeMounts:
              - name: data
                mountPath: /data
    volumeClaimTemplates:
      - metadata:
          name: data
        spec:
          accessModes: [ "ReadWriteOnce" ]
          resources:
            requests:
              storage: 1Gi
Apply it: kubectl apply -f my-redis-canary.yaml
Observation:
The operator creates redis-stable.
kubectl get statefulset
NAME           READY   AGE
redis-stable   3/3     1m

kubectl get pods
NAME             READY   STATUS    RESTARTS   AGE
redis-stable-0   1/1     Running   0          50s
redis-stable-1   1/1     Running   0          45s
redis-stable-2   1/1     Running   0          40s
3. Initiate a Canary Rollout:
Update my-redis-canary.yaml to use a new image:
spec:
  # ... other fields
  image: redis:7.0.5 # New canary version
Apply the change: kubectl apply -f my-redis-canary.yaml
Observation:
* The operator detects the image change.
* It scales redis-stable down to 2 replicas. Pod redis-stable-2 will be terminated.
* Once it has terminated, the operator creates redis-canary and scales it to 1 replica, producing a new pod, redis-canary-0. Note that PVC names are derived from the StatefulSet name, so redis-canary-0 binds a fresh PVC (data-redis-canary-0) rather than the one left behind by redis-stable-2. This is a simplification; a more robust operator would manage PVC reuse or data replication explicitly if the canary must inherit that state.
kubectl get pods
NAME             READY   STATUS    RESTARTS   AGE
redis-stable-0   1/1     Running   0          5m
redis-stable-1   1/1     Running   0          4m
redis-canary-0   1/1     Running   0          30s
Now you have a mixed-version cluster. redis-stable-0 and redis-stable-1 run Redis 6.2.6, while redis-canary-0 runs Redis 7.0.5.
4. Promote the Canary:
To promote, you would typically add an annotation if using a manual promotion strategy:
kubectl annotate canarystatefulset redis canary.example.com/promote=true
Observation:
* The operator updates the image on redis-stable to redis:7.0.5.
* It scales redis-stable back up to 3 replicas. The existing pods (-0, -1) will be updated via the RollingUpdate strategy.
* It scales redis-canary down to 0 and deletes it.
* The system returns to a stable state, now running the new version.
Conclusion
Automating canary deployments for StatefulSets requires moving beyond standard Kubernetes primitives and building a custom, application-aware controller. By managing a pair of StatefulSet resources, a custom operator can provide precise control over the rollout process, ensuring pod identity and persistent state are handled correctly. Implementing robust logic for promotion, rollbacks, finalizers, and data migration is non-trivial but essential for building a production-grade, resilient platform for stateful services. This pattern provides a powerful foundation for managing the lifecycle of complex distributed systems on Kubernetes with confidence.