StatefulSet Canary Deployments with a Custom Kubernetes Operator in Go

Goh Ling Yong

The Inherent Challenge: Why StatefulSet Canaries Defy Standard Tooling

Stateless Deployments in Kubernetes benefit from a rich ecosystem of canary deployment strategies facilitated by service meshes and ingress controllers. These tools manipulate traffic routing to direct a percentage of users to a new version. This model breaks down for StatefulSets. The core value of a StatefulSet—stable network identifiers (e.g., my-app-0, my-app-1) and persistent storage (PersistentVolumeClaim per Pod)—is also its greatest challenge for canary rollouts. You cannot simply route traffic away from my-app-0 to a canary version without breaking intra-cluster communication and risking state corruption.

The default OnDelete and RollingUpdate update strategies for StatefulSets are insufficient on their own. RollingUpdate replaces pods sequentially, and while its partition field can stage an update so that only pods at or above a chosen ordinal are replaced, it offers no independent scaling, health evaluation, promotion, or rollback workflow for a canary subset. An operator-driven approach is required to orchestrate the lifecycle of stateful pods with surgical precision.

This article details the implementation of a custom Go-based Kubernetes Operator that manages a CanaryStatefulSet custom resource. The core pattern involves orchestrating two underlying StatefulSet resources: one for the stable version and one for the canary. This allows us to introduce a new version to a subset of pods while maintaining the integrity of the stable cluster.


1. Designing the `CanaryStatefulSet` Custom Resource Definition (CRD)

The foundation of any operator is its API. Our CanaryStatefulSet CRD must capture the desired state for a complete canary lifecycle. It will abstract away the complexity of managing two separate StatefulSets.

Key fields in the spec:

* template: A StatefulSetSpec that defines the base configuration for our application. This is the source of truth for pod templates, volume claims, etc.

* replicas: The total number of desired replicas for the application when fully scaled.

* canaryReplicas: The number of replicas to be deployed as part of the canary version.

* image: The container image for the new (canary) version. The stable version's image is read from the currently running stable StatefulSet rather than from the CR.

* promotion: A strategy block, e.g., { type: "auto" } or { type: "manual", annotation: "canary.example.com/promote" }.

Key fields in the status:

* stableRevision: The controller-revision-hash of the stable StatefulSet.

* canaryRevision: The controller-revision-hash of the canary StatefulSet.

* stableReplicas: Current replica count of the stable set.

* canaryReplicas: Current replica count of the canary set.

* phase: The current state of the rollout (e.g., Initializing, ReconcilingStable, DeployingCanary, CanaryActive, Promoting, Stable).

Here is the CRD definition (the promotion strategy block is omitted from the schema for brevity):

yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: canarystatefulsets.app.example.com
spec:
  group: app.example.com
  names:
    kind: CanaryStatefulSet
    listKind: CanaryStatefulSetList
    plural: canarystatefulsets
    singular: canarystatefulset
  scope: Namespaced
  versions:
    - name: v1alpha1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                replicas:
                  type: integer
                  minimum: 0
                canaryReplicas:
                  type: integer
                  minimum: 0
                image:
                  type: string
                template:
                  type: object
                  # x-kubernetes-preserve-unknown-fields allows any valid StatefulSetSpec here.
                  # In a real implementation, you'd generate a more specific schema.
                  x-kubernetes-preserve-unknown-fields: true
              required:
                - replicas
                - canaryReplicas
                - image
                - template
            status:
              type: object
              properties:
                phase:
                  type: string
                stableRevision:
                  type: string
                canaryRevision:
                  type: string
                stableReplicas:
                  type: integer
                canaryReplicas:
                  type: integer
                readyReplicas:
                  type: integer
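
For reference, the schema above maps to Go types in the operator's api/v1alpha1 package. A minimal sketch is shown below; Kubebuilder generates the deepcopy code and the List type, and the field names here simply mirror the controller snippets that follow:

go
// api/v1alpha1/canarystatefulset_types.go (sketch)
package v1alpha1

import (
	appsv1 "k8s.io/api/apps/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// CanaryStatefulSetSpec mirrors the openAPIV3Schema above.
type CanaryStatefulSetSpec struct {
	// Total desired replicas across the stable and canary sets.
	Replicas int32 `json:"replicas"`
	// Replicas to run with the canary image during a rollout.
	CanaryReplicas int32 `json:"canaryReplicas"`
	// Target (canary) container image.
	Image string `json:"image"`
	// Base StatefulSetSpec used for both underlying sets.
	Template appsv1.StatefulSetSpec `json:"template"`
}

// CanaryStatefulSetStatus reports the observed rollout state.
type CanaryStatefulSetStatus struct {
	Phase          string `json:"phase,omitempty"`
	StableRevision string `json:"stableRevision,omitempty"`
	CanaryRevision string `json:"canaryRevision,omitempty"`
	StableReplicas int32  `json:"stableReplicas,omitempty"`
	CanaryReplicas int32  `json:"canaryReplicas,omitempty"`
	ReadyReplicas  int32  `json:"readyReplicas,omitempty"`
}

// +kubebuilder:object:root=true
// +kubebuilder:subresource:status

// CanaryStatefulSet is the Schema for the canarystatefulsets API.
type CanaryStatefulSet struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   CanaryStatefulSetSpec   `json:"spec,omitempty"`
	Status CanaryStatefulSetStatus `json:"status,omitempty"`
}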

2. The Reconciliation Loop: Core Implementation in Go

We'll use the Kubebuilder framework to scaffold our operator. The heart of the operator is the Reconcile function in the controller. It's invoked whenever the CanaryStatefulSet resource or any of its owned resources change.

Our reconciliation logic follows a state machine based on the status.phase.
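
The snippets below write these phases as string literals; defining them as constants keeps the state machine readable. The naming convention here is an assumption, and the values are the ones used in the status section and code below:

go
// Values written to status.phase by the reconciler.
const (
	phaseInitializing      = "Initializing"
	phaseReconcilingStable = "ReconcilingStable"
	phaseScalingDownStable = "ScalingDownStable" // used while draining the stable set
	phaseDeployingCanary   = "DeployingCanary"
	phaseCanaryActive      = "CanaryActive"
	phasePromoting         = "Promoting"
	phaseStable            = "Stable"
)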

2.1. Initial Setup and Fetching Resources

The Reconcile function starts by fetching the CanaryStatefulSet instance. We also need to fetch the two underlying StatefulSets, which we name <cr-name>-stable and <cr-name>-canary.

go
package controllers

import (
	"context"
	"fmt"
	"time"
	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/apimachinery/pkg/types"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
	"sigs.k8s.io/controller-runtime/pkg/log"
	appv1alpha1 "github.com/your-repo/canary-operator/api/v1alpha1"
)

// CanaryStatefulSetReconciler reconciles a CanaryStatefulSet object
type CanaryStatefulSetReconciler struct {
	client.Client
	Scheme *runtime.Scheme
}

func (r *CanaryStatefulSetReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	logger := log.FromContext(ctx)

	// 1. Fetch the CanaryStatefulSet instance
	cr := &appv1alpha1.CanaryStatefulSet{}
	if err := r.Get(ctx, req.NamespacedName, cr); err != nil {
		if errors.IsNotFound(err) {
			logger.Info("CanaryStatefulSet resource not found. Ignoring since object must be deleted")
			return ctrl.Result{}, nil
		}
		logger.Error(err, "Failed to get CanaryStatefulSet")
		return ctrl.Result{}, err
	}

	// 2. Fetch the stable and canary StatefulSets
	stableSts := &appsv1.StatefulSet{}
	stableStsName := cr.Name + "-stable"
	stableExists := true
	if err := r.Get(ctx, types.NamespacedName{Name: stableStsName, Namespace: cr.Namespace}, stableSts); err != nil {
		if !errors.IsNotFound(err) {
			logger.Error(err, "Failed to get stable StatefulSet")
			return ctrl.Result{}, err
		}
		stableExists = false
	}

	canarySts := &appsv1.StatefulSet{}
	canaryStsName := cr.Name + "-canary"
	canaryExists := true
	if err := r.Get(ctx, types.NamespacedName{Name: canaryStsName, Namespace: cr.Namespace}, canarySts); err != nil {
		if !errors.IsNotFound(err) {
			logger.Error(err, "Failed to get canary StatefulSet")
			return ctrl.Result{}, err
		}
		canaryExists = false
	}

	// ... Reconciliation logic continues here ...
	return ctrl.Result{}, nil
}

// Helper to construct a new StatefulSet
func (r *CanaryStatefulSetReconciler) newStsForCR(cr *appv1alpha1.CanaryStatefulSet, name, image string, replicas int32) *appsv1.StatefulSet {
	// Deep copy the template to avoid modifying the CR's spec
	spec := cr.Spec.Template.DeepCopy()
	spec.Replicas = &replicas
	
	// IMPORTANT: both StatefulSets get the same selector and labels, so their
	// pods sit behind the same headless Service throughout the rollout.
	labels := map[string]string{
		"app": cr.Name,
		"controller": cr.Name,
	}
	spec.Selector = &metav1.LabelSelector{MatchLabels: labels}
	spec.Template.ObjectMeta.Labels = labels

	// Override the image (assumes the application container is at index 0)
	spec.Template.Spec.Containers[0].Image = image

	sts := &appsv1.StatefulSet{
		ObjectMeta: metav1.ObjectMeta{
			Name:      name,
			Namespace: cr.Namespace,
		},
		Spec: *spec,
	}

	// Set the owner reference so that the Kubernetes garbage collector cleans up the StatefulSets
	// when the CanaryStatefulSet is deleted.
	ctrl.SetControllerReference(cr, sts, r.Scheme)
	return sts
}

2.2. The Core Logic: Managing Two `StatefulSets`

The central idea is that the stable StatefulSet runs the old version, and the canary StatefulSet runs the new one. The total number of replicas is split between them.

  • Initial Deployment: If neither StatefulSet exists, we create the stable one from spec.template, using spec.image for its containers, and scale it to spec.replicas.

  • Detecting a New Version: A rollout is triggered when spec.image in the CR differs from the image running in the stable StatefulSet. This is our signal to start the canary process.

  • Canary Rollout:
    * Create the canary StatefulSet using the new spec.image, initially with 0 replicas.
    * Scale the stable StatefulSet down from spec.replicas to spec.replicas - spec.canaryReplicas.
    * Once the stable set has scaled down, scale the canary StatefulSet up to spec.canaryReplicas.
    * This two-step scaling is critical: scaling down first keeps the total replica count at spec.replicas and frees cluster capacity for the canary pods. Note that the canary StatefulSet provisions its own PersistentVolumeClaims (e.g., data-my-app-canary-0); if the canary must take over the data of a retired stable pod, the operator has to manage PVC naming or data seeding explicitly.

  • Promotion (a sketch of this branch appears after the snippet below):
    * Update the stable StatefulSet's pod template to use the new image.
    * Scale the stable StatefulSet back up to spec.replicas.
    * Scale the canary StatefulSet down to 0 and eventually delete it.

Here's a snippet illustrating the version detection and initial canary creation:

    go
    // Inside Reconcile function
    
    // Determine current and target images
    stableImage := ""
    if stableExists {
        stableImage = stableSts.Spec.Template.Spec.Containers[0].Image
    }
    targetImage := cr.Spec.Image
    
    // --- Reconciliation Logic --- //
    
    // Case 1: Initial creation
    if !stableExists {
        logger.Info("Creating a new stable StatefulSet", "name", stableStsName)
        // Use the targetImage for the very first deployment
        newStableSts := r.newStsForCR(cr, stableStsName, targetImage, cr.Spec.Replicas)
        if err := r.Create(ctx, newStableSts); err != nil {
            logger.Error(err, "Failed to create new stable StatefulSet")
            return ctrl.Result{}, err
        }
        // Update status and requeue
        cr.Status.Phase = "Initializing"
        //... update status logic ...
        return ctrl.Result{Requeue: true}, nil
    }
    
    // Case 2: No change in version, ensure stability
    if stableImage == targetImage {
        // Logic to ensure stable replicas match spec.replicas
        // And ensure canary replicas are 0
        if *stableSts.Spec.Replicas != cr.Spec.Replicas {
            // Scale stable to correct size
        }
        if canaryExists && *canarySts.Spec.Replicas != 0 {
            // Scale down and delete canary
        }
        cr.Status.Phase = "Stable"
        //... update status logic ...
        return ctrl.Result{}, nil
    }
    
    // Case 3: New version detected - start canary process
    if stableImage != targetImage && !canaryExists {
        logger.Info("New image detected. Starting canary deployment.", "current", stableImage, "target", targetImage)
    
        // Step 1: Scale down stable StatefulSet
        desiredStableReplicas := cr.Spec.Replicas - cr.Spec.CanaryReplicas
        if *stableSts.Spec.Replicas != desiredStableReplicas {
            logger.Info("Scaling down stable set", "replicas", desiredStableReplicas)
            stableSts.Spec.Replicas = &desiredStableReplicas
            if err := r.Update(ctx, stableSts); err != nil {
                // handle error
                return ctrl.Result{}, err
            }
            cr.Status.Phase = "ScalingDownStable"
            //... update status logic ...
            return ctrl.Result{Requeue: true}, nil
        }
    
        // Wait for stable set to be ready at the new replica count
        if stableSts.Status.ReadyReplicas != desiredStableReplicas {
            logger.Info("Waiting for stable set to scale down", "current", stableSts.Status.ReadyReplicas, "desired", desiredStableReplicas)
            return ctrl.Result{RequeueAfter: 5 * time.Second}, nil
        }
    
        // Step 2: Create and scale up canary StatefulSet
        logger.Info("Creating canary StatefulSet", "name", canaryStsName)
        newCanarySts := r.newStsForCR(cr, canaryStsName, targetImage, cr.Spec.CanaryReplicas)
        if err := r.Create(ctx, newCanarySts); err != nil {
            // handle error
            return ctrl.Result{}, err
        }
        cr.Status.Phase = "DeployingCanary"
        //... update status logic ...
        return ctrl.Result{Requeue: true}, nil
    }
    
    // ... logic for monitoring canary, promoting, and rolling back ...
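
    The branch elided above handles promotion. Below is a hedged sketch of the manual path, keyed on the annotation from section 1 (an "auto" strategy would gate this on canary health instead); error handling and status updates are trimmed:

    go
    // Case 4: Canary is running and the user has requested promotion.
    if canaryExists && cr.Annotations["canary.example.com/promote"] == "true" {
        logger.Info("Promoting canary", "image", targetImage)

        // Move the stable set to the new image and restore the full replica
        // count in one update; its RollingUpdate strategy then replaces the
        // remaining old-version pods in ordinal order.
        replicas := cr.Spec.Replicas
        stableSts.Spec.Template.Spec.Containers[0].Image = targetImage
        stableSts.Spec.Replicas = &replicas
        if err := r.Update(ctx, stableSts); err != nil {
            return ctrl.Result{}, err
        }

        cr.Status.Phase = "Promoting"
        //... update status logic ...

        // Once the stable image matches spec.image, the "no change in version"
        // branch above takes over: it scales the canary down and deletes it.
        return ctrl.Result{Requeue: true}, nil
    }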

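    The //... update status logic ... placeholders can be factored into a small helper. A minimal sketch, assuming the CRD's status subresource is enabled (the helper name is an assumption):

    go
    // updateStatus records the phase and observed replica counts on the CR's
    // status subresource. Using Status().Update avoids bumping
    // metadata.generation, which matters for the predicate in section 3.2.
    func (r *CanaryStatefulSetReconciler) updateStatus(ctx context.Context, cr *appv1alpha1.CanaryStatefulSet, phase string, stable, canary *appsv1.StatefulSet) error {
        cr.Status.Phase = phase
        if stable != nil {
            cr.Status.StableReplicas = stable.Status.ReadyReplicas
            cr.Status.StableRevision = stable.Status.UpdateRevision
        }
        if canary != nil {
            cr.Status.CanaryReplicas = canary.Status.ReadyReplicas
            cr.Status.CanaryRevision = canary.Status.UpdateRevision
        }
        return r.Status().Update(ctx, cr)
    }
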
    3. Advanced Considerations and Edge Case Handling

    A production-ready operator must be resilient. Here are critical patterns to implement.

    3.1. Finalizers for Graceful Deletion

    What happens if a user runs kubectl delete canarystatefulset my-app? Without a finalizer, the custom resource is deleted immediately, and Kubernetes garbage collection will delete the owned StatefulSets. This is an abrupt, uncontrolled termination.

    A finalizer allows our operator to intercept the deletion request and perform cleanup logic.

    Implementation:

  • When a CanaryStatefulSet is created, add a finalizer string (e.g., app.example.com/finalizer) to its metadata.finalizers list.
  • In the Reconcile loop, check whether cr.ObjectMeta.DeletionTimestamp is non-nil. This indicates a deletion has been requested.
  • Perform cleanup: scale both StatefulSets to 0, wait for them to terminate, and then delete the StatefulSet objects themselves.
  • Once cleanup is complete, remove the finalizer string from the list. This signals to Kubernetes that the operator is finished, and the CR can now be deleted.

    go
    // Inside Reconcile function
    
    const canaryFinalizer = "app.example.com/finalizer"
    
    // Check if the object is being deleted
    isMarkedForDeletion := cr.GetDeletionTimestamp() != nil
    if isMarkedForDeletion {
        if controllerutil.ContainsFinalizer(cr, canaryFinalizer) {
            // Run our finalization logic
            if err := r.finalizeCanaryStatefulSet(ctx, cr); err != nil {
                // Don't remove the finalizer if cleanup fails, so we can retry
                return ctrl.Result{}, err
            }
    
            // Remove finalizer. Once all finalizers are removed, the object will be deleted.
            controllerutil.RemoveFinalizer(cr, canaryFinalizer)
            err := r.Update(ctx, cr)
            if err != nil {
                return ctrl.Result{}, err
            }
        }
        return ctrl.Result{}, nil
    }
    
    // Add finalizer for new objects
    if !controllerutil.ContainsFinalizer(cr, canaryFinalizer) {
        controllerutil.AddFinalizer(cr, canaryFinalizer)
        err := r.Update(ctx, cr)
        if err != nil {
            return ctrl.Result{}, err
        }
    }
    
    // The finalize function (defined at package level, outside Reconcile)
    func (r *CanaryStatefulSetReconciler) finalizeCanaryStatefulSet(ctx context.Context, cr *appv1alpha1.CanaryStatefulSet) error {
        logger := log.FromContext(ctx)
        logger.Info("Finalizing CanaryStatefulSet")
    
        // Your cleanup logic here: delete owned StatefulSets, etc.
        // This is a simplified example. A real implementation would need to be more robust,
        // potentially scaling to 0 before deleting.
        namespace := cr.Namespace
        stableStsName := cr.Name + "-stable"
        canaryStsName := cr.Name + "-canary"
    
        // Delete both statefulsets
        stsToDelete := []string{stableStsName, canaryStsName}
        for _, name := range stsToDelete {
            sts := &appsv1.StatefulSet{
                ObjectMeta: metav1.ObjectMeta{
                    Name:      name,
                    Namespace: namespace,
                },
            }
            err := r.Delete(ctx, sts, client.PropagationPolicy(metav1.DeletePropagationBackground))
            if err != nil && !errors.IsNotFound(err) {
                logger.Error(err, "Failed to delete underlying StatefulSet during finalization", "StatefulSet", name)
                return err
            }
        }
    
        logger.Info("Successfully finalized CanaryStatefulSet")
        return nil
    }
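
    The comment above notes that a more robust implementation would scale to zero before deleting. One hedged sketch of such a helper (the function name is an assumption), which finalizeCanaryStatefulSet could call for each StatefulSet instead of deleting it outright:

    go
    // scaleDownAndWait scales a StatefulSet to zero and only deletes it once all
    // of its pods have terminated, giving the application a clean shutdown.
    func (r *CanaryStatefulSetReconciler) scaleDownAndWait(ctx context.Context, namespace, name string) error {
        sts := &appsv1.StatefulSet{}
        if err := r.Get(ctx, types.NamespacedName{Name: name, Namespace: namespace}, sts); err != nil {
            if errors.IsNotFound(err) {
                return nil // already gone
            }
            return err
        }
        if sts.Spec.Replicas == nil || *sts.Spec.Replicas != 0 {
            zero := int32(0)
            sts.Spec.Replicas = &zero
            if err := r.Update(ctx, sts); err != nil {
                return err
            }
        }
        if sts.Status.Replicas != 0 {
            // Returning an error keeps the finalizer in place, so Reconcile
            // retries until every pod has terminated.
            return fmt.Errorf("waiting for %s to scale to zero (%d pods remaining)", name, sts.Status.Replicas)
        }
        return r.Delete(ctx, sts)
    }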

    3.2. Preventing Unnecessary Reconciliations with Predicates

    The controller-runtime framework will trigger a reconciliation for any change to the watched resources. This includes status updates we make ourselves, leading to potential reconciliation loops. We can use predicate.Funcs to filter events.

    For example, a status-only update to one of the owned StatefulSets rarely needs a reconcile unless it changes the ready-replica counts we wait on. A more impactful optimization is to ignore changes to our own CR's status, as we are the only writer; status updates do not bump metadata.generation, which the predicate below exploits.

    go
    // In main.go, during controller setup
    
    import "sigs.k8s.io/controller-runtime/pkg/predicate"
    
    // ...
    err = ctrl.NewControllerManagedBy(mgr).
        For(&appv1alpha1.CanaryStatefulSet{}).
        Owns(&appsv1.StatefulSet{}).
        WithEventFilter(predicate.Funcs{
            UpdateFunc: func(e event.UpdateEvent) bool {
                // Reconcile only on spec changes: status-only updates do not bump metadata.Generation
                if e.ObjectOld.GetGeneration() != e.ObjectNew.GetGeneration() {
                    return true
                }
                // Additionally, you could compare specific fields if generation is not enough
                return false
            },
        }).
        Complete(&r)
    // ...

    This simple predicate skips reconciliation when only the status is updated by our own controller, reducing load on the API server and avoiding unnecessary work. controller-runtime also ships an equivalent built-in, predicate.GenerationChangedPredicate{}. Note that the filter applies to the owned StatefulSets as well, which is why the reconciler above polls with RequeueAfter rather than relying on status-change events to observe scaling progress.

    3.3. Handling Data Schema Migrations

    This is the most complex real-world problem. If the new application version requires a database schema migration, simply rolling out the pods can lead to catastrophic data corruption. The operator must provide hooks to manage this.

    A Production Pattern:

  • Pause Reconciliation: The operator can introduce a paused field in the spec or look for a specific annotation (e.g., canary.example.com/pause-reconciliation: "true").
  • Orchestrate a Migration Job: When the canary is deployed but before it is promoted, the operator can create a Kubernetes Job that runs the schema migration script against the database used by the stateful application (a sketch of this Job creation follows the list).
  • Wait for Completion: The operator pauses the promotion process, monitoring the Job for successful completion.
  • Resume Promotion: Once the Job succeeds, the operator resumes the promotion, updating the stable StatefulSet to the new version that is compatible with the migrated schema.
  • Rollback: If the migration Job fails, the operator must initiate a rollback, deleting the canary StatefulSet and restoring the stable set to its original state.

    This logic requires a more sophisticated state machine within the operator, with phases like AwaitingMigration, MigratingData, and MigrationFailed.
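
    As a concrete illustration of the second step, here is a hedged sketch of how the operator might build the migration Job. The image, command, and label values are assumptions for illustration; batchv1 is k8s.io/api/batch/v1 and would need to be added to the controller's imports:

    go
    // newMigrationJobForCR builds a one-off Job that runs the schema migration
    // before the canary is promoted. Command and labels are placeholders.
    func (r *CanaryStatefulSetReconciler) newMigrationJobForCR(cr *appv1alpha1.CanaryStatefulSet) *batchv1.Job {
        backoffLimit := int32(3)
        job := &batchv1.Job{
            ObjectMeta: metav1.ObjectMeta{
                Name:      cr.Name + "-migration",
                Namespace: cr.Namespace,
                Labels:    map[string]string{"app": cr.Name, "role": "schema-migration"},
            },
            Spec: batchv1.JobSpec{
                BackoffLimit: &backoffLimit,
                Template: corev1.PodTemplateSpec{
                    Spec: corev1.PodSpec{
                        RestartPolicy: corev1.RestartPolicyNever,
                        Containers: []corev1.Container{{
                            Name:    "migrate",
                            Image:   cr.Spec.Image, // assume the canary image ships the migration tool
                            Command: []string{"/bin/sh", "-c", "run-migrations"}, // placeholder command
                        }},
                    },
                },
            },
        }
        // Owned by the CR so the Job is garbage-collected with it.
        ctrl.SetControllerReference(cr, job, r.Scheme)
        return job
    }

    The reconciler would create this Job while in an AwaitingMigration phase and watch job.Status.Succeeded (or Failed) before promoting or rolling back.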


    4. Full Deployment Example

    Let's walk through a deployment using our CanaryStatefulSet.

    1. Apply the CRD and deploy the operator. (Assume this is done)

    2. Create a CanaryStatefulSet resource:

    yaml
    # my-redis-canary.yaml
    apiVersion: app.example.com/v1alpha1
    kind: CanaryStatefulSet
    metadata:
      name: redis
      namespace: default
    spec:
      replicas: 3
      canaryReplicas: 1
      image: redis:6.2.6 # Initial stable version
      template:
        # This is a standard StatefulSetSpec
        serviceName: "redis-headless"
        podManagementPolicy: Parallel
        updateStrategy:
          type: RollingUpdate
        template:
          spec:
            terminationGracePeriodSeconds: 10
            containers:
            - name: redis
              image: redis:6.2.6 # This will be managed by the operator
              ports:
              - containerPort: 6379
                name: redis
              volumeMounts:
              - name: data
                mountPath: /data
        volumeClaimTemplates:
        - metadata:
            name: data
          spec:
            accessModes: [ "ReadWriteOnce" ]
            resources:
              requests:
                storage: 1Gi

    Apply it: kubectl apply -f my-redis-canary.yaml

    Observation:

    The operator creates redis-stable.

    kubectl get statefulset

    text
    NAME           READY   AGE
    redis-stable   3/3     1m

    kubectl get pods

    text
    NAME             READY   STATUS    RESTARTS   AGE
    redis-stable-0   1/1     Running   0          50s
    redis-stable-1   1/1     Running   0          45s
    redis-stable-2   1/1     Running   0          40s

    3. Initiate a Canary Rollout:

    Update my-redis-canary.yaml to use a new image:

    yaml
    spec:
      # ... other fields
      image: redis:7.0.5 # New canary version

    Apply the change: kubectl apply -f my-redis-canary.yaml

    Observation:

    * The operator detects the image change.

    * It scales redis-stable down to 2 replicas. Pod redis-stable-2 will be terminated.

    * Once redis-stable-2 has terminated, the operator creates redis-canary and scales it to 1 replica, producing a new pod, redis-canary-0. Note that redis-canary-0 provisions its own PersistentVolumeClaim (data-redis-canary-0) from the volumeClaimTemplates; it does not automatically inherit data-redis-stable-2. An operator that needs the canary to take over the retired pod's data must manage PVC naming or data seeding explicitly.

    kubectl get pods

    text
    NAME             READY   STATUS    RESTARTS   AGE
    redis-stable-0   1/1     Running   0          5m
    redis-stable-1   1/1     Running   0          4m
    redis-canary-0   1/1     Running   0          30s

    Now you have a mixed-version cluster. redis-stable-0 and redis-stable-1 run Redis 6.2.6, while redis-canary-0 runs Redis 7.0.5.

    4. Promote the Canary:

    To promote, you would typically add an annotation if using a manual promotion strategy:

    kubectl annotate canarystatefulset redis canary.example.com/promote=true

    Observation:

    * The operator updates the image on redis-stable to redis:7.0.5.

    * It scales redis-stable back up to 3 replicas. The existing pods (-0, -1) will be updated via the RollingUpdate strategy.

    * It scales redis-canary down to 0 and deletes it.

    * The system returns to a stable state, now running the new version.

    Conclusion

    Automating canary deployments for StatefulSets requires moving beyond standard Kubernetes primitives and building a custom, application-aware controller. By managing a pair of StatefulSet resources, a custom operator can provide precise control over the rollout process, ensuring pod identity and persistent state are handled correctly. Implementing robust logic for promotion, rollbacks, finalizers, and data migration is non-trivial but essential for building a production-grade, resilient platform for stateful services. This pattern provides a powerful foundation for managing the lifecycle of complex distributed systems on Kubernetes with confidence.
