Crafting a Go Operator for Stateful Canary Deployments in K8s

Goh Ling Yong

The Impasse of Stateful Canaries

Stateless canary deployments are a solved problem. Tools like Flagger, Argo Rollouts, and native service mesh integrations provide sophisticated traffic shifting and analysis out of the box. But the moment persistent storage enters the picture, these elegant solutions often fall short. A StatefulSet is not a Deployment. The guarantees it provides—stable network identifiers, ordered scaling, and unique persistent volumes—are the very things that make canary deployments a complex, high-stakes endeavor.

The core challenges are not trivial:

  • Persistent State Divergence: How do you manage the data? If the canary instance writes to the same volume as the stable version, you risk data corruption. If it uses a separate volume, how do you handle state promotion upon a successful canary analysis? Direct PVC cloning via volume snapshots is an option, but it's storage-provider specific and can be slow.
  • Schema Migrations: A new version of a stateful application often implies a new database schema. The canary pod must be able to handle the existing schema, potentially performing a non-destructive, online migration. A rollback requires the old version to still understand any changes made by the canary.
  • Client Affinity: In many stateful systems, clients must be 'sticky' to a specific pod to maintain session state. A simple round-robin traffic split between stable and canary pods can break application logic.

Standard tooling avoids this complexity. Our goal is to embrace it by building a specialized Kubernetes Operator that provides a declarative API to manage this process. We will focus on creating an orchestration engine that manages the StatefulSet lifecycle and automates analysis, while acknowledging that data migration itself remains an application-level concern that the operator facilitates but does not perform.

    Our operator will manage a new Custom Resource (CR), StatefulCanary, which will orchestrate the controlled rollout of changes to a StatefulSet.


    Architecting the `StatefulCanary` CRD

    A well-designed CRD is the foundation of any operator. It defines the declarative API our users will interact with. We need to capture the desired state for both the stable and canary versions, along with the rules for promotion.

    Let's define the Go types for our API. We'll use Kubebuilder markers to generate the CRD manifest.

    File: api/v1alpha1/statefulcanary_types.go

    go
    package v1alpha1
    
    import (
    	appsv1 "k8s.io/api/apps/v1"
    	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    )
    
    // StatefulCanarySpec defines the desired state of StatefulCanary
    type StatefulCanarySpec struct {
    	// TotalReplicas is the total number of replicas for the application.
    	// +kubebuilder:validation:Minimum=0
    	TotalReplicas int32 `json:"totalReplicas"`
    
    	// CanaryReplicas is the number of replicas to allocate to the canary version.
    	// The rest will be allocated to the stable version.
    	// +kubebuilder:validation:Minimum=0
    	CanaryReplicas int32 `json:"canaryReplicas"`
    
    	// Template for the StatefulSet. The operator will manage two StatefulSets, 
    	// a 'stable' and a 'canary', both derived from this template.
    	Template appsv1.StatefulSetSpec `json:"template"`
    
    	// CanaryImage is the container image to use for the canary version.
    	// This will override the image in the first container of the template.
    	CanaryImage string `json:"canaryImage"`
    
    	// Analysis defines the automated analysis to perform before promotion.
    	// +optional
    	Analysis *CanaryAnalysis `json:"analysis,omitempty"`
    }
    
    // CanaryAnalysis defines the parameters for prometheus-based analysis
    type CanaryAnalysis struct {
    	// Interval is the time to wait between Prometheus queries.
    	Interval metav1.Duration `json:"interval"`
    
    	// Threshold is the maximum acceptable value for the query result.
    	Threshold float64 `json:"threshold"`
    
    	// MaxRetries is the number of times the query can fail the threshold before the canary is considered failed.
    	MaxRetries int `json:"maxRetries"`
    
    	// PrometheusQuery is the PromQL query to execute.
    	PrometheusQuery string `json:"prometheusQuery"`
    
    	// PrometheusAddress is the address of the Prometheus server.
    	PrometheusAddress string `json:"prometheusAddress"`
    }
    
    // CanaryPhase defines the current state of the canary process
    type CanaryPhase string
    
    const (
    	PhaseInitializing CanaryPhase = "Initializing"
    	PhaseProgressing  CanaryPhase = "Progressing"
    	PhaseAnalyzing    CanaryPhase = "Analyzing"
    	PhasePromoting    CanaryPhase = "Promoting"
    	PhaseSucceeded    CanaryPhase = "Succeeded"
    	PhaseFailed       CanaryPhase = "Failed"
    )
    
    // StatefulCanaryStatus defines the observed state of StatefulCanary
    type StatefulCanaryStatus struct {
    	// CurrentPhase indicates the current stage of the canary process.
    	CurrentPhase CanaryPhase `json:"currentPhase,omitempty"`
    
    	// StableReplicas is the current number of replicas for the stable set.
    	StableReplicas int32 `json:"stableReplicas"`
    
    	// CanaryReplicas is the current number of replicas for the canary set.
    	CanaryReplicas int32 `json:"canaryReplicas"`
    
    	// AnalysisRetries is the count of failed analysis checks.
    	AnalysisRetries int `json:"analysisRetries,omitempty"`
    
    	// LastTransitionTime is the last time the phase was updated.
    	LastTransitionTime metav1.Time `json:"lastTransitionTime,omitempty"`
    
    	// Message provides more details about the current status.
    	Message string `json:"message,omitempty"`
    }
    
    //+kubebuilder:object:root=true
    //+kubebuilder:subresource:status
    
    // StatefulCanary is the Schema for the statefulcanaries API
    type StatefulCanary struct {
    	metav1.TypeMeta   `json:",inline"`
    	metav1.ObjectMeta `json:"metadata,omitempty"`
    
    	Spec   StatefulCanarySpec   `json:"spec,omitempty"`
    	Status StatefulCanaryStatus `json:"status,omitempty"`
    }
    
    //+kubebuilder:object:root=true
    
    // StatefulCanaryList contains a list of StatefulCanary
    type StatefulCanaryList struct {
    	metav1.TypeMeta `json:",inline"`
    	metav1.ListMeta `json:"metadata,omitempty"`
    	Items           []StatefulCanary `json:"items"`
    }
    
    func init() {
    	SchemeBuilder.Register(&StatefulCanary{}, &StatefulCanaryList{})
    }

    This CRD design makes a key decision: instead of two full StatefulSet templates, we use one base template and a canaryImage override. This simplifies the user experience and focuses the CR on its core purpose: managing the version rollout. The operator will create two StatefulSet resources: my-app-stable and my-app-canary.
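
    Expressed with the Go types above, a user-facing StatefulCanary might look like the following sketch. The names, replica counts, and image tag are purely illustrative:

    go
    // Illustrative only: a StatefulCanary named "my-app" that shifts one of
    // five replicas to a new image. The operator derives the my-app-stable and
    // my-app-canary StatefulSets from this single object.
    sc := v1alpha1.StatefulCanary{
    	ObjectMeta: metav1.ObjectMeta{Name: "my-app", Namespace: "default"},
    	Spec: v1alpha1.StatefulCanarySpec{
    		TotalReplicas:  5,
    		CanaryReplicas: 1,
    		CanaryImage:    "registry.example.com/my-app:v2.0.0",
    		// Template is a plain appsv1.StatefulSetSpec (serviceName, pod
    		// template, volumeClaimTemplates, ...) shared by both versions.
    		Template: appsv1.StatefulSetSpec{ /* elided for brevity */ },
    		// Analysis is optional; see the Prometheus analysis section below.
    	},
    }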


    The Core Reconciliation Loop

    The Reconcile function is the operator's brain. It's a state machine driven by the StatefulCanary resource's spec and status.

    Our reconciliation logic will follow these phases:

  • Initializing: Create the stable and canary StatefulSet resources if they don't exist. Set replica counts based on the spec.
  • Progressing: Monitor the StatefulSets until their pods are ready and match the desired replica counts.
  • Analyzing: If an analysis block is defined, begin querying Prometheus. Increment success or failure counters.
  • Promoting: If analysis succeeds, update the base image in the StatefulCanary spec to the canaryImage, set canaryReplicas to 0, and let the stable StatefulSet roll out the change.
  • Succeeded: The promotion is complete. The operator is now idle for this resource.
  • Failed: If analysis fails, scale the canary StatefulSet down to 0 and halt.

    Here's a skeleton of the Reconcile function in controllers/statefulcanary_controller.go:

    go
    func (r *StatefulCanaryReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    	log := r.Log.WithValues("statefulcanary", req.NamespacedName)
    
    	// 1. Fetch the StatefulCanary instance
    	var sc v1alpha1.StatefulCanary
    	if err := r.Get(ctx, req.NamespacedName, &sc); err != nil {
    		return ctrl.Result{}, client.IgnoreNotFound(err)
    	}
    
    	// >> Add Finalizer logic here <<
    
    	// 2. Main reconciliation state machine
    	switch sc.Status.CurrentPhase {
    	case "":
    		return r.reconcileInitializing(ctx, &sc)
    	case v1alpha1.PhaseInitializing:
    		return r.reconcileInitializing(ctx, &sc)
    	case v1alpha1.PhaseProgressing:
    		return r.reconcileProgressing(ctx, &sc)
    	case v1alpha1.PhaseAnalyzing:
    		return r.reconcileAnalyzing(ctx, &sc)
    	case v1alpha1.PhasePromoting:
    		return r.reconcilePromoting(ctx, &sc)
    	case v1alpha1.PhaseSucceeded, v1alpha1.PhaseFailed:
    		// Terminal state, do nothing.
    		return ctrl.Result{}, nil
    	default:
    		log.Info("Unknown phase", "phase", sc.Status.CurrentPhase)
    		return ctrl.Result{}, nil
    	}
    }
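
    The phase transitions above rely on an updatePhase helper that is referenced throughout but never shown. A minimal sketch, assuming the reconciler embeds client.Client so that r.Status() is available, could look like this:

    go
    // updatePhase records a phase transition on the status subresource and
    // requeues immediately so the next phase's handler runs without waiting
    // for another watch event.
    func (r *StatefulCanaryReconciler) updatePhase(ctx context.Context, sc *v1alpha1.StatefulCanary, phase v1alpha1.CanaryPhase) (ctrl.Result, error) {
    	sc.Status.CurrentPhase = phase
    	sc.Status.LastTransitionTime = metav1.Now()
    	if err := r.Status().Update(ctx, sc); err != nil {
    		return ctrl.Result{}, err
    	}
    	return ctrl.Result{Requeue: true}, nil
    }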

    Implementing the `Initializing` Phase

    This phase ensures the underlying StatefulSet resources exist and have the correct specifications.

    go
    func (r *StatefulCanaryReconciler) reconcileInitializing(ctx context.Context, sc *v1alpha1.StatefulCanary) (ctrl.Result, error) {
    	log := r.Log.WithValues("statefulcanary", client.ObjectKeyFromObject(sc))
    
    	// Create or Update Stable StatefulSet
    	stableReplicas := sc.Spec.TotalReplicas - sc.Spec.CanaryReplicas
    	stableSet, err := r.createOrUpdateStatefulSet(ctx, sc, "stable", sc.Spec.Template.Template.Spec.Containers[0].Image, stableReplicas)
    	if err != nil {
    		log.Error(err, "Failed to create/update stable StatefulSet")
    		return ctrl.Result{}, err
    	}
    
    	// Create or Update Canary StatefulSet
    	canarySet, err := r.createOrUpdateStatefulSet(ctx, sc, "canary", sc.Spec.CanaryImage, sc.Spec.CanaryReplicas)
    	if err != nil {
    		log.Error(err, "Failed to create/update canary StatefulSet")
    		return ctrl.Result{}, err
    	}
    
    	log.Info("StatefulSets reconciled", "stableReplicas", stableSet.Spec.Replicas, "canaryReplicas", canarySet.Spec.Replicas)
    
    	// Transition to Progressing phase
    	return r.updatePhase(ctx, sc, v1alpha1.PhaseProgressing)
    }
    
    // createOrUpdateStatefulSet is a helper function to manage StatefulSet resources.
    func (r *StatefulCanaryReconciler) createOrUpdateStatefulSet(ctx context.Context, sc *v1alpha1.StatefulCanary, role, image string, replicas int32) (*appsv1.StatefulSet, error) {
    	sts := &appsv1.StatefulSet{
    		ObjectMeta: metav1.ObjectMeta{
    			Name:      fmt.Sprintf("%s-%s", sc.Name, role),
    			Namespace: sc.Namespace,
    		},
    	}
    
    	// Use controller-runtime's CreateOrUpdate for idempotency
    	// Use controller-runtime's CreateOrUpdate for idempotency
    	_, err := controllerutil.CreateOrUpdate(ctx, r.Client, sts, func() error {
    		// Deep-copy the shared template so the stable and canary StatefulSets
    		// never mutate the same underlying maps and slices.
    		desiredSpec := *sc.Spec.Template.DeepCopy()
    		desiredSpec.Replicas = &replicas
    		desiredSpec.Template.Spec.Containers[0].Image = image

    		// Label pods with the app name and their role. The role label is part
    		// of the selector too, so the stable and canary StatefulSets never
    		// select each other's pods.
    		if desiredSpec.Selector == nil {
    			desiredSpec.Selector = &metav1.LabelSelector{MatchLabels: map[string]string{}}
    		}
    		desiredSpec.Selector.MatchLabels["app.kubernetes.io/name"] = sc.Name
    		desiredSpec.Selector.MatchLabels["app.kubernetes.io/role"] = role
    		desiredSpec.Template.ObjectMeta.Labels = desiredSpec.Selector.MatchLabels

    		sts.Spec = desiredSpec

    		// Set the StatefulCanary as the owner of this StatefulSet
    		return controllerutil.SetControllerReference(sc, sts, r.Scheme)
    	})
    
    	return sts, err
    }

    Key Production Pattern: We use controllerutil.CreateOrUpdate. This is critical for idempotency, because the reconciliation loop may run many times for the same object. The helper fetches the current object (creating it if it doesn't exist), applies our mutate function, and only issues an Update when the result actually differs from what is already in the cluster. This avoids blind Create and Update calls on every reconcile.

    Monitoring in the `Progressing` Phase

    After creating the StatefulSets, we must wait for them to become ready. We check their .status.readyReplicas field.

    go
    func (r *StatefulCanaryReconciler) reconcileProgressing(ctx context.Context, sc *v1alpha1.StatefulCanary) (ctrl.Result, error) {
        log := r.Log.WithValues("statefulcanary", client.ObjectKeyFromObject(sc))
    
        // Check stable StatefulSet
        stableReady, err := r.isStatefulSetReady(ctx, sc, "stable", sc.Spec.TotalReplicas - sc.Spec.CanaryReplicas)
        if err != nil {
            return ctrl.Result{}, err
        }
    
        // Check canary StatefulSet
        canaryReady, err := r.isStatefulSetReady(ctx, sc, "canary", sc.Spec.CanaryReplicas)
        if err != nil {
            return ctrl.Result{}, err
        }
    
        if stableReady && canaryReady {
            log.Info("Both StatefulSets are ready.")
            if sc.Spec.Analysis != nil {
                // We have an analysis step, move to Analyzing
                return r.updatePhase(ctx, sc, v1alpha1.PhaseAnalyzing)
            } else {
                // No analysis, move directly to Promoting
                return r.updatePhase(ctx, sc, v1alpha1.PhasePromoting)
            }
        }
    
        log.Info("Waiting for StatefulSets to become ready...", "stableReady", stableReady, "canaryReady", canaryReady)
        // Requeue to check again after a short delay
        return ctrl.Result{RequeueAfter: 15 * time.Second}, nil
    }
    
    func (r *StatefulCanaryReconciler) isStatefulSetReady(ctx context.Context, sc *v1alpha1.StatefulCanary, role string, expectedReplicas int32) (bool, error) {
        if expectedReplicas == 0 {
            return true, nil // Nothing to check
        }
        sts := &appsv1.StatefulSet{}
        err := r.Get(ctx, types.NamespacedName{Name: fmt.Sprintf("%s-%s", sc.Name, role), Namespace: sc.Namespace}, sts)
        if err != nil {
            return false, client.IgnoreNotFound(err)
        }
        // Check if the observed generation matches the desired generation, and ready replicas match the spec
        return sts.Status.ObservedGeneration == sts.Generation && sts.Status.ReadyReplicas == expectedReplicas, nil
    }

    Performance Consideration: The RequeueAfter matters. Returning an empty result would make us dependent on watch events from the owned StatefulSets to wake up again, while returning an error would trigger exponential-backoff retries. A timed requeue gives the controller a predictable polling cadence while the pods come up.


    Advanced Integration: Automated Prometheus Analysis

    This is where our operator provides significant value. It automates the decision-making process based on real-time metrics.

    go
    import (
    	"github.com/prometheus/client_golang/api"
    	prometheusv1 "github.com/prometheus/client_golang/api/prometheus/v1"
    	"github.com/prometheus/common/model"
    )
    
    func (r *StatefulCanaryReconciler) reconcileAnalyzing(ctx context.Context, sc *v1alpha1.StatefulCanary) (ctrl.Result, error) {
    	log := r.Log.WithValues("statefulcanary", client.ObjectKeyFromObject(sc))

    	// Initialize the Prometheus API client. Named promClient so it doesn't
    	// shadow the controller-runtime client package used above.
    	promClient, err := api.NewClient(api.Config{
    		Address: sc.Spec.Analysis.PrometheusAddress,
    	})
    	if err != nil {
    		log.Error(err, "Error creating Prometheus client")
    		// This is a configuration error that likely won't resolve on its own. Fail the canary.
    		sc.Status.Message = fmt.Sprintf("Prometheus client error: %v", err)
    		return r.updatePhase(ctx, sc, v1alpha1.PhaseFailed)
    	}

    	v1api := prometheusv1.NewAPI(promClient)
    	queryResult, warnings, err := v1api.Query(ctx, sc.Spec.Analysis.PrometheusQuery, time.Now())
    	if err != nil {
    		log.Error(err, "Error querying Prometheus")
    		// Query might fail due to transient network issues. Requeue.
    		return ctrl.Result{RequeueAfter: sc.Spec.Analysis.Interval.Duration}, nil
    	}
    	if len(warnings) > 0 {
    		log.Info("Prometheus query warnings", "warnings", warnings)
    	}
    
    	// We expect a vector result with a single value
    	vector, ok := queryResult.(model.Vector)
    	if !ok || vector.Len() == 0 {
    		log.Info("Prometheus query returned no data. Retrying.")
    		return ctrl.Result{RequeueAfter: sc.Spec.Analysis.Interval.Duration}, nil
    	}
    
    	value := float64(vector[0].Value)
    	log.Info("Analysis query result", "value", value, "threshold", sc.Spec.Analysis.Threshold)
    
    	if value > sc.Spec.Analysis.Threshold {
    		// Metric exceeds threshold, increment failure count
    		sc.Status.AnalysisRetries++
    		if sc.Status.AnalysisRetries >= sc.Spec.Analysis.MaxRetries {
    			log.Info("Analysis failed. Max retries exceeded.")
    			sc.Status.Message = fmt.Sprintf("Analysis failed after %d retries. Metric value %f > threshold %f", sc.Status.AnalysisRetries, value, sc.Spec.Analysis.Threshold)
    			return r.updatePhase(ctx, sc, v1alpha1.PhaseFailed)
    		} else {
    			log.Info("Analysis metric exceeded threshold. Retrying.", "retries", sc.Status.AnalysisRetries)
    			_ = r.Status().Update(ctx, sc) // Update retry count
    			return ctrl.Result{RequeueAfter: sc.Spec.Analysis.Interval.Duration}, nil
    		}
    	} else {
    		// Metric is within threshold. Promote.
    		log.Info("Analysis successful. Promoting canary.")
    		return r.updatePhase(ctx, sc, v1alpha1.PhasePromoting)
    	}
    }

    Real-world Example Query: For an application with a StatefulSet named my-kv-store, a user might provide this PromQL query in the StatefulCanary spec:

    sum(rate(http_server_requests_seconds_count{app_kubernetes_io_name="my-kv-store", app_kubernetes_io_role="canary", outcome!="SUCCESS"}[1m])) / sum(rate(http_server_requests_seconds_count{app_kubernetes_io_name="my-kv-store", app_kubernetes_io_role="canary"}[1m]))

    This query calculates the error rate for the canary pods over the last minute. Note that the comparison itself is not part of the query: our operator executes this string as-is and compares the scalar result against spec.analysis.threshold (for example, 0.01 for a 1% error budget).
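
    With the comparison handled by the operator rather than in PromQL, a matching analysis block pairs that query with a numeric threshold. A hypothetical example, expressed with the API types from earlier (values are illustrative):

    go
    // A 1% error-rate budget, checked every minute; the canary is failed after
    // three threshold breaches.
    analysis := &v1alpha1.CanaryAnalysis{
    	Interval:          metav1.Duration{Duration: time.Minute},
    	Threshold:         0.01,
    	MaxRetries:        3,
    	PrometheusAddress: "http://prometheus.monitoring.svc:9090", // hypothetical in-cluster address
    	PrometheusQuery:   `sum(rate(http_server_requests_seconds_count{app_kubernetes_io_name="my-kv-store", app_kubernetes_io_role="canary", outcome!="SUCCESS"}[1m])) / sum(rate(http_server_requests_seconds_count{app_kubernetes_io_name="my-kv-store", app_kubernetes_io_role="canary"}[1m]))`,
    }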

    Promotion and Rollback Logic

    Promotion (reconcilePromoting):

    This is a delicate operation. The goal is to make the stable StatefulSet identical to the canary version that just passed analysis.

  • Fetch the StatefulCanary resource.
  • Update the base template's image in its spec to the canaryImage.
  • Set spec.canaryReplicas to 0; spec.totalReplicas stays at the full amount.
  • Re-run the reconcileInitializing logic. This triggers the update:
    * The canary StatefulSet is scaled to 0.
    * The stable StatefulSet's template is updated, causing Kubernetes to perform a rolling update of its pods to the new image.
  • Transition to PhaseSucceeded once the stable set is fully updated.

    A sketch of this promotion handler follows the rollback steps below.

    Rollback (triggered by PhaseFailed):

    This is simpler and designed for safety.

  • Scale the canary StatefulSet replicas down to 0. Since the state machine treats PhaseFailed as terminal, do this as part of the transition into the Failed phase (or wire a dedicated reconcileFailed handler into the switch).
  • Do not touch the stable StatefulSet. It continues running the last known-good version.
  • Update the status with a failure message. The process stops here, requiring manual intervention.
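
    One possible shape for the promotion handler, reusing the createOrUpdateStatefulSet and isStatefulSetReady helpers from earlier and driving the two StatefulSets directly rather than rewriting the user's spec; a minimal sketch, not a drop-in implementation:

    go
    func (r *StatefulCanaryReconciler) reconcilePromoting(ctx context.Context, sc *v1alpha1.StatefulCanary) (ctrl.Result, error) {
    	// The canary image has passed analysis: scale the canary set to zero.
    	if _, err := r.createOrUpdateStatefulSet(ctx, sc, "canary", sc.Spec.CanaryImage, 0); err != nil {
    		return ctrl.Result{}, err
    	}

    	// Roll the stable set onto the canary image at the full replica count.
    	// Kubernetes performs an ordered rolling update of the stable pods.
    	if _, err := r.createOrUpdateStatefulSet(ctx, sc, "stable", sc.Spec.CanaryImage, sc.Spec.TotalReplicas); err != nil {
    		return ctrl.Result{}, err
    	}

    	// Wait until the stable set reports ready at the new size. A production
    	// check would also compare status.updatedReplicas to confirm the rolling
    	// update has actually finished on the new image.
    	ready, err := r.isStatefulSetReady(ctx, sc, "stable", sc.Spec.TotalReplicas)
    	if err != nil {
    		return ctrl.Result{}, err
    	}
    	if !ready {
    		return ctrl.Result{RequeueAfter: 15 * time.Second}, nil
    	}

    	return r.updatePhase(ctx, sc, v1alpha1.PhaseSucceeded)
    }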

    Edge Case Handling: Finalizers for Graceful Deletion

    What happens if a user runs kubectl delete statefulcanary my-app while a canary is active? If owner references were missing (or deletion ran with --cascade=orphan), the StatefulSets the operator created would be orphaned and keep running indefinitely. Even with owner references in place, a finalizer gives the operator a chance to perform its own, explicitly ordered cleanup before the StatefulCanary object disappears.

    A finalizer is an entry in an object's metadata.finalizers list that tells the Kubernetes API server to wait until the controller has removed that entry before actually deleting the object.

    1. Add the finalizer when the resource is first seen:

    go
    // In Reconcile function, right after fetching the object
    const statefulCanaryFinalizer = "canary.my.domain/finalizer"
    
    if sc.ObjectMeta.DeletionTimestamp.IsZero() {
    	// The object is not being deleted, so if it does not have our finalizer,
    	// then lets add it and update the object.
    	if !controllerutil.ContainsFinalizer(&sc, statefulCanaryFinalizer) {
    		controllerutil.AddFinalizer(&sc, statefulCanaryFinalizer)
    		if err := r.Update(ctx, &sc); err != nil {
    			return ctrl.Result{}, err
    		}
    	}
    } else {
    	// The object is being deleted
    	if controllerutil.ContainsFinalizer(&sc, statefulCanaryFinalizer) {
    		// Our finalizer is present, so lets handle any external dependency
    		if err := r.cleanupOwnedResources(ctx, &sc); err != nil {
    			// If fail to delete the external dependency here, return with error
    			// so that it can be retried.
    			return ctrl.Result{}, err
    		}
    
    		// Remove our finalizer from the list and update it.
    		controllerutil.RemoveFinalizer(&sc, statefulCanaryFinalizer)
    		if err := r.Update(ctx, &sc); err != nil {
    			return ctrl.Result{}, err
    		}
    	}
    	// Stop reconciliation as the item is being deleted
    	return ctrl.Result{}, nil
    }

    2. Implement the cleanup logic:

    go
    func (r *StatefulCanaryReconciler) cleanupOwnedResources(ctx context.Context, sc *v1alpha1.StatefulCanary) error {
    	log := r.Log.WithValues("statefulcanary", client.ObjectKeyFromObject(sc))
    
    	// Delete the canary StatefulSet. We can rely on garbage collection for the stable one
    	// if ownership is set correctly, but explicit deletion is safer.
    	canarySts := &appsv1.StatefulSet{
    		ObjectMeta: metav1.ObjectMeta{
    			Name:      fmt.Sprintf("%s-canary", sc.Name),
    			Namespace: sc.Namespace,
    		},
    	}
    	if err := r.Delete(ctx, canarySts); err != nil && !apierrors.IsNotFound(err) {
    		log.Error(err, "Failed to delete canary StatefulSet")
    		return err
    	}
    
    	log.Info("Successfully cleaned up canary resources.")
    	return nil
    }

    This pattern ensures that your operator has a chance to perform cleanup actions before its controlling resource is removed from the cluster, preventing orphaned resources and maintaining a clean state.
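
    One final piece of wiring worth showing: the controller must be registered with the manager, and it should watch the StatefulSets it owns so that readiness and scaling changes trigger reconciliation instead of relying on timed requeues alone. A typical kubebuilder-style setup, sketched:

    go
    // SetupWithManager registers the controller and watches owned StatefulSets,
    // so events on them re-trigger reconciliation of the parent StatefulCanary.
    func (r *StatefulCanaryReconciler) SetupWithManager(mgr ctrl.Manager) error {
    	return ctrl.NewControllerManagedBy(mgr).
    		For(&v1alpha1.StatefulCanary{}).
    		Owns(&appsv1.StatefulSet{}).
    		Complete(r)
    }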

    Conclusion: Declarative Control over Complex State

    We have designed and outlined the implementation of a sophisticated Kubernetes Operator to solve a genuinely hard problem: automated canary deployments for stateful applications. By creating a declarative API (StatefulCanary CRD) and an intelligent reconciliation loop, we abstract away immense complexity from the end-user.

    The key takeaways for building such a production-grade controller are:

    * Idempotent Actions: The reconciliation loop must be able to run multiple times and converge on the same state. Use tools like controllerutil.CreateOrUpdate.

    * State Machine Design: Clearly define the phases of your process (Initializing, Analyzing, etc.) and manage transitions carefully within the Status subresource.

    * Graceful Error Handling: Differentiate between errors that require a retry with backoff (RequeueAfter) and those that represent a terminal failure.

    * Lifecycle Management: Use finalizers to ensure your operator cleans up after itself, preventing orphaned resources.

    * Clear Boundaries: Acknowledge the operator's scope. Our operator orchestrates the StatefulSets and the analysis process; it does not handle the internal data migration logic of the application itself. This separation of concerns is vital for a robust system.

    This operator is not a simple wrapper. It's a stateful controller that actively manages the lifecycle of other stateful components, integrates with external monitoring systems, and makes automated decisions. It represents the true power of the operator pattern: extending the Kubernetes API to manage complex, domain-specific workflows as native, first-class citizens of the cluster.
