Advanced Operator Patterns: StatefulSet Management with Finalizers

Goh Ling Yong

The Fragility of Default Deletion in Stateful Systems

In the world of Kubernetes, the default controller patterns excel at managing stateless applications. A Deployment can be deleted, and its ReplicaSet and Pods are garbage collected with minimal consequence. The system is designed for ephemeral workloads. However, when managing stateful applications like databases, caches, or message queues with an Operator, this default behavior is not just insufficient—it's dangerous. A naive kubectl delete my-database could trigger a cascading deletion that instantly terminates pods, potentially corrupting data, orphaning Persistent Volume Claims (PVCs), and leaving the cluster in an inconsistent state.

The core issue is that Kubernetes's garbage collection is unaware of the application-specific logic required for a graceful shutdown. It doesn't know it needs to flush a write-ahead log, quiesce connections, take a final backup, or deregister from a discovery service. This is where the Operator pattern must evolve beyond simple resource creation and reconciliation.

This article dissects an advanced, production-critical pattern: using Finalizers to intercept the deletion process of a Custom Resource (CR) and inject stateful, application-aware cleanup logic. We will build an operator for a fictional ShardDB database, demonstrating how to manage its underlying StatefulSet's lifecycle with precision, ensuring data safety and resource hygiene.

The Scenario: Managing `ShardDB`

Imagine a ShardDB Custom Resource Definition (CRD) that provisions a distributed database. The Operator's basic reconciliation loop creates a StatefulSet and a Service. When a user deletes the ShardDB CR, our goal is to prevent immediate resource deletion and instead execute a controlled shutdown sequence:

  • Initiate a backup for each pod's data.
  • Verify that all backup jobs are complete.
  • Gracefully scale the StatefulSet down to zero replicas, one pod at a time.
  • Clean up any external resources (e.g., monitoring dashboards, DNS entries).
  • Only then, allow Kubernetes to delete the StatefulSet, Service, and the CR itself.

This controlled demolition is impossible without a mechanism to pause Kubernetes's garbage collector. That mechanism is the Finalizer.

    The Finalizer Mechanism: An Operator's Deletion Hook

    A Finalizer is simply a string added to an object's metadata.finalizers list. When a user requests to delete an object that has finalizers, Kubernetes does not immediately delete it. Instead, it updates the object's metadata.deletionTimestamp to the current time and leaves the object in a Terminating state.

    The object will remain in this state, fully accessible via the API, until its metadata.finalizers list is empty. It is the responsibility of the controller (our Operator) that added the finalizer to perform its cleanup logic and then remove its own finalizer from the list. Once the list is empty and the deletionTimestamp is set, Kubernetes completes the deletion.

    This provides the exact hook we need. Our Operator's reconciliation loop will now have two primary paths:

    * Reconciliation Path: If deletionTimestamp is nil, execute the normal logic: ensure the StatefulSet exists, matches the spec, and update the status.

    * Finalization Path: If deletionTimestamp is not nil, execute the cleanup logic. Once complete, remove the finalizer.

    Let's implement this pattern.

    Initial Operator Setup and CRD Definition

    We assume you have a working Go environment and have initialized an operator project using operator-sdk init. We'll define the API for our ShardDB resource.

    api/v1/sharddb_types.go

    This file defines the schema for our ShardDB CR. Note the detailed Spec and, crucially, the Status subresource. The Status will be essential for making our cleanup logic idempotent and fault-tolerant.

    go
    package v1
    
    import (
    	appsv1 "k8s.io/api/apps/v1"
    	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    )
    
    // ShardDBSpec defines the desired state of ShardDB
    type ShardDBSpec struct {
	// Number of desired pods. Defaults to 3 when omitted.
	// +kubebuilder:validation:Minimum=1
	// +kubebuilder:default=3
	// +optional
	Replicas *int32 `json:"replicas,omitempty"`
    
    	// The database version. Example: "14.2"
    	Version string `json:"version"`
    
    	// StorageClassName for the PersistentVolumeClaims.
    	StorageClassName string `json:"storageClassName"`
    
    	// Volume size for each replica. Example: "10Gi"
    	VolumeSize string `json:"volumeSize"`
    }
    
    // ShardDBStatus defines the observed state of ShardDB
    type ShardDBStatus struct {
    	// The current state of the database cluster. Can be Creating, Ready, Deleting, Failed.
    	Phase string `json:"phase,omitempty"`
    
    	// Total number of non-terminated pods targeted by this deployment (their labels match the selector).
    	Replicas int32 `json:"replicas,omitempty"`
    
    	// Conditions represent the latest available observations of an object's state.
    	Conditions []metav1.Condition `json:"conditions,omitempty"`
    }
    
    //+kubebuilder:object:root=true
    //+kubebuilder:subresource:status
    //+kubebuilder:printcolumn:name="Replicas",type="integer",JSONPath=".spec.replicas"
    //+kubebuilder:printcolumn:name="Version",type="string",JSONPath=".spec.version"
    //+kubebuilder:printcolumn:name="Phase",type="string",JSONPath=".status.phase"
    //+kubebuilder:printcolumn:name="Age",type="date",JSONPath=".metadata.creationTimestamp"
    
    // ShardDB is the Schema for the sharddbs API
    type ShardDB struct {
    	metav1.TypeMeta   `json:",inline"`
    	metav1.ObjectMeta `json:"metadata,omitempty"`
    
    	Spec   ShardDBSpec   `json:"spec,omitempty"`
    	Status ShardDBStatus `json:"status,omitempty"`
    }
    
    //+kubebuilder:object:root=true
    
    // ShardDBList contains a list of ShardDB
    type ShardDBList struct {
    	metav1.TypeMeta `json:",inline"`
    	metav1.ListMeta `json:"metadata,omitempty"`
    	Items           []ShardDB `json:"items"`
    }
    
    func init() {
    	SchemeBuilder.Register(&ShardDB{}, &ShardDBList{})
    }

    After running make manifests and make install, this CRD is available in the cluster.
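
    Instances are usually created from a YAML manifest, but it can be useful to see the object constructed programmatically, since that is exactly the shape the reconciler works with. The sketch below is illustrative only; the module path my-operator/api/v1, the orders-db name, and the standard storage class are assumptions for this example.

    go
    package main

    import (
    	"context"

    	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    	"k8s.io/apimachinery/pkg/runtime"
    	ctrl "sigs.k8s.io/controller-runtime"
    	"sigs.k8s.io/controller-runtime/pkg/client"

    	dbv1 "my-operator/api/v1" // assumed module path
    )

    func main() {
    	// Register the ShardDB types so the client can serialize them.
    	scheme := runtime.NewScheme()
    	_ = dbv1.AddToScheme(scheme)

    	// GetConfigOrDie loads kubeconfig (or in-cluster config).
    	c, err := client.New(ctrl.GetConfigOrDie(), client.Options{Scheme: scheme})
    	if err != nil {
    		panic(err)
    	}

    	replicas := int32(3)
    	db := &dbv1.ShardDB{
    		ObjectMeta: metav1.ObjectMeta{Name: "orders-db", Namespace: "default"},
    		Spec: dbv1.ShardDBSpec{
    			Replicas:         &replicas,
    			Version:          "14.2",
    			StorageClassName: "standard", // assumed storage class
    			VolumeSize:       "10Gi",
    		},
    	}
    	if err := c.Create(context.Background(), db); err != nil {
    		panic(err)
    	}
    }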

    Implementing the Finalizer Logic in the Reconciler

    Now we modify the core Reconcile function in controllers/sharddb_controller.go. The logic will be split based on the presence of the deletionTimestamp.

    First, let's define our finalizer name as a constant.

    go
    const shardDBFinalizer = "db.example.com/finalizer"

    Here is the skeleton of the updated Reconcile function:

    go
    // controllers/sharddb_controller.go
    
    func (r *ShardDBReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    	log := log.FromContext(ctx)
    
    	// 1. Fetch the ShardDB instance
    	instance := &dbv1.ShardDB{}
    	err := r.Get(ctx, req.NamespacedName, instance)
    	if err != nil {
    		if errors.IsNotFound(err) {
    			log.Info("ShardDB resource not found. Ignoring since object must be deleted")
    			return ctrl.Result{}, nil
    		}
    		log.Error(err, "Failed to get ShardDB")
    		return ctrl.Result{}, err
    	}
    
    	// 2. Check if the instance is marked for deletion
    	isMarkedForDeletion := instance.GetDeletionTimestamp() != nil
    	if isMarkedForDeletion {
    		if controllerutil.ContainsFinalizer(instance, shardDBFinalizer) {
    			// Run finalization logic. If it fails, requeue.
    			if err := r.finalizeShardDB(ctx, instance); err != nil {
    				// Don't remove finalizer if cleanup fails
    				return ctrl.Result{}, err
    			}
    
    			// Cleanup succeeded, remove the finalizer
    			controllerutil.RemoveFinalizer(instance, shardDBFinalizer)
    			err := r.Update(ctx, instance)
    			if err != nil {
    				return ctrl.Result{}, err
    			}
    		}
    		// Stop reconciliation as the item is being deleted
    		return ctrl.Result{}, nil
    	}
    
    	// 3. Add finalizer for this CR if it doesn't exist
    	if !controllerutil.ContainsFinalizer(instance, shardDBFinalizer) {
    		log.Info("Adding Finalizer for the ShardDB")
    		controllerutil.AddFinalizer(instance, shardDBFinalizer)
    		err = r.Update(ctx, instance)
    		if err != nil {
    			return ctrl.Result{}, err
    		}
    	}
    
    	// 4. Run regular reconciliation logic
        // ... (code to create/update StatefulSet, Service, etc.)
        // For brevity, this part is omitted but would contain standard operator logic.
    	log.Info("Running standard reconciliation for ShardDB")
    
    	return ctrl.Result{}, nil
    }
    
    // finalizeShardDB performs the actual cleanup logic.
    func (r *ShardDBReconciler) finalizeShardDB(ctx context.Context, db *dbv1.ShardDB) error {
        log := log.FromContext(ctx)
    	log.Info("Starting finalization for ShardDB")
    
        // Here we will implement our multi-step, idempotent cleanup process.
        // For now, we'll just log a message.
        log.Info("Simulating backup and resource cleanup...")
        time.Sleep(5 * time.Second) // Simulate long-running task
    
    	log.Info("Finalization for ShardDB completed successfully")
    	return nil
    }

    This structure correctly handles the finalizer lifecycle:

  • Add Finalizer: On the first reconciliation of a new CR, the finalizer is added. The Update call triggers a new reconciliation.
  • Deletion Check: On every subsequent run, we check for deletionTimestamp. If it's present, we divert to our finalizeShardDB function.
  • Finalization: The finalizeShardDB function contains our critical cleanup logic. We'll flesh this out next.
  • Remove Finalizer: Only after finalizeShardDB returns nil (success), we remove the finalizer and update the CR. This is the signal to Kubernetes to proceed with deletion.

    Building an Idempotent, Multi-Stage Finalizer

    The simple finalizeShardDB above is not production-ready. An operator can crash and restart at any point. If our cleanup involves multiple steps (e.g., backup pod 0, then pod 1, then pod 2), we need to track our progress. The CR's Status subresource is the perfect place for this.

    Let's refine our ShardDBStatus and the finalizer logic to be robust.

    First, we add more detail to the status:

    go
    // api/v1/sharddb_types.go
    
    // ... (inside ShardDBStatus struct)
    type ShardDBStatus struct {
    	Phase string `json:"phase,omitempty"`
    	Replicas int32 `json:"replicas,omitempty"`
    
        // New fields for finalization tracking
        // +optional
        FinalizationStatus string `json:"finalizationStatus,omitempty"`
        // +optional
        LastBackupAttempt *metav1.Time `json:"lastBackupAttempt,omitempty"`
    
    	Conditions []metav1.Condition `json:"conditions,omitempty"`
    }
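
    The Conditions field is also worth exercising during finalization: when a cleanup step fails, recording a condition makes the cause visible in kubectl describe without digging through operator logs. Below is a minimal sketch using apimachinery's condition helpers; the condition type and reason values are illustrative, not part of any standard.

    go
    package controllers

    import (
    	"context"

    	"k8s.io/apimachinery/pkg/api/meta"
    	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

    	dbv1 "my-operator/api/v1"
    )

    // markFinalizationFailed records why cleanup is stuck so that users and
    // alerting rules can surface it.
    func (r *ShardDBReconciler) markFinalizationFailed(ctx context.Context, db *dbv1.ShardDB, reason string, cause error) error {
    	meta.SetStatusCondition(&db.Status.Conditions, metav1.Condition{
    		Type:    "FinalizationSucceeded", // hypothetical condition type
    		Status:  metav1.ConditionFalse,
    		Reason:  reason,
    		Message: cause.Error(),
    	})
    	return r.Status().Update(ctx, db)
    }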

    Now, we build a more sophisticated finalizeShardDB function.

    go
    // controllers/sharddb_controller.go
    
    const (
        finalizationStateBackupsStarted = "BackupsStarted"
        finalizationStateScalingDown = "ScalingDown"
        finalizationStateComplete = "Complete"
    )
    
    func (r *ShardDBReconciler) finalizeShardDB(ctx context.Context, db *dbv1.ShardDB) error {
    	log := log.FromContext(ctx)
    
        // Update status to indicate deletion is in progress
        if db.Status.Phase != "Deleting" {
            db.Status.Phase = "Deleting"
            if err := r.Status().Update(ctx, db); err != nil {
                return err
            }
        }
    
        // Step 1: Perform Backups
        if db.Status.FinalizationStatus != finalizationStateBackupsStarted && 
           db.Status.FinalizationStatus != finalizationStateScalingDown &&
           db.Status.FinalizationStatus != finalizationStateComplete {
            log.Info("Starting backup finalization step")
            // This function would trigger backup jobs for each pod.
            // It should be idempotent.
            // For this example, we'll simulate it and update status.
            err := r.triggerBackups(ctx, db)
            if err != nil {
                log.Error(err, "Backup step failed")
                // You might want to update status with an error condition here
                return err // Requeue
            }
    
            db.Status.FinalizationStatus = finalizationStateBackupsStarted
            if err := r.Status().Update(ctx, db); err != nil {
                return err
            }
            // Requeue to check backup job status
            return fmt.Errorf("requeuing to check backup status")
        }
    
        // Step 2: Check backup status and scale down
        if db.Status.FinalizationStatus == finalizationStateBackupsStarted {
            log.Info("Checking backup job status")
            backupsComplete, err := r.areBackupsComplete(ctx, db)
            if err != nil {
                return err
            }
            if !backupsComplete {
                log.Info("Backups not yet complete, requeueing")
                // Using an error to requeue is a common pattern for polling
                return fmt.Errorf("requeuing: backups still in progress")
            }
    
            log.Info("Backups complete. Scaling down StatefulSet")
            db.Status.FinalizationStatus = finalizationStateScalingDown
            if err := r.Status().Update(ctx, db); err != nil {
                return err
            }
        }
    
        // Step 3: Perform scale down and wait for completion
        if db.Status.FinalizationStatus == finalizationStateScalingDown {
            sts := &appsv1.StatefulSet{}
            err := r.Get(ctx, types.NamespacedName{Name: db.Name, Namespace: db.Namespace}, sts)
            if err != nil && !errors.IsNotFound(err) {
                return err
            }
    
            if !errors.IsNotFound(err) {
                if sts.Spec.Replicas != nil && *sts.Spec.Replicas != 0 {
                    log.Info("Setting StatefulSet replicas to 0")
                    zeroReplicas := int32(0)
                    sts.Spec.Replicas = &zeroReplicas
                    if err := r.Update(ctx, sts); err != nil {
                        return err
                    }
                }
    
                if sts.Status.Replicas != 0 {
                    log.Info("Waiting for StatefulSet pods to terminate")
                    return fmt.Errorf("requeuing: waiting for pods to terminate")
                }
            }
            
            log.Info("StatefulSet scaled down successfully")
            db.Status.FinalizationStatus = finalizationStateComplete
            if err := r.Status().Update(ctx, db); err != nil {
                return err
            }
        }
    
        log.Info("All finalization steps complete.")
    	return nil // Success! The finalizer can now be removed.
    }
    
    // triggerBackups is a placeholder for actual backup logic
    func (r *ShardDBReconciler) triggerBackups(ctx context.Context, db *dbv1.ShardDB) error {
        log := log.FromContext(ctx)
        log.Info("Triggering backup jobs for all ShardDB pods... (simulation)")
        // In a real implementation, you would create batchv1.Job objects for each PVC.
        return nil
    }
    
    // areBackupsComplete is a placeholder for checking backup status
    func (r *ShardDBReconciler) areBackupsComplete(ctx context.Context, db *dbv1.ShardDB) (bool, error) {
        log := log.FromContext(ctx)
        log.Info("Checking backup job status... (simulation)")
        // In a real implementation, you would list Jobs with a specific label selector
        // and check their .status.succeeded count.
        return true, nil // Simulate immediate success for the example
    }

    This implementation is far more robust:

    * State Machine: The FinalizationStatus field creates a simple state machine. If the operator crashes during the BackupsStarted phase, it will resume from that point on the next reconciliation, not from the beginning.

    * Idempotency: The logic checks the current state before acting. It won't re-trigger backups if they've already been started.

    * Polling via Requeue: Instead of blocking, we check the status of long-running operations (backups, pod termination) and return early so the request is requeued. Returning a synthetic error (e.g., fmt.Errorf("requeuing...")) works because controller-runtime retries failed reconciliations with exponential backoff, but it inflates error metrics and logs; returning ctrl.Result{RequeueAfter: ...}, as the complete example below does, is the cleaner way to poll without tying up a worker goroutine.

    * Status Updates: The status subresource is updated at each stage, providing excellent observability for users running kubectl describe sharddb <name>.
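
    The real work lives in the triggerBackups and areBackupsComplete placeholders. Here is a minimal sketch of a Job-based approach, assuming a hypothetical backup image (backup-tool:latest) and the PVC naming convention data-<name>-<ordinal> that a volume claim template named data would produce; a production version would also set owner references, resource limits, and a backup destination.

    go
    package controllers

    import (
    	"context"
    	"fmt"

    	batchv1 "k8s.io/api/batch/v1"
    	corev1 "k8s.io/api/core/v1"
    	"k8s.io/apimachinery/pkg/api/errors"
    	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    	"sigs.k8s.io/controller-runtime/pkg/client"

    	dbv1 "my-operator/api/v1"
    )

    // triggerBackups creates one backup Job per replica. It is idempotent: if a
    // Job already exists, the AlreadyExists error is ignored.
    func (r *ShardDBReconciler) triggerBackups(ctx context.Context, db *dbv1.ShardDB) error {
    	replicas := int32(3)
    	if db.Spec.Replicas != nil {
    		replicas = *db.Spec.Replicas
    	}
    	for i := int32(0); i < replicas; i++ {
    		pvcName := fmt.Sprintf("data-%s-%d", db.Name, i) // assumed volume claim template "data"
    		job := &batchv1.Job{
    			ObjectMeta: metav1.ObjectMeta{
    				Name:      fmt.Sprintf("%s-backup-%d", db.Name, i),
    				Namespace: db.Namespace,
    				Labels:    map[string]string{"app.kubernetes.io/instance": db.Name, "role": "backup"},
    			},
    			Spec: batchv1.JobSpec{
    				Template: corev1.PodTemplateSpec{
    					Spec: corev1.PodSpec{
    						RestartPolicy: corev1.RestartPolicyOnFailure,
    						Containers: []corev1.Container{{
    							Name:         "backup",
    							Image:        "backup-tool:latest", // hypothetical image
    							VolumeMounts: []corev1.VolumeMount{{Name: "data", MountPath: "/data"}},
    						}},
    						Volumes: []corev1.Volume{{
    							Name: "data",
    							VolumeSource: corev1.VolumeSource{
    								PersistentVolumeClaim: &corev1.PersistentVolumeClaimVolumeSource{ClaimName: pvcName},
    							},
    						}},
    					},
    				},
    			},
    		}
    		if err := r.Create(ctx, job); err != nil && !errors.IsAlreadyExists(err) {
    			return err
    		}
    	}
    	return nil
    }

    // areBackupsComplete lists the backup Jobs for this ShardDB and reports true
    // only when one Job per replica exists and each has a succeeded pod.
    func (r *ShardDBReconciler) areBackupsComplete(ctx context.Context, db *dbv1.ShardDB) (bool, error) {
    	jobs := &batchv1.JobList{}
    	if err := r.List(ctx, jobs,
    		client.InNamespace(db.Namespace),
    		client.MatchingLabels{"app.kubernetes.io/instance": db.Name, "role": "backup"},
    	); err != nil {
    		return false, err
    	}
    	expected := int32(3)
    	if db.Spec.Replicas != nil {
    		expected = *db.Spec.Replicas
    	}
    	if int32(len(jobs.Items)) < expected {
    		return false, nil
    	}
    	for _, j := range jobs.Items {
    		if j.Status.Succeeded < 1 {
    			return false, nil
    		}
    	}
    	return true, nil
    }

    If you adopt something like this, the operator's RBAC markers also need create, list, and watch permissions on Jobs in the batch API group.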

    Edge Cases and Production Hardening

    Senior engineers know that handling the happy path is only half the battle. Here are critical edge cases to consider.

    Stuck Finalizers

    Problem: What happens if there's a bug in finalizeShardDB that causes it to return an error indefinitely? Or if the operator is down? The ShardDB CR will be stuck in the Terminating state forever, and kubectl delete will hang.

    Solution:

  • Monitoring and Alerting: Your operator must expose metrics (e.g., a Prometheus finalizer_failures counter) and alerts. An alert should fire if a CR has been stuck in the Terminating state for an excessive period (e.g., > 1 hour). A sketch of such a counter follows this list.
  • Robust Error Handling: Distinguish between transient errors (e.g., API server unavailable), which should be retried, and permanent errors (e.g., invalid configuration), which might require manual intervention or a status condition update.
  • Manual Override: As a last resort, an administrator can forcefully remove the finalizer:

    bash
    kubectl patch sharddb <name> -p '{"metadata":{"finalizers":[]}}' --type=merge

    This is a destructive operation and should only be performed after manually verifying that the underlying resources have been cleaned up.
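
    For the alerting point above, a counter can be registered with controller-runtime's global metrics registry so it is exported from the manager's /metrics endpoint. A minimal sketch; the metric name and labels are assumptions:

    go
    package controllers

    import (
    	"github.com/prometheus/client_golang/prometheus"
    	"sigs.k8s.io/controller-runtime/pkg/metrics"
    )

    // finalizerFailures counts failed finalization attempts per ShardDB instance.
    // An alert can fire when it keeps increasing for the same resource.
    var finalizerFailures = prometheus.NewCounterVec(
    	prometheus.CounterOpts{
    		Name: "sharddb_finalizer_failures_total", // assumed metric name
    		Help: "Number of failed ShardDB finalization attempts.",
    	},
    	[]string{"name", "namespace"},
    )

    func init() {
    	// The controller manager serves everything in metrics.Registry on /metrics.
    	metrics.Registry.MustRegister(finalizerFailures)
    }

    The deletion path in Reconcile would then call finalizerFailures.WithLabelValues(instance.Name, instance.Namespace).Inc() whenever finalizeShardDB returns an error.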

    Controller Concurrency and Leader Election

    Problem: The Operator SDK's manager can be configured with MaxConcurrentReconciles > 1. While the controller-runtime ensures that the Reconcile function for a single object instance (e.g., default/my-db) is never run concurrently, reconciliations for different objects (default/db1, default/db2) can run in parallel.

    If your finalizer logic interacts with a shared, external system (e.g., a central backup repository, a corporate asset database), you could face race conditions.

    Solution:

    * For actions scoped to a single CR, no extra locking is needed.

    * For actions that affect the entire cluster or an external system, you may need to implement your own locking mechanism. Kubernetes provides the coordination.k8s.io/v1 API (Leases) for this purpose: create a Lease object and have your finalizer logic acquire it before performing a global action, as sketched after this list.

    * The operator manager itself uses a Leader Election lease to ensure only one instance of the operator is active. This is sufficient for most cases, but be mindful of concurrency if you have multiple workers and shared external state.
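
    Below is a minimal sketch of the Lease-based lock mentioned above. The lock name, namespace, and TTL are assumptions, and a production version would also renew the lease while holding it and handle clock skew more carefully.

    go
    package controllers

    import (
    	"context"
    	"time"

    	coordinationv1 "k8s.io/api/coordination/v1"
    	"k8s.io/apimachinery/pkg/api/errors"
    	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    	"k8s.io/apimachinery/pkg/types"
    )

    // acquireBackupLock models a cluster-wide lock as a Lease in the operator's
    // own namespace. It returns true if this reconciliation now holds the lock.
    func (r *ShardDBReconciler) acquireBackupLock(ctx context.Context, holder string) (bool, error) {
    	const (
    		lockName      = "sharddb-backup-lock" // assumed lock name
    		lockNamespace = "sharddb-operator"    // assumed operator namespace
    		leaseTTL      = 60 * time.Second
    	)
    	now := metav1.NewMicroTime(time.Now())
    	ttlSeconds := int32(leaseTTL / time.Second)

    	desired := coordinationv1.LeaseSpec{
    		HolderIdentity:       &holder,
    		AcquireTime:          &now,
    		RenewTime:            &now,
    		LeaseDurationSeconds: &ttlSeconds,
    	}

    	// Fast path: nobody holds the lock yet, so create it.
    	lease := &coordinationv1.Lease{
    		ObjectMeta: metav1.ObjectMeta{Name: lockName, Namespace: lockNamespace},
    		Spec:       desired,
    	}
    	err := r.Create(ctx, lease)
    	if err == nil {
    		return true, nil
    	}
    	if !errors.IsAlreadyExists(err) {
    		return false, err
    	}

    	// The Lease exists: take it over only if it has expired or we already hold it.
    	existing := &coordinationv1.Lease{}
    	if err := r.Get(ctx, types.NamespacedName{Name: lockName, Namespace: lockNamespace}, existing); err != nil {
    		return false, err
    	}
    	expired := existing.Spec.RenewTime == nil || time.Since(existing.Spec.RenewTime.Time) > leaseTTL
    	ours := existing.Spec.HolderIdentity != nil && *existing.Spec.HolderIdentity == holder
    	if !expired && !ours {
    		return false, nil // someone else holds a live lock; requeue and try later
    	}

    	existing.Spec = desired
    	// Optimistic concurrency: this Update fails with a conflict if another
    	// reconciler grabbed the Lease between our Get and Update.
    	if err := r.Update(ctx, existing); err != nil {
    		return false, err
    	}
    	return true, nil
    }

    The finalizer would attempt this before any step that touches the shared system and return a RequeueAfter result when the lock is not acquired; the operator also needs RBAC on leases in the coordination.k8s.io group.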

    Finalizer Race Conditions

    Problem: What if a user edits the CR to remove the finalizer while your operator is in the middle of a long cleanup operation? The operator might finish its cleanup, try to remove its finalizer (which is already gone), and Kubernetes might have already deleted the object, leading to errors.

    Solution:

    * The controller-runtime client is designed to handle this. Calls to r.Update(ctx, instance) and r.Status().Update(ctx, instance) use the object's resourceVersion for optimistic concurrency control: if another actor has modified the object since you fetched it, the update fails with a conflict error. The manager then requeues the request, and your Reconcile runs again against the latest version of the object, so the logic must tolerate re-execution. A retry helper for exactly this situation is sketched after this list.

    * Always re-fetch the object at the start of your reconciliation loop to ensure you're working with the most recent state.
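
    Client-go ships a small helper, retry.RetryOnConflict, that re-runs a mutation when an update hits a conflict. Here is a minimal sketch of using it for finalizer removal; whether you prefer this or simply letting the failed update trigger another reconcile is a design choice, since both converge.

    go
    package controllers

    import (
    	"context"

    	"k8s.io/apimachinery/pkg/api/errors"
    	"k8s.io/client-go/util/retry"
    	"sigs.k8s.io/controller-runtime/pkg/client"
    	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"

    	dbv1 "my-operator/api/v1"
    )

    // removeFinalizerWithRetry re-fetches the latest ShardDB on every attempt and
    // retries the update whenever it fails with a resourceVersion conflict.
    func (r *ShardDBReconciler) removeFinalizerWithRetry(ctx context.Context, key client.ObjectKey) error {
    	return retry.RetryOnConflict(retry.DefaultRetry, func() error {
    		latest := &dbv1.ShardDB{}
    		if err := r.Get(ctx, key, latest); err != nil {
    			if errors.IsNotFound(err) {
    				return nil // object already gone; nothing left to do
    			}
    			return err
    		}
    		if !controllerutil.ContainsFinalizer(latest, shardDBFinalizer) {
    			return nil // another actor already removed our finalizer
    		}
    		controllerutil.RemoveFinalizer(latest, shardDBFinalizer)
    		return r.Update(ctx, latest)
    	})
    }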

    Complete Code Example

    Here is a more complete, production-oriented sharddb_controller.go to tie everything together.

    go
    package controllers
    
    import (
    	"context"
    	"fmt"
    	"time"
    
    	appsv1 "k8s.io/api/apps/v1"
    	"k8s.io/apimachinery/pkg/api/errors"
    	"k8s.io/apimachinery/pkg/runtime"
    	"k8s.io/apimachinery/pkg/types"
    	ctrl "sigs.k8s.io/controller-runtime"
    	"sigs.k8s.io/controller-runtime/pkg/client"
    	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
    	"sigs.k8s.io/controller-runtime/pkg/log"
    
    	dbv1 "my-operator/api/v1"
    )
    
    const (
    	shardDBFinalizer                  = "db.example.com/finalizer"
    	finalizationStateBackupsStarted = "BackupsStarted"
    	finalizationStateScalingDown    = "ScalingDown"
    	finalizationStateComplete       = "Complete"
    )
    
    // ShardDBReconciler reconciles a ShardDB object
    type ShardDBReconciler struct {
    	client.Client
    	Scheme *runtime.Scheme
    }
    
    //+kubebuilder:rbac:groups=db.example.com,resources=sharddbs,verbs=get;list;watch;create;update;patch;delete
    //+kubebuilder:rbac:groups=db.example.com,resources=sharddbs/status,verbs=get;update;patch
    //+kubebuilder:rbac:groups=db.example.com,resources=sharddbs/finalizers,verbs=update
    //+kubebuilder:rbac:groups=apps,resources=statefulsets,verbs=get;list;watch;create;update;patch;delete
    
    func (r *ShardDBReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    	log := log.FromContext(ctx)
    
    	instance := &dbv1.ShardDB{}
    	if err := r.Get(ctx, req.NamespacedName, instance); err != nil {
    		if errors.IsNotFound(err) {
    			return ctrl.Result{}, nil
    		}
    		return ctrl.Result{}, err
    	}
    
    	isMarkedForDeletion := instance.GetDeletionTimestamp() != nil
    	if isMarkedForDeletion {
    		if controllerutil.ContainsFinalizer(instance, shardDBFinalizer) {
    			log.Info("Handling deletion for ShardDB")
			result, err := r.finalizeShardDB(ctx, instance)
			if err != nil {
				log.Error(err, "Finalization failed, will retry with backoff")
				// A non-nil error already triggers a rate-limited requeue;
				// a RequeueAfter returned alongside it would be ignored.
				return ctrl.Result{}, err
			}
			if result.Requeue || result.RequeueAfter > 0 {
				return result, nil
			}
    
    			log.Info("Finalization complete, removing finalizer")
    			controllerutil.RemoveFinalizer(instance, shardDBFinalizer)
    			if err := r.Update(ctx, instance); err != nil {
    				return ctrl.Result{}, err
    			}
    		}
    		return ctrl.Result{}, nil
    	}
    
    	if !controllerutil.ContainsFinalizer(instance, shardDBFinalizer) {
    		log.Info("Adding finalizer to ShardDB")
    		controllerutil.AddFinalizer(instance, shardDBFinalizer)
    		if err := r.Update(ctx, instance); err != nil {
    			return ctrl.Result{}, err
    		}
    	}
    
    	// Regular reconciliation logic goes here.
    	// Ensure StatefulSet exists and matches spec.
    	// Update status with current replica count, etc.
    
    	return ctrl.Result{}, nil
    }
    
    func (r *ShardDBReconciler) finalizeShardDB(ctx context.Context, db *dbv1.ShardDB) (ctrl.Result, error) {
    	log := log.FromContext(ctx)
    
    	if db.Status.Phase != "Deleting" {
    		db.Status.Phase = "Deleting"
    		if err := r.Status().Update(ctx, db); err != nil {
    			return ctrl.Result{}, err
    		}
    	}
    
    	switch db.Status.FinalizationStatus {
    	case "":
    		log.Info("Finalization step: Triggering backups")
    		if err := r.triggerBackups(ctx, db); err != nil {
    			return ctrl.Result{}, fmt.Errorf("failed to trigger backups: %w", err)
    		}
    		db.Status.FinalizationStatus = finalizationStateBackupsStarted
    		if err := r.Status().Update(ctx, db); err != nil {
    			return ctrl.Result{}, err
    		}
    		return ctrl.Result{Requeue: true, RequeueAfter: 15 * time.Second}, nil
    
    	case finalizationStateBackupsStarted:
    		log.Info("Finalization step: Checking backup status")
    		complete, err := r.areBackupsComplete(ctx, db)
    		if err != nil {
    			return ctrl.Result{}, fmt.Errorf("failed to check backup status: %w", err)
    		}
    		if !complete {
    			log.Info("Backups not yet complete")
    			return ctrl.Result{Requeue: true, RequeueAfter: 30 * time.Second}, nil
    		}
    		log.Info("Backups complete")
    		db.Status.FinalizationStatus = finalizationStateScalingDown
    		if err := r.Status().Update(ctx, db); err != nil {
    			return ctrl.Result{}, err
    		}
    		return ctrl.Result{Requeue: true}, nil
    
    	case finalizationStateScalingDown:
    		log.Info("Finalization step: Scaling down StatefulSet")
    		sts := &appsv1.StatefulSet{}
    		err := r.Get(ctx, types.NamespacedName{Name: db.Name, Namespace: db.Namespace}, sts)
    		if err != nil {
    			if errors.IsNotFound(err) {
    				log.Info("StatefulSet already deleted")
    				db.Status.FinalizationStatus = finalizationStateComplete
    				return ctrl.Result{}, r.Status().Update(ctx, db)
    			}
    			return ctrl.Result{}, err
    		}
    
    		if sts.Spec.Replicas != nil && *sts.Spec.Replicas > 0 {
    			log.Info("Setting StatefulSet replicas to 0")
    			zeroReplicas := int32(0)
    			sts.Spec.Replicas = &zeroReplicas
    			if err := r.Update(ctx, sts); err != nil {
    				return ctrl.Result{}, err
    			}
    		}
    
		if sts.Status.Replicas > 0 {
			log.Info("Waiting for StatefulSet pods to terminate", "podsRemaining", sts.Status.Replicas)
    			return ctrl.Result{Requeue: true, RequeueAfter: 15 * time.Second}, nil
    		}
    
    		log.Info("StatefulSet successfully scaled down")
    		db.Status.FinalizationStatus = finalizationStateComplete
    		if err := r.Status().Update(ctx, db); err != nil {
    			return ctrl.Result{}, err
    		}
    		return ctrl.Result{Requeue: true}, nil
    
    	case finalizationStateComplete:
    		log.Info("Finalization complete")
    		return ctrl.Result{}, nil
    
    	default:
    		return ctrl.Result{}, fmt.Errorf("unknown finalization state: %s", db.Status.FinalizationStatus)
    	}
    }
    
// Placeholder implementations. A real operator would create and inspect backup
// Jobs here (see the earlier backup Job sketch).
func (r *ShardDBReconciler) triggerBackups(ctx context.Context, db *dbv1.ShardDB) error { return nil }

func (r *ShardDBReconciler) areBackupsComplete(ctx context.Context, db *dbv1.ShardDB) (bool, error) {
	return true, nil
}
    
    func (r *ShardDBReconciler) SetupWithManager(mgr ctrl.Manager) error {
    	return ctrl.NewControllerManagedBy(mgr).
    		For(&dbv1.ShardDB{}).
    		Owns(&appsv1.StatefulSet{}).
    		Complete(r)
    }
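
    Finally, the finalizer wiring can be smoke-tested without a cluster using controller-runtime's fake client. A minimal sketch, assuming the generated AddToScheme helper and the my-operator module path; it only asserts that the first reconcile attaches the finalizer.

    go
    package controllers

    import (
    	"context"
    	"testing"

    	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    	"k8s.io/apimachinery/pkg/runtime"
    	"k8s.io/apimachinery/pkg/types"
    	ctrl "sigs.k8s.io/controller-runtime"
    	"sigs.k8s.io/controller-runtime/pkg/client/fake"
    	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"

    	dbv1 "my-operator/api/v1"
    )

    func TestReconcileAddsFinalizer(t *testing.T) {
    	scheme := runtime.NewScheme()
    	if err := dbv1.AddToScheme(scheme); err != nil {
    		t.Fatalf("failed to build scheme: %v", err)
    	}

    	db := &dbv1.ShardDB{ObjectMeta: metav1.ObjectMeta{Name: "test-db", Namespace: "default"}}
    	c := fake.NewClientBuilder().WithScheme(scheme).WithObjects(db).Build()

    	r := &ShardDBReconciler{Client: c, Scheme: scheme}
    	req := ctrl.Request{NamespacedName: types.NamespacedName{Name: "test-db", Namespace: "default"}}

    	if _, err := r.Reconcile(context.Background(), req); err != nil {
    		t.Fatalf("reconcile returned an error: %v", err)
    	}

    	updated := &dbv1.ShardDB{}
    	if err := c.Get(context.Background(), req.NamespacedName, updated); err != nil {
    		t.Fatalf("failed to fetch ShardDB: %v", err)
    	}
    	if !controllerutil.ContainsFinalizer(updated, shardDBFinalizer) {
    		t.Fatalf("expected finalizer %q to be present after first reconcile", shardDBFinalizer)
    	}
    }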
    

    Conclusion: From Controller to Guardian

    Implementing a finalizer transforms an Operator from a simple resource provisioner into a true guardian of your stateful application. It elevates the Operator's role to manage the full lifecycle, including the most critical and often overlooked phase: deletion. By intercepting the deletion process, leveraging the status subresource for idempotency, and carefully handling edge cases, you can build production-grade operators that provide the safety and reliability required for running critical stateful workloads on Kubernetes. This pattern is not just a best practice; for stateful systems, it is an absolute necessity.
