Idempotent Reconcilers with Finalizers in K8s Operators

Goh Ling Yong
Technology enthusiast and software architect specializing in AI-driven development tools and modern software engineering practices. Passionate about the intersection of artificial intelligence and human creativity in building tomorrow's digital solutions.

The Inevitable Problem: Orphaned Resources

As a seasoned engineer building on Kubernetes, you understand the power of the operator pattern. It extends the Kubernetes API to manage complex, stateful applications and, more importantly, external resources like cloud databases, message queues, or DNS entries. The core of an operator is its reconciliation loop: a continuous process that drives the current state of the world toward the desired state defined in a Custom Resource (CR).

However, a common and costly failure mode arises during deletion. Consider an operator managing CloudDatabase custom resources, where each CR corresponds to an AWS RDS instance. A user creates a CloudDatabase object, the operator's reconciler sees it, and calls the AWS API to provision an RDS instance.

Now, what happens when the user runs kubectl delete clouddatabase my-prod-db?

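  • The Kubernetes API server receives the delete request. With no finalizers on the object, it removes the CR from etcd almost immediately rather than leaving it around with a metadata.deletionTimestamp.
  • The operator's watch delivers a delete event, which triggers the reconciliation loop one last time; by then the object can no longer be fetched.
  • A naive operator might react to that event by immediately calling the AWS API to terminate the RDS instance.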

    This seems straightforward, but it's fraught with peril in a distributed system:

    * Operator Crash: The operator pod could crash or be evicted after the CR is deleted from etcd but before the AWS API call completes successfully.

    * Network Failure: The call to the AWS API might fail due to transient network issues.

    * API Rate Limiting: The cloud provider might rate-limit the operator's requests, delaying or preventing deletion.

    In all these cases, the CloudDatabase CR is gone from Kubernetes, but the expensive RDS instance is now an orphaned resource, silently accruing costs. The operator has lost its source of truth and has no way to know it needs to clean up this resource. This is where the Finalizer Pattern becomes not just a best practice, but a necessity for robust, production-grade operators.

    The Finalizer Pattern: A Deletion Gatekeeper

    A finalizer is not a piece of code or a controller; it's simply a string added to the metadata.finalizers list of a Kubernetes object. While this list is not empty, the API server will not remove the object from etcd. An object with a non-empty finalizer list that has been marked for deletion will remain in the API server in a Terminating state until its finalizers list is cleared, however long that takes.

    This behavior provides the hook we need. Our operator can add its own unique finalizer to any CloudDatabase CR it starts managing. Now, when a user deletes the CR:

  • The deletionTimestamp is set, but the object is not removed from etcd because our finalizer is present.
  • The operator's reconciler is triggered. It sees the deletionTimestamp and knows the object is being deleted.
  • It can now perform its cleanup logic (e.g., delete the RDS instance) with confidence.
  • Only after it has verified that the external resource is successfully deleted does it remove its finalizer from the CR.
  • With the finalizer list now empty, the API server completes the deletion of the CR object.

    This guarantees that the operator has the opportunity to perform and confirm cleanup before its source of truth disappears.
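The gating behavior described above can be modeled in a few lines of plain Go. This is a toy sketch, not how the API server is implemented: the Object and Store types and their methods are made up for illustration, and only the rule "an object marked for deletion is removed once its finalizer list is empty" is taken from the text.

```go
package main

import (
	"fmt"
	"time"
)

// Object is a stripped-down, hypothetical stand-in for a Kubernetes object's metadata.
type Object struct {
	Name              string
	Finalizers        []string
	DeletionTimestamp *time.Time
}

// Store is a toy stand-in for the API server's storage layer.
type Store struct {
	objects map[string]*Object
	now     func() time.Time
}

func newStore(objs ...*Object) *Store {
	s := &Store{objects: map[string]*Object{}, now: time.Now}
	for _, o := range objs {
		s.objects[o.Name] = o
	}
	return s
}

// Delete mimics a delete request: with finalizers present the object is only
// marked for deletion; without them it is removed right away.
func (s *Store) Delete(name string) {
	obj, ok := s.objects[name]
	if !ok {
		return
	}
	if len(obj.Finalizers) == 0 {
		delete(s.objects, name)
		return
	}
	if obj.DeletionTimestamp == nil {
		t := s.now()
		obj.DeletionTimestamp = &t
	}
}

// RemoveFinalizer drops one finalizer and, if the object is already marked
// for deletion and no finalizers remain, completes the removal.
func (s *Store) RemoveFinalizer(name, finalizer string) {
	obj, ok := s.objects[name]
	if !ok {
		return
	}
	kept := obj.Finalizers[:0]
	for _, f := range obj.Finalizers {
		if f != finalizer {
			kept = append(kept, f)
		}
	}
	obj.Finalizers = kept
	if obj.DeletionTimestamp != nil && len(obj.Finalizers) == 0 {
		delete(s.objects, name)
	}
}

// Exists reports whether the object is still stored.
func (s *Store) Exists(name string) bool {
	_, ok := s.objects[name]
	return ok
}

func main() {
	s := newStore(&Object{Name: "my-prod-db", Finalizers: []string{"database.example.com/finalizer"}})
	s.Delete("my-prod-db")
	fmt.Println("after delete request, still stored:", s.Exists("my-prod-db"))
	s.RemoveFinalizer("my-prod-db", "database.example.com/finalizer")
	fmt.Println("after finalizer removal, still stored:", s.Exists("my-prod-db"))
}
```

The window between the two calls in main is exactly where the operator gets to run its cleanup while the CR is still readable.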

    Architecting the Idempotent Reconciliation Loop

    Let's build this robust reconciliation loop in Go using the controller-runtime library, the de facto standard for building operators. Our Reconcile function will effectively become a state machine driven by the presence of the deletionTimestamp and our finalizer.

    First, let's define our finalizer's name. It should be unique and descriptive, typically using a domain-style name.

    go
    // api/v1alpha1/clouddatabase_types.go
    package v1alpha1
    
    // ... other imports
    
    const (
    	CloudDatabaseFinalizer = "database.example.com/finalizer"
    )
    
    // ... CRD struct definitions

    Our Reconcile function in the controller will be structured around this core logic:

    go
    // internal/controller/clouddatabase_controller.go
    
    import (
    	// ... other imports
    	"context"
    	"errors" // standard library; used for errors.Is on the RDS sentinel errors
    	"time"
    
    	"github.com/go-logr/logr"
    	apierrors "k8s.io/apimachinery/pkg/api/errors" // aliased so it does not clash with the standard "errors"
    	ctrl "sigs.k8s.io/controller-runtime"
    	"sigs.k8s.io/controller-runtime/pkg/client"
    	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
    
    	databasev1alpha1 "github.com/your-org/your-operator/api/v1alpha1"
    	"github.com/your-org/your-operator/internal/rds" // placeholder client package for the external service
    )
    
    func (r *CloudDatabaseReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    	log := r.Log.WithValues("clouddatabase", req.NamespacedName)
    
    	// 1. Fetch the CloudDatabase instance
    	instance := &databasev1alpha1.CloudDatabase{}
    	if err := r.Get(ctx, req.NamespacedName, instance); err != nil {
    		if apierrors.IsNotFound(err) {
    			log.Info("CloudDatabase resource not found. Ignoring since object must be deleted.")
    			return ctrl.Result{}, nil
    		}
    		log.Error(err, "Failed to get CloudDatabase")
    		return ctrl.Result{}, err
    	}
    
    	// 2. The core state machine logic
    	if instance.GetDeletionTimestamp().IsZero() {
    		// The object is NOT being deleted, so we proceed with normal reconciliation.
    		return r.reconcileNormal(ctx, instance, log)
    	} else {
    		// The object IS being deleted.
    		return r.reconcileDelete(ctx, instance, log)
    	}
    }

    This structure cleanly separates the creation/update logic from the deletion logic.
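Reduced to its two inputs, the state machine is a small decision table. The sketch below uses only the standard library; the Action type and the decide function are illustrative names, not part of controller-runtime.

```go
package main

import "fmt"

// Action names the branch a reconciler would take. The names are illustrative.
type Action string

const (
	ActionAddFinalizer    Action = "add-finalizer"
	ActionReconcileNormal Action = "reconcile-normal"
	ActionCleanup         Action = "cleanup-then-remove-finalizer"
	ActionNothing         Action = "nothing"
)

// decide maps the two observable facts about the CR to the next step.
func decide(beingDeleted, hasFinalizer bool) Action {
	switch {
	case !beingDeleted && !hasFinalizer:
		// Unprotected object: protect it before touching anything external.
		return ActionAddFinalizer
	case !beingDeleted && hasFinalizer:
		return ActionReconcileNormal
	case beingDeleted && hasFinalizer:
		return ActionCleanup
	default:
		// Being deleted and our finalizer is already gone: nothing left to do.
		return ActionNothing
	}
}

func main() {
	for _, deleted := range []bool{false, true} {
		for _, fin := range []bool{false, true} {
			fmt.Printf("beingDeleted=%v hasFinalizer=%v -> %s\n", deleted, fin, decide(deleted, fin))
		}
	}
}
```

Keeping the decision a pure function of observed state is what lets the same Reconcile call run safely any number of times.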

    State 1: Normal Reconciliation (Create/Update)

    When deletionTimestamp is nil, our goal is to ensure the external resource exists and matches the spec.

    go
    func (r *CloudDatabaseReconciler) reconcileNormal(ctx context.Context, instance *databasev1alpha1.CloudDatabase, log logr.Logger) (ctrl.Result, error) {
    	// A. Ensure our finalizer is present on the object.
    	if !controllerutil.ContainsFinalizer(instance, databasev1alpha1.CloudDatabaseFinalizer) {
    		log.Info("Adding Finalizer for CloudDatabase")
    		controllerutil.AddFinalizer(instance, databasev1alpha1.CloudDatabaseFinalizer)
    		if err := r.Update(ctx, instance); err != nil {
    			log.Error(err, "Failed to update CloudDatabase to add finalizer")
    			return ctrl.Result{}, err
    		}
    		// Requeue immediately after adding the finalizer to avoid race conditions.
    		return ctrl.Result{Requeue: true}, nil
    	}
    
    	// B. Check if the external RDS instance exists.
    	// We use a unique identifier, like `instance.UID`, to name or tag the external resource.
    	rdsInstance, err := r.RDSClient.DescribeDBInstances(instance.UID)
    	if err != nil {
    		if errors.Is(err, rds.ErrDBInstanceNotFound) {
    			// C. It doesn't exist, so create it.
    			log.Info("Creating a new RDS instance")
    			_, createErr := r.RDSClient.CreateDBInstance(instance.Spec.Engine, instance.Spec.Size, instance.UID)
    			if createErr != nil {
    				log.Error(createErr, "Failed to create RDS instance")
    				// Update status to reflect failure
    				instance.Status.Phase = "Failed"
    				instance.Status.Message = createErr.Error()
    				if updateErr := r.Status().Update(ctx, instance); updateErr != nil {
    					log.Error(updateErr, "Failed to update CloudDatabase status")
    				}
    				return ctrl.Result{}, createErr
    			}
    			// Creation is asynchronous. We update our status and requeue.
    			instance.Status.Phase = "Creating"
    			instance.Status.Message = "RDS instance provisioning has started."
    			instance.Status.DBInstanceID = string(instance.UID)
    			if err := r.Status().Update(ctx, instance); err != nil {
    				log.Error(err, "Failed to update CloudDatabase status")
    				return ctrl.Result{}, err
    			}
    			return ctrl.Result{RequeueAfter: time.Minute * 1}, nil // Requeue to check status later.
    		}
    		// Some other AWS API error occurred.
    		log.Error(err, "Failed to describe RDS instance")
    		return ctrl.Result{}, err
    	}
    
    	// D. The instance exists. Check for drift and update if necessary.
    	if rdsInstance.Size != instance.Spec.Size {
    		log.Info("RDS instance size differs from spec. Updating.", "CurrentSize", rdsInstance.Size, "DesiredSize", instance.Spec.Size)
    		// ... logic to update RDS instance size ...
    		// Asynchronous operation, requeue to monitor progress.
    		return ctrl.Result{RequeueAfter: time.Minute * 2}, nil
    	}
    
    	// E. Update status with current state from AWS.
    	instance.Status.Phase = rdsInstance.Status
    	instance.Status.Endpoint = rdsInstance.Endpoint
    	if err := r.Status().Update(ctx, instance); err != nil {
    		log.Error(err, "Failed to update CloudDatabase status")
    		return ctrl.Result{}, err
    	}
    
    	log.Info("Reconciliation complete. External resource is in desired state.")
    	return ctrl.Result{}, nil
    }

    Key Production Patterns Here:

  • Finalizer First: The very first action is to ensure the finalizer exists. We add it and immediately requeue. This prevents any other logic from running on an unprotected resource.
  • Idempotent Creation: We check if the resource exists before attempting to create it. We rely on a deterministic, unique identifier (like the CR's UID) to tag or name the external resource. This ensures that if the operator crashes after sending the Create request but before recording the result, a subsequent reconciliation will find the existing instance and not try to create a duplicate.
  • Status Subresource: We are careful to only update the .status subresource of our CR. This is critical as it prevents race conditions where our status update might overwrite a change made by a user to the .spec.
  • Asynchronous Operations: Cloud provider operations are rarely synchronous. We initiate an action (like CreateDBInstance), update our CR's status to reflect the Creating state, and then requeue with a delay (RequeueAfter). The next reconciliation will poll for the latest status.
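The idempotent-creation pattern from the list above can be exercised without any cloud account. The following is a minimal sketch against an in-memory fake: fakeCloud, ensureInstance, and the "cdb-" prefix are all assumptions made for illustration. The prefix is there because some providers restrict identifier formats (RDS, for instance, expects an identifier to begin with a letter, which a raw UID may not).

```go
package main

import (
	"errors"
	"fmt"
)

var errNotFound = errors.New("instance not found")

// fakeCloud is an in-memory stand-in for a cloud provider API.
type fakeCloud struct {
	instances   map[string]string // external ID -> size
	createCalls int
}

func (c *fakeCloud) describe(id string) (string, error) {
	size, ok := c.instances[id]
	if !ok {
		return "", errNotFound
	}
	return size, nil
}

func (c *fakeCloud) create(id, size string) {
	c.createCalls++
	c.instances[id] = size
}

// externalID derives a deterministic name from the CR's UID, so every
// reconciliation of the same CR looks for the same external resource.
func externalID(uid string) string { return "cdb-" + uid }

// ensureInstance creates the instance only when describe reports it missing.
// It returns whether this call issued a create.
func ensureInstance(c *fakeCloud, uid, size string) (bool, error) {
	id := externalID(uid)
	_, err := c.describe(id)
	switch {
	case err == nil:
		return false, nil // already exists: nothing to do
	case errors.Is(err, errNotFound):
		c.create(id, size)
		return true, nil
	default:
		return false, err
	}
}

func main() {
	cloud := &fakeCloud{instances: map[string]string{}}
	uid := "3f2a9c1e-0000-4000-8000-000000000001"
	for i := 1; i <= 3; i++ {
		created, err := ensureInstance(cloud, uid, "db.t3.small")
		fmt.Printf("reconcile %d: created=%v err=%v\n", i, created, err)
	}
	fmt.Println("total create calls:", cloud.createCalls)
}
```

Because the identifier is derived rather than generated, a reconciler that crashed right after the create call finds the instance on its next pass instead of provisioning a second one.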
    State 2: Graceful Deletion Logic

    This is where the finalizer proves its worth. When deletionTimestamp is set, we execute our cleanup logic.

    go
    func (r *CloudDatabaseReconciler) reconcileDelete(ctx context.Context, instance *databasev1alpha1.CloudDatabase, log logr.Logger) (ctrl.Result, error) {
    	// Check if our finalizer is the one that's blocking deletion.
    	if controllerutil.ContainsFinalizer(instance, databasev1alpha1.CloudDatabaseFinalizer) {
    		log.Info("Performing cleanup for CloudDatabase")
    
    		// A. Call the external dependency to delete the resource.
    		if err := r.RDSClient.DeleteDBInstance(instance.UID); err != nil {
    			// Idempotency check: if the resource is already gone, that's success for us.
    			if errors.Is(err, rds.ErrDBInstanceNotFound) {
    				log.Info("External RDS instance already deleted. Proceeding to remove finalizer.")
    			} else {
    				// Another error occurred (e.g., API permissions, rate limiting).
    				log.Error(err, "Failed to delete RDS instance. Requeuing.")
    				// We must requeue to retry the deletion. The finalizer remains.
    				return ctrl.Result{}, err
    			}
    		}
    
    		// B. (Optional but recommended) Poll to confirm deletion.
    		// Some APIs return success immediately but deletion is async.
    		// A robust operator confirms the resource is truly gone.
    		isGone, err := r.RDSClient.ConfirmDBInstanceDeleted(instance.UID)
    		if err != nil {
    			log.Error(err, "Error during deletion confirmation polling.")
    			return ctrl.Result{}, err
    		}
    		if !isGone {
    			log.Info("RDS instance is still terminating. Requeuing to check again.")
    			return ctrl.Result{RequeueAfter: time.Second * 30}, nil
    		}
    
    		// C. Once external resource is gone, remove the finalizer.
    		log.Info("External resource deleted successfully. Removing finalizer.")
    		controllerutil.RemoveFinalizer(instance, databasev1alpha1.CloudDatabaseFinalizer)
    		if err := r.Update(ctx, instance); err != nil {
    			log.Error(err, "Failed to remove finalizer from CloudDatabase")
    			return ctrl.Result{}, err
    		}
    	}
    
    	// Finalizer is gone, or was never there. The object will be garbage collected.
    	log.Info("Reconciliation finished for deleted resource.")
    	return ctrl.Result{}, nil
    }

    Key Production Patterns Here:

  • Idempotent Deletion: The most critical aspect. If the operator crashes and restarts during deletion, this function will be called again. Our logic if errors.Is(err, rds.ErrDBInstanceNotFound) handles this perfectly. If the external resource is already gone, we treat it as a success and proceed.
  • Confirmation Polling: Simply calling a Delete API is often not enough. Many cloud services initiate a termination process that can take minutes. A robust operator should poll the external API to confirm the resource no longer exists before removing the finalizer. This prevents a race condition where the CR is deleted but the external resource termination fails later.
  • Error Handling: If deletion fails for any reason other than NotFound, we return an error. controller-runtime will automatically requeue the request with exponential backoff, preventing us from hammering a failing API.
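To see idempotent deletion and confirmation polling work together, here is a small simulation against a fake asynchronous API. Everything in it (fakeCloud, deleteInstance, isGone, cleanup, the pollsNeeded knob) is hypothetical and only mirrors the shape of the logic above: NotFound counts as success, and the caller may remove the finalizer only once cleanup reports done.

```go
package main

import (
	"errors"
	"fmt"
)

var errNotFound = errors.New("instance not found")

// fakeCloud models an API whose delete call is asynchronous: the instance
// lingers in a "deleting" state for a few polls before it disappears.
type fakeCloud struct {
	state       map[string]string // external ID -> "available" | "deleting"
	pollsNeeded int               // polls remaining before a deleting instance is gone
}

func (c *fakeCloud) deleteInstance(id string) error {
	if _, ok := c.state[id]; !ok {
		return errNotFound
	}
	c.state[id] = "deleting"
	return nil
}

func (c *fakeCloud) isGone(id string) bool {
	st, ok := c.state[id]
	if !ok {
		return true
	}
	if st == "deleting" {
		if c.pollsNeeded == 0 {
			delete(c.state, id)
			return true
		}
		c.pollsNeeded--
	}
	return false
}

// cleanup is safe to call repeatedly. It returns done=true only when the
// external resource is confirmed gone, which is when the finalizer may go.
func cleanup(c *fakeCloud, id string) (bool, error) {
	if err := c.deleteInstance(id); err != nil && !errors.Is(err, errNotFound) {
		return false, err // real failure: keep the finalizer and retry later
	}
	return c.isGone(id), nil
}

func main() {
	cloud := &fakeCloud{state: map[string]string{"cdb-1": "available"}, pollsNeeded: 2}
	for attempt := 1; ; attempt++ {
		done, err := cleanup(cloud, "cdb-1")
		fmt.Printf("attempt %d: done=%v err=%v\n", attempt, done, err)
		if done {
			break
		}
	}
}
```

In a real reconciler each loop iteration would be a separate Reconcile call scheduled through RequeueAfter, not a tight loop.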
    Advanced Edge Cases and Production Considerations

    Building a simple finalizer loop is one thing; making it production-ready requires anticipating and handling complex failure modes.

    Finalizer Stalls

    What happens if the external API is permanently unavailable, or a bug in our code prevents the deletion logic from ever succeeding? The finalizer will remain, and the CR will be stuck in a Terminating state forever. This is a finalizer stall.

    Mitigation Strategies:

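  • Metrics and Alerting: Your operator must expose metrics, such as the number of objects stuck in a terminating state for more than a configured threshold (e.g., 1 hour). A Prometheus query like sum(kube_resource_metadata_deletion_timestamp{resource="clouddatabases"}) can be a starting point for an alert, but treat the metric name as illustrative: kube-state-metrics does not emit metrics for custom resources unless its custom resource state configuration is set up, so you would need to export the deletion timestamp that way or from the operator itself.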
  • Manual Intervention Documentation: Provide clear documentation for cluster administrators on how to manually remove a finalizer if necessary. This is a break-glass procedure but is sometimes required.
    bash
        kubectl patch clouddatabase my-stuck-db --type json --patch='[{"op": "remove", "path": "/metadata/finalizers"}]'

    Note that this patch removes every finalizer on the object, not only the operator's. Administrators must understand this will likely orphan the external resource, which they will then need to clean up manually.

  • Automated Timeouts (Use with Extreme Caution): An operator could be programmed to give up after a certain period. For example, if a CR has been in a Terminating state for over 24 hours, the operator could log a critical error, emit a Kubernetes Event, and remove its own finalizer, consciously orphaning the resource to unblock the system. This is a design trade-off between guaranteed cleanup and system availability.
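The alerting and timeout strategies above both come down to comparing how long an object has been terminating against thresholds. A minimal sketch, with the verdict names, the hours helper, and the 1 hour / 24 hour thresholds chosen purely for illustration:

```go
package main

import (
	"fmt"
	"time"
)

// stallVerdict is an illustrative classification of a terminating object.
type stallVerdict string

const (
	verdictHealthy stallVerdict = "healthy"
	verdictAlert   stallVerdict = "alert"
	verdictGiveUp  stallVerdict = "give-up"
)

// hours is a small helper to keep threshold literals readable.
func hours(n int) time.Duration { return time.Duration(n) * time.Hour }

// classifyStall compares the time spent terminating against two thresholds.
// giveUpAfter is expected to be larger than alertAfter.
func classifyStall(terminatingFor, alertAfter, giveUpAfter time.Duration) stallVerdict {
	switch {
	case terminatingFor >= giveUpAfter:
		// Candidate for the break-glass path: emit an Event, log loudly,
		// and only then consider dropping the finalizer.
		return verdictGiveUp
	case terminatingFor >= alertAfter:
		return verdictAlert
	default:
		return verdictHealthy
	}
}

func main() {
	// In a reconciler this would come from the object's deletionTimestamp.
	deletionTimestamp := time.Now().Add(-3 * time.Hour)
	terminatingFor := time.Since(deletionTimestamp)
	fmt.Println("verdict:", classifyStall(terminatingFor, hours(1), hours(24)))
}
```

The same function could feed a gauge (count of objects with an alert verdict) and gate the automated timeout, so both mechanisms agree on what "stuck" means.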
    Concurrency and Leader Election

    In a production environment, you will run multiple replicas of your operator for high availability. controller-runtime supports leader election out of the box, but it has to be switched on in the manager options (LeaderElection: true together with a LeaderElectionID). Once enabled, only one pod is actively reconciling resources at any given time. This prevents two pods from simultaneously trying to create or delete the same RDS instance.

    However, your reconciliation logic must still be robust against failovers. If the leader pod dies mid-reconciliation, the new leader starts from a fresh cache and reconciles every object again, including the one that was in flight. This is why idempotency is not just a feature but a fundamental requirement of the entire Reconcile function. Every step must be repeatable without causing unintended side effects.

    Performance and API Server Load

    This pattern introduces at least one extra UPDATE call to the Kubernetes API server for every CR created (to add the finalizer). For operators managing thousands of high-frequency resources, this can add load.

    Optimization Techniques:

    * Rate Limiting External Calls: controller-runtime processes items individually and offers no built-in batching, so be mindful of the load your operator places on external APIs. Use client-side rate limiting (e.g., using a token bucket algorithm) when calling cloud providers.

    * Requeue Tuning: Be judicious with RequeueAfter. Polling every 5 seconds for a resource that takes 10 minutes to provision is wasteful. Use intelligent, longer requeue times. For some operations, you might be able to use an external eventing system (e.g., AWS EventBridge) to trigger reconciliation instead of polling, though this adds significant complexity.

    * Controller Caching: Understand how the controller-runtime cache works. By default, reads made through the manager's client (r.Get, r.List) are served from a local cache, which is eventually consistent with etcd. Writes (r.Update, r.Status().Update) bypass the cache and are sent directly to the API server; their results are reflected back in the cache only once the watch delivers them. In rare cases of cache lag, a reconciliation might run with slightly stale data, another reason why idempotency is paramount.
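As a concrete picture of the token bucket mentioned above, here is a minimal single-goroutine version built on the standard library only. The type and function names are made up for this sketch, and the clock is passed in explicitly so the behavior is easy to follow; a production operator would more likely reach for an existing limiter such as golang.org/x/time/rate.

```go
package main

import (
	"fmt"
	"time"
)

// tokenBucket is a minimal, non-concurrent token bucket.
type tokenBucket struct {
	capacity   float64   // maximum burst size
	tokens     float64   // tokens currently available
	refillRate float64   // tokens added per second
	last       time.Time // last time the bucket was refilled
}

func newTokenBucket(capacity, refillPerSecond float64, now time.Time) *tokenBucket {
	return &tokenBucket{capacity: capacity, tokens: capacity, refillRate: refillPerSecond, last: now}
}

// allow reports whether one call may proceed at time now, consuming a token if so.
func (b *tokenBucket) allow(now time.Time) bool {
	if elapsed := now.Sub(b.last).Seconds(); elapsed > 0 {
		b.tokens += elapsed * b.refillRate
		if b.tokens > b.capacity {
			b.tokens = b.capacity
		}
		b.last = now
	}
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}

// at builds a timestamp a given number of seconds after the Unix epoch,
// which keeps the demo deterministic.
func at(seconds int) time.Time { return time.Unix(int64(seconds), 0) }

func main() {
	bucket := newTokenBucket(2, 1, at(0)) // burst of 2, refills 1 token per second
	for _, s := range []int{0, 0, 0, 1, 1, 5} {
		fmt.Printf("t=%ds allowed=%v\n", s, bucket.allow(at(s)))
	}
}
```

When allow returns false, a reconciler would typically return a RequeueAfter result rather than block the worker.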

    Complete Code Example: Tying It All Together

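    Here is a more complete Reconcile function that demonstrates the full pattern. It builds only against a CloudDatabase API package and an rds client of your own (both are placeholders here), and for brevity it omits the status updates and deletion-confirmation polling shown earlier.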

    go
    package controller
    
    import (
    	"context"
    	"time"
    
    	"github.com/go-logr/logr"
    	"k8s.io/apimachinery/pkg/api/errors"
    	"k8s.io/apimachinery/pkg/runtime"
    	ctrl "sigs.k8s.io/controller-runtime"
    	"sigs.k8s.io/controller-runtime/pkg/client"
    	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
    
    	databasev1alpha1 "github.com/your-org/your-operator/api/v1alpha1"
    	"github.com/your-org/your-operator/internal/rds" // Assume this is a mock client
    )
    
    // CloudDatabaseReconciler reconciles a CloudDatabase object
    type CloudDatabaseReconciler struct {
    	client.Client
    	Log       logr.Logger
    	Scheme    *runtime.Scheme
    	RDSClient rds.Client // Your interface for the external service
    }
    
    const cloudDatabaseFinalizer = "database.example.com/finalizer"
    
    func (r *CloudDatabaseReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    	log := r.Log.WithValues("clouddatabase", req.NamespacedName)
    
    	instance := &databasev1alpha1.CloudDatabase{}
    	if err := r.Get(ctx, req.NamespacedName, instance); err != nil {
    		if errors.IsNotFound(err) {
    			return ctrl.Result{}, nil
    		}
    		return ctrl.Result{}, err
    	}
    
    	// Examine DeletionTimestamp to determine if object is under deletion
    	if instance.ObjectMeta.DeletionTimestamp.IsZero() {
    		// The object is not being deleted, so if it does not have our finalizer,
    		// then lets add the finalizer and update the object.
    		if !controllerutil.ContainsFinalizer(instance, cloudDatabaseFinalizer) {
    			log.Info("Adding finalizer")
    			controllerutil.AddFinalizer(instance, cloudDatabaseFinalizer)
    			if err := r.Update(ctx, instance); err != nil {
    				return ctrl.Result{}, err
    			}
    			return ctrl.Result{Requeue: true}, nil
    		}
    	} else {
    		// The object is being deleted
    		if controllerutil.ContainsFinalizer(instance, cloudDatabaseFinalizer) {
    			// Our finalizer is present, so lets handle any external dependency
    			log.Info("Handling external resource deletion")
    			if err := r.deleteExternalResources(instance); err != nil {
    				// if fail to delete the external dependency here, return with error
    				// so that it can be retried
    				return ctrl.Result{}, err
    			}
    
    			// Once external dependencies are cleaned up, remove the finalizer.
    			log.Info("Removing finalizer")
    			controllerutil.RemoveFinalizer(instance, cloudDatabaseFinalizer)
    			if err := r.Update(ctx, instance); err != nil {
    				return ctrl.Result{}, err
    			}
    		}
    		// Stop reconciliation as the item is being deleted
    		return ctrl.Result{}, nil
    	}
    
    	// Your normal reconciliation logic to create/update the external resource
    	log.Info("Reconciling CloudDatabase")
    	externalID := string(instance.UID)
    	rdsInstance, err := r.RDSClient.Get(externalID)
    	if err != nil {
    		if err == rds.ErrNotFound {
    			log.Info("Creating external resource")
    			if err := r.RDSClient.Create(externalID, instance.Spec.Size); err != nil {
    				return ctrl.Result{}, err
    			}
    			return ctrl.Result{RequeueAfter: 30 * time.Second}, nil
    		} else {
    			return ctrl.Result{}, err
    		}
    	}
    
    	if rdsInstance.Size != instance.Spec.Size {
    		log.Info("Updating external resource")
    		if err := r.RDSClient.Update(externalID, instance.Spec.Size); err != nil {
    			return ctrl.Result{}, err
    		}
    	}
    
    	return ctrl.Result{}, nil
    }
    
    // deleteExternalResources handles the deletion of the AWS RDS instance.
    func (r *CloudDatabaseReconciler) deleteExternalResources(instance *databasev1alpha1.CloudDatabase) error {
    	externalID := string(instance.UID)
    	r.Log.Info("Deleting RDS instance", "ID", externalID)
    	err := r.RDSClient.Delete(externalID)
    	if err != nil && err != rds.ErrNotFound {
    		return err
    	}
    	r.Log.Info("Successfully deleted or confirmed deletion of RDS instance", "ID", externalID)
    	return nil
    }
    
    // SetupWithManager sets up the controller with the Manager.
    func (r *CloudDatabaseReconciler) SetupWithManager(mgr ctrl.Manager) error {
    	return ctrl.NewControllerManagedBy(mgr).
    		For(&databasev1alpha1.CloudDatabase{}).
    		Complete(r)
    }
    

    By rigorously applying the finalizer pattern and building for idempotency, you can elevate your Kubernetes operator from a simple automation tool to a resilient, production-grade controller that reliably manages the full lifecycle of its resources, preventing costly leaks and ensuring system stability.
