Idempotent K8s Finalizers for Stateful Resource Deletion

Goh Ling Yong

The Deletion State Problem in Kubernetes Operators

As a senior engineer building on Kubernetes, you've likely moved beyond simple stateless applications and are now in the domain of operators and custom controllers. The core of the operator pattern is the reconciliation loop: a continuous process that drives the current state of the world toward a desired state defined in a Custom Resource (CR). While creating and updating resources is straightforward, handling deletion is a fundamentally different and more complex problem.

Consider an operator that manages a ManagedDatabase CR. When a developer applies a ManagedDatabase manifest, the operator might provision a database instance in a cloud provider like AWS RDS. The manifest is the source of truth, and the RDS instance is the external, managed resource.

```yaml
apiVersion: db.example.com/v1alpha1
kind: ManagedDatabase
metadata:
  name: user-service-db
spec:
  engine: postgres
  version: "14.5"
  storageGB: 20
```

The problem arises when a developer runs kubectl delete manageddatabase user-service-db. The Kubernetes API server receives this request and immediately removes the ManagedDatabase object from etcd. Your controller, which is watching for changes to ManagedDatabase objects, sees a 'delete' event. But by the time it can react, the object containing all the necessary information (like the RDS instance ID stored in its status) is already gone. The reconciliation loop for that specific resource instance will not be triggered again because the resource no longer exists.

This results in an orphaned resource: the RDS instance continues to run, incurring costs and becoming a security liability, completely disconnected from the Kubernetes control plane that was supposed to manage it. A simple check in your Reconcile function like if errors.IsNotFound(err) is insufficient because the function won't even be invoked for a resource that has been fully deleted from the API server.
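
To make the failure mode concrete, here is a minimal sketch of that naive approach, reusing the controller-runtime imports and types introduced later in this post. The comment marks the spot where cleanup would have to happen; it cannot, because the instance ID lived only on the object that is now gone.

```go
// A naive Reconcile with no finalizer. By the time errors.IsNotFound
// returns true, the ManagedDatabase object, including the RDS instance ID
// stored in its status, has already been purged from etcd.
func (r *NaiveReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	db := &dbv1alpha1.ManagedDatabase{}
	if err := r.Get(ctx, req.NamespacedName, db); err != nil {
		if errors.IsNotFound(err) {
			// Too late: we no longer know which external instance to delete.
			// The RDS instance is orphaned at this point.
			return ctrl.Result{}, nil
		}
		return ctrl.Result{}, err
	}
	// ... normal create/update logic ...
	return ctrl.Result{}, nil
}
```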

This is where the concept of Finalizers becomes not just a best practice, but an absolute necessity for building robust, production-grade operators that manage any stateful external resource.

Finalizers: The Kubernetes Pre-Deletion Hook

A finalizer is simply a string key added to the metadata.finalizers list of any Kubernetes object. When a finalizer is present on an object, it acts as a locking mechanism that prevents the object from being physically deleted from etcd.

Here's the detailed lifecycle of a deletion request for an object with a finalizer:

  • Deletion Request: A user or another process executes kubectl delete or sends a DELETE request to the API server for the object.
  • API Server Interception: The API server inspects the object. It sees that the metadata.finalizers array is not empty.
  • Graceful Deletion State: Instead of deleting the object, the API server sets the metadata.deletionTimestamp field to the current time. The object remains in the API server in a "terminating" state. Once set, the deletionTimestamp cannot be removed, and no new finalizers can be added to the object, but existing finalizers can still be removed and the status can still be updated.
  • Controller Reconciliation: The act of setting the deletionTimestamp is an 'update' event. This triggers a reconciliation for the object in your controller. Your Reconcile function is now invoked.
  • Cleanup Logic Execution: Inside your Reconcile function, you must now explicitly check if the object is being deleted. The canonical way is if !object.GetDeletionTimestamp().IsZero(). This is your signal to execute all necessary cleanup logic—deleting the RDS instance, removing a DNS record, de-provisioning a storage volume, etc.
  • Finalizer Removal: Once your cleanup logic has completed successfully and idempotently, your controller's final responsibility is to remove its specific finalizer from the metadata.finalizers list and update the object in the API server.
  • Final Deletion: The API server observes that the object it intended to delete now has an empty finalizers list and a non-nil deletionTimestamp. This condition signals that all pre-deletion hooks are complete, and the API server proceeds to permanently delete the object from etcd.

This two-phase deletion process transforms a fire-and-forget delete operation into a coordinated, graceful shutdown, giving your controller the time and context it needs to clean up external resources properly.

Core Implementation Pattern with `controller-runtime`

Let's build a practical, production-ready implementation using Go and controller-runtime, the de facto standard for building operators. We'll continue with our ManagedDatabase example.

First, we define a unique name for our finalizer. It's a best practice to use a domain-qualified name to avoid collisions with other controllers that might operate on the same object.

```go
// In controllers/manageddatabase_controller.go
const managedDatabaseFinalizer = "db.example.com/finalizer"
```

The structure of our main Reconcile function will be a dispatcher that inspects the object's state and routes to the appropriate logic handler.

```go
// In controllers/manageddatabase_controller.go

import (
	"context"
	"time"

	"k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/runtime"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
	"sigs.k8s.io/controller-runtime/pkg/log"

	dbv1alpha1 "github.com/your-repo/managed-db-operator/api/v1alpha1"
)

// ManagedDatabaseReconciler reconciles ManagedDatabase objects against an
// external database provider.
type ManagedDatabaseReconciler struct {
	client.Client
	Scheme           *runtime.Scheme
	DBProviderClient *MockDBProviderClient // mock provider client, defined below
}

func (r *ManagedDatabaseReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	logger := log.FromContext(ctx)

	// 1. Fetch the ManagedDatabase instance
	dbInstance := &dbv1alpha1.ManagedDatabase{}
	if err := r.Get(ctx, req.NamespacedName, dbInstance); err != nil {
		if errors.IsNotFound(err) {
			// Object not found, probably deleted. No action needed.
			logger.Info("ManagedDatabase resource not found. Ignoring since object must be deleted.")
			return ctrl.Result{}, nil
		}
		logger.Error(err, "Failed to get ManagedDatabase")
		return ctrl.Result{}, err
	}

	// 2. Check if the instance is marked for deletion
	if !dbInstance.GetDeletionTimestamp().IsZero() {
		if controllerutil.ContainsFinalizer(dbInstance, managedDatabaseFinalizer) {
			// Run the cleanup path.
			return r.reconcileDelete(ctx, dbInstance)
		}
		// Our finalizer has already been removed. Nothing to do.
		return ctrl.Result{}, nil
	}

	// 3. Add the finalizer if it doesn't exist yet
	if !controllerutil.ContainsFinalizer(dbInstance, managedDatabaseFinalizer) {
		logger.Info("Adding finalizer for ManagedDatabase")
		controllerutil.AddFinalizer(dbInstance, managedDatabaseFinalizer)
		if err := r.Update(ctx, dbInstance); err != nil {
			return ctrl.Result{}, err
		}
		// The update triggers another reconciliation, so we can return here.
		return ctrl.Result{}, nil
	}

	// 4. Run normal reconciliation logic
	return r.reconcileNormal(ctx, dbInstance)
}
```
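
Note that no special watch configuration is needed to observe deletions: setting the deletionTimestamp arrives as an ordinary update event. A standard kubebuilder-style setup is sufficient (a minimal sketch):

```go
// SetupWithManager registers the reconciler with the manager. Deletion of a
// finalized object surfaces as a normal update (the deletionTimestamp is
// set), so the default watch on ManagedDatabase is all we need.
func (r *ManagedDatabaseReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&dbv1alpha1.ManagedDatabase{}).
		Complete(r)
}
```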

Let's break down the handler functions.

`reconcileDelete`: The Idempotent Cleanup Handler

This function is the heart of our graceful deletion logic. It's responsible for interacting with the external system (e.g., the cloud provider's API) to tear down the resource. The most critical aspect of this function is idempotency.

```go
// Mock external client for demonstration
type MockDBProviderClient struct{}

func (c *MockDBProviderClient) GetDatabaseStatus(instanceID string) (string, error) { /* ... */ return "DELETING", nil }
func (c *MockDBProviderClient) DeleteDatabase(instanceID string) error              { /* ... */ return nil }
func (c *MockDBProviderClient) CreateDatabase(spec dbv1alpha1.ManagedDatabaseSpec) (string, error) { /* ... */ return "db-12345", nil }

func (r *ManagedDatabaseReconciler) reconcileDelete(ctx context.Context, db *dbv1alpha1.ManagedDatabase) (ctrl.Result, error) {
	logger := log.FromContext(ctx)
	logger.Info("Starting deletion reconciliation for ManagedDatabase")

	// The external resource ID should be stored in the status.
	externalID := db.Status.InstanceID
	if externalID == "" {
		// No external resource was ever created, or the status was never
		// updated. Either way, we can safely remove the finalizer.
		logger.Info("No external instance ID found. Removing finalizer.")
		controllerutil.RemoveFinalizer(db, managedDatabaseFinalizer)
		return ctrl.Result{}, r.Update(ctx, db)
	}

	// --- IDEMPOTENCY CHECK ---
	// Check if the external resource still exists. This is crucial because a
	// previous reconciliation might have failed after deleting the DB but
	// before removing the finalizer.
	status, err := r.DBProviderClient.GetDatabaseStatus(externalID)
	if err != nil {
		// Handle provider-specific 'NotFound' errors (helper sketched below).
		if IsProviderResourceNotFound(err) {
			logger.Info("External database already deleted. Removing finalizer.")
			controllerutil.RemoveFinalizer(db, managedDatabaseFinalizer)
			return ctrl.Result{}, r.Update(ctx, db)
		}
		// Any other error means we can't confirm the state, so we must retry.
		logger.Error(err, "Failed to get external database status during deletion")
		return ctrl.Result{}, err
	}

	// If the resource is already being deleted by the provider, just wait.
	if status == "DELETING" {
		logger.Info("External database is already being deleted. Requeuing for status check.")
		return ctrl.Result{RequeueAfter: 30 * time.Second}, nil
	}

	// --- EXECUTE DELETION ---
	logger.Info("Deleting external database instance", "InstanceID", externalID)
	if err := r.DBProviderClient.DeleteDatabase(externalID); err != nil {
		// If deletion fails, return the error to trigger exponential backoff and retry.
		logger.Error(err, "Failed to delete external database instance")
		return ctrl.Result{}, err
	}

	// --- FINALIZER REMOVAL ---
	// Once deletion is successfully initiated (or confirmed), remove the finalizer.
	logger.Info("External database deletion initiated. Removing finalizer.")
	controllerutil.RemoveFinalizer(db, managedDatabaseFinalizer)
	if err := r.Update(ctx, db); err != nil {
		return ctrl.Result{}, err
	}

	return ctrl.Result{}, nil
}
```

The idempotency check is paramount. We don't just blindly call DeleteDatabase. We first check the status. If the resource is already gone (perhaps from a previous, partially failed reconcile), we simply proceed to remove the finalizer. If the deletion call fails, we return an error, and controller-runtime will requeue the request. The object remains in its terminating state until our logic succeeds.
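
The IsProviderResourceNotFound helper referenced in reconcileDelete is not part of controller-runtime; it stands in for whatever error classification your provider SDK exposes. A minimal sketch, assuming a hypothetical typed error on our mock client (a real SDK, such as the AWS one, has its own not-found error types to check for instead):

```go
import goerrors "errors" // aliased to avoid clashing with k8s.io/apimachinery/pkg/api/errors

// ProviderError is a hypothetical error type returned by the mock provider client.
type ProviderError struct {
	Code    string // e.g. "NotFound", "Throttled"
	Message string
}

func (e *ProviderError) Error() string { return e.Code + ": " + e.Message }

// IsProviderResourceNotFound reports whether err means the external resource
// no longer exists, which reconcileDelete treats as "cleanup already done".
func IsProviderResourceNotFound(err error) bool {
	var pe *ProviderError
	return goerrors.As(err, &pe) && pe.Code == "NotFound"
}
```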

`reconcileNormal`: The Create/Update Handler

This is the standard reconciliation logic that runs when the object is not being deleted. It ensures the external resource exists and matches the spec.

```go
func (r *ManagedDatabaseReconciler) reconcileNormal(ctx context.Context, db *dbv1alpha1.ManagedDatabase) (ctrl.Result, error) {
	logger := log.FromContext(ctx)
	logger.Info("Starting normal reconciliation for ManagedDatabase")

	// If InstanceID is not set in the status, the external resource likely doesn't exist.
	if db.Status.InstanceID == "" {
		logger.Info("No InstanceID found in status. Creating external database.")
		newInstanceID, err := r.DBProviderClient.CreateDatabase(db.Spec)
		if err != nil {
			logger.Error(err, "Failed to create external database")
			// Update the status with a failure condition.
			db.Status.Ready = false
			db.Status.Condition = "CreateFailed: " + err.Error()
			_ = r.Status().Update(ctx, db) // Best-effort status update
			return ctrl.Result{}, err
		}

		// CRITICAL: Update the status with the new InstanceID immediately.
		// This is the link between the Kubernetes object and the external world.
		db.Status.InstanceID = newInstanceID
		db.Status.Ready = false // Still provisioning
		db.Status.Condition = "Provisioning"
		if err := r.Status().Update(ctx, db); err != nil {
			// If this status update fails, the next reconcile will try to create
			// the DB again. Your external CreateDatabase function MUST be
			// idempotent to handle this (see the sketch below).
			logger.Error(err, "Failed to update status with new InstanceID")
			return ctrl.Result{}, err
		}

		logger.Info("Successfully initiated database creation", "InstanceID", newInstanceID)
		return ctrl.Result{RequeueAfter: 1 * time.Minute}, nil // Requeue to check status later
	}

	// If we reach here, the InstanceID exists. Check the external status and sync the spec.
	// ... logic to check external DB status and update if the spec has drifted ...
	// For example, check if `db.Spec.StorageGB` matches the actual allocated storage.

	// Finally, update the status to Ready if everything is aligned.
	db.Status.Ready = true
	db.Status.Condition = "Ready"
	if err := r.Status().Update(ctx, db); err != nil {
		return ctrl.Result{}, err
	}

	return ctrl.Result{}, nil
}
```
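
The warning in the code above deserves elaboration: if the status update fails after CreateDatabase succeeds, the next reconcile will call CreateDatabase again. One way to make that safe is an adopt-before-create lookup keyed on something stable, such as the object's UID. This is a sketch under the assumption that your provider supports tagging and tag-based lookup; FindDatabaseByTag and CreateDatabaseWithTag are hypothetical calls, not part of any real SDK:

```go
// ensureDatabase creates the external database idempotently. A previous
// reconcile may have created the instance but crashed before recording the
// InstanceID in the status, so we look for an instance tagged with this
// object's UID before creating a new one.
func (r *ManagedDatabaseReconciler) ensureDatabase(ctx context.Context, db *dbv1alpha1.ManagedDatabase) (string, error) {
	tag := string(db.GetUID()) // stable for the lifetime of the Kubernetes object

	// Adoption path: reuse the instance from a partially failed reconcile.
	id, err := r.DBProviderClient.FindDatabaseByTag(tag)
	if err == nil {
		return id, nil
	}
	if !IsProviderResourceNotFound(err) {
		return "", err // cannot confirm state; let the caller retry
	}

	// No existing instance: create one and tag it for future adoption.
	return r.DBProviderClient.CreateDatabaseWithTag(db.Spec, tag)
}
```

The same idea works with deterministic naming (deriving the external instance name from the object's namespace and name) if your provider does not support tags.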

Edge Cases and Production Considerations

A basic implementation is a good start, but production systems are defined by how they handle failure. Let's analyze the critical edge cases.

1. Controller Crashes and Restarts

This is where the beauty of the finalizer pattern shines. The state of the deletion process (i.e., the presence of the deletionTimestamp and the finalizer itself) is stored durably in etcd as part of the object. It is not held in the controller's memory.

  • Scenario: The controller successfully calls the cloud provider to delete the database. Before it can remove the finalizer, the controller pod crashes.
  • Result: When the controller restarts, its informer cache will sync with the API server. It will see the ManagedDatabase object still exists with a deletionTimestamp and a finalizer. It will immediately trigger a reconcileDelete. Because our deletion logic is idempotent, it will first check the external DB's status. It will discover the DB is already gone or in a DELETING state, and will then proceed to safely remove the finalizer. The system self-heals without operator intervention.
2. External API Failures and Retries

Cloud APIs are not infallible. They experience downtime, rate limiting, and transient errors.

  • Scenario: The reconcileDelete function attempts to delete the external database, but the cloud provider's API returns a 503 Service Unavailable.
  • Result: Our DeleteDatabase client function should return this error. The reconcileDelete function, in turn, returns the error to the controller-runtime manager: return ctrl.Result{}, err. The manager will automatically requeue the reconciliation request for the object using an exponential backoff algorithm. This prevents the controller from hammering a failing API. The ManagedDatabase object will remain in its Terminating state until the external API is available again and the deletion call succeeds. A sketch of handling throttling explicitly follows below.
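
For rate limiting specifically, you may want a longer, explicit delay rather than counting the attempt as a failure. A minimal sketch, where isThrottled is a hypothetical classifier in the same spirit as IsProviderResourceNotFound:

```go
// Inside reconcileDelete: treat throttling differently from hard failures.
if err := r.DBProviderClient.DeleteDatabase(externalID); err != nil {
	if isThrottled(err) { // hypothetical helper: true for 429/503 responses
		// Back off explicitly; returning a nil error avoids inflating the
		// failure count that drives exponential backoff.
		return ctrl.Result{RequeueAfter: 2 * time.Minute}, nil
	}
	// Anything else: return the error and let controller-runtime retry
	// with exponential backoff.
	return ctrl.Result{}, err
}
```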
3. Stuck Finalizers: The Admin's Nightmare

What if there's a bug in your reconcileDelete logic that prevents it from ever succeeding? For example, it might be trying to access a field that doesn't exist, causing a panic, or it might be stuck in a loop waiting for a status that will never occur.

  • Scenario: A bug prevents the finalizer from ever being removed. A user tries kubectl delete manageddatabase my-db, and the command hangs indefinitely. kubectl get manageddatabase my-db -o yaml shows the deletionTimestamp is set, but the object never disappears.
  • Result: The object is now stuck. It cannot be deleted via the standard API because the finalizer is blocking it. This is a common operational problem with faulty controllers.

As a cluster administrator, you have an escape hatch: you can manually patch the object to remove the finalizer. This is a dangerous operation, as it bypasses the controller's cleanup logic and will almost certainly orphan the external resource.

```bash
# Find the finalizer name
kubectl get manageddatabase my-db -o jsonpath='{.metadata.finalizers}'
# Expected output: ["db.example.com/finalizer"]

# Manually remove the finalizer by patching the object with an empty list
kubectl patch manageddatabase my-db --type='merge' -p '{"metadata":{"finalizers":[]}}'
```

After this patch, the API server will see the empty finalizer list and proceed with the deletion. This should only be done after thoroughly investigating the controller logs and understanding why it's failing.

Advanced Pattern: Multiple Coordinated Finalizers

The finalizers field is a list, not a single string. This is by design, allowing multiple independent controllers to coordinate on the deletion of a single resource.

Imagine a scenario where, in addition to our ManagedDatabase controller, we have a separate Monitoring controller. This controller watches ManagedDatabase objects and, upon creation, provisions a dashboard in Grafana and a set of alerts in Prometheus for that database.

When the ManagedDatabase is deleted, we need to ensure both that the RDS instance is deleted and that the monitoring configuration is de-provisioned.

This is achieved by each controller managing its own finalizer:

  • The ManagedDatabase controller adds the db.example.com/finalizer.
  • The Monitoring controller adds the monitoring.example.com/finalizer.

When kubectl delete is called, the object's deletionTimestamp is set, and both controllers are triggered:

  • The ManagedDatabase controller will run its reconcileDelete logic, delete the RDS instance, and then remove only the db.example.com/finalizer.
  • The Monitoring controller will run its own reconcileDelete, remove the Grafana dashboard, and then remove only the monitoring.example.com/finalizer.

Kubernetes will only delete the object after the finalizers list is completely empty. This ensures that both independent cleanup processes have completed successfully before the source-of-truth object is removed.

Your controller's logic must be written to be a good citizen and only manage its own finalizer:

```go
// Inside reconcileDelete for the database controller
logger.Info("Removing database finalizer")
controllerutil.RemoveFinalizer(db, managedDatabaseFinalizer) // Does not touch other finalizers
if err := r.Update(ctx, db); err != nil {
	return ctrl.Result{}, err
}
```

Conclusion: Finalizers as a Cornerstone of Reliable Operators

Finalizers are not an optional feature or a minor optimization; they are the fundamental mechanism for building reliable Kubernetes operators that manage stateful resources. By shifting the deletion process from a single, atomic API call to a two-phase, state-driven reconciliation, Kubernetes provides the framework needed to handle the complexities of external resource management.

Mastering this pattern requires a deep focus on idempotency. Every line of code in your reconciliation loop, especially the deletion path, must be repeatable and resilient to failure. You must assume your controller could be restarted at any point in the process and be able to pick up where it left off by reading the state from the Kubernetes object and the external system.

For senior engineers, moving beyond simple, stateless controllers means embracing patterns like finalizers. It is the key to building production-grade, self-healing systems that can be trusted to manage critical infrastructure without leaving a trail of orphaned resources and operational debt.
