Advanced Kubernetes Finalizers for Stateful Resource Management

Goh Ling Yong

The Orphaned Resource Problem in Stateful Operators

In the world of Kubernetes operators, managing the lifecycle of a Custom Resource (CR) is straightforward as long as its entire state is confined within the cluster. However, the moment an operator needs to manage a resource outside of Kubernetes—an S3 bucket, a Cloud SQL database, a DNS record in Route 53, or even a user in an external SaaS application—the standard deletion mechanism becomes dangerously insufficient.

A kubectl delete my-cr my-instance command triggers a simple, asynchronous process. The API server validates the request, and the object is removed from etcd. For the Kubernetes garbage collector, the job is done. But what about the S3 bucket your operator provisioned? It's now an orphaned resource: untracked, potentially incurring costs, and a source of future configuration drift and security vulnerabilities.

This fundamental gap exists because the Kubernetes control plane has no innate knowledge of the external dependencies your controller has created. Standard ownerReferences work for in-cluster garbage collection but are useless for anything beyond the API server's reach.

The core of the issue lies in the final step of deletion. Kubernetes needs a mechanism to pause an object's final removal, effectively telling the responsible controller, "I intend to delete this object, but I will wait for you to perform your cleanup tasks first." This mechanism is the finalizer.


Finalizers: A Controller-Driven Pre-Delete Hook

A finalizer is not a piece of executable code or a complex API object. At its core, it's deceptively simple: metadata.finalizers is a list of strings on any Kubernetes object.

yaml
apiVersion: "my-operator.dev/v1alpha1"
kind: S3Bucket
metadata:
  name: my-production-bucket
  finalizers:
  - s3bucket.my-operator.dev/finalizer

The power of finalizers comes from a special rule enforced by the Kubernetes API server: an object with a non-empty finalizers list cannot be fully deleted from etcd.

When a user initiates a deletion on such an object, the API server doesn't remove it. Instead, it performs two critical actions:

  • It sets the metadata.deletionTimestamp field to the current time.
  • It updates the object's state, triggering a reconciliation event for any watching controllers.

The object now exists in a special "terminating" state. It's still visible via the API (e.g., kubectl get), but it's marked for death. It is now the sole responsibility of the controller that added the finalizer to:

  • Detect the deletionTimestamp.
  • Perform the necessary external cleanup logic.
  • Remove its finalizer string from the metadata.finalizers list.
  • Update the object in the API server.

Once the finalizers list is empty, the API server's garbage collector is free to complete the deletion and remove the object from etcd for good. Finalizers are, therefore, a cooperative mechanism—a contract between your controller and the API server to ensure graceful teardown.


Implementing the Reconciliation Loop with Finalizer Logic

Let's move from theory to a production-grade implementation using Go and the popular controller-runtime library, the foundation of Kubebuilder and Operator SDK. The Reconcile function is the heart of any controller, and its structure must be carefully designed to handle both the normal lifecycle and the deletion path.

We will build an operator for a simple S3Bucket CRD. The logic can be broken down into two main branches.

    go
    import (
        "context"
    
        "k8s.io/apimachinery/pkg/runtime"
        ctrl "sigs.k8s.io/controller-runtime"
        "sigs.k8s.io/controller-runtime/pkg/client"
        "sigs.k8s.io/controller-runtime/pkg/log"
        "sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
    
        myoperatorv1alpha1 "my-operator.dev/api/v1alpha1"
    )
    
    const s3BucketFinalizer = "s3bucket.my-operator.dev/finalizer"
    
    type S3BucketReconciler struct {
        client.Client
        Scheme *runtime.Scheme
        // A mock or real S3 client
        S3Client S3ClientInterface 
    }
    
    func (r *S3BucketReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
        logger := log.FromContext(ctx)
    
        // 1. Fetch the S3Bucket instance
        s3Bucket := &myoperatorv1alpha1.S3Bucket{}
        if err := r.Get(ctx, req.NamespacedName, s3Bucket); err != nil {
            // Handle not-found errors, which can occur after deletion
            return ctrl.Result{}, client.IgnoreNotFound(err)
        }
    
        // 2. Check if the object is being deleted
        if !s3Bucket.ObjectMeta.DeletionTimestamp.IsZero() {
            // The object is in the process of being deleted
            if controllerutil.ContainsFinalizer(s3Bucket, s3BucketFinalizer) {
                // Our finalizer is present, so let's handle external dependency cleanup
                logger.Info("Performing finalizer cleanup for S3Bucket")
                if err := r.cleanupExternalResources(ctx, s3Bucket); err != nil {
                    // If cleanup fails, return an error to requeue the request
                    // The finalizer will not be removed, preventing deletion
                    logger.Error(err, "Failed to cleanup external resources")
                    return ctrl.Result{}, err
                }
    
                // Cleanup was successful, remove our finalizer
                logger.Info("External resources cleaned up, removing finalizer")
                controllerutil.RemoveFinalizer(s3Bucket, s3BucketFinalizer)
                if err := r.Update(ctx, s3Bucket); err != nil {
                    return ctrl.Result{}, err
                }
            }
            // Stop reconciliation as the item is being deleted
            return ctrl.Result{}, nil
        }
    
        // 3. The object is not being deleted, so add the finalizer if it doesn't exist
        if !controllerutil.ContainsFinalizer(s3Bucket, s3BucketFinalizer) {
            logger.Info("Adding finalizer for S3Bucket")
            controllerutil.AddFinalizer(s3Bucket, s3BucketFinalizer)
            if err := r.Update(ctx, s3Bucket); err != nil {
                return ctrl.Result{}, err
            }
        }
    
        // 4. This is the main reconciliation logic for creating/updating the resource
        logger.Info("Reconciling S3Bucket")
        err := r.reconcileExternalResources(ctx, s3Bucket)
        if err != nil {
            // Handle errors during creation/update
            return ctrl.Result{}, err
        }
    
        return ctrl.Result{}, nil
    }
    
    // Dummy functions for illustration
    func (r *S3BucketReconciler) cleanupExternalResources(ctx context.Context, bucket *myoperatorv1alpha1.S3Bucket) error {
        // Implementation in the next section
        return nil
    }
    
    func (r *S3BucketReconciler) reconcileExternalResources(ctx context.Context, bucket *myoperatorv1alpha1.S3Bucket) error {
        // Implementation in the next section
        return nil
    }

Let's dissect this logic:

  • Fetch the Instance: Standard controller boilerplate.
  • The Deletion Branch (!s3Bucket.ObjectMeta.DeletionTimestamp.IsZero()): This is the critical check. If deletionTimestamp is set, we know a kubectl delete has been issued.
  • Check for Our Finalizer: We use controllerutil.ContainsFinalizer to ensure we only act if our specific finalizer is present. This is crucial for interoperability if other controllers also manage this object.
  • Execute Cleanup: We call our external resource cleanup logic.
  • Handle Cleanup Failure: If cleanupExternalResources returns an error, we immediately return ctrl.Result{}, err. This tells controller-runtime to requeue the reconciliation request, typically with exponential backoff. The finalizer remains, and the object stays in the terminating state until the cleanup succeeds.
  • Remove Finalizer on Success: Only after the cleanup is confirmed successful do we call controllerutil.RemoveFinalizer and r.Update(). This is the signal to Kubernetes that our work is done.
  • The Normal Reconciliation Branch: If the object is not being deleted, we ensure our finalizer is present.
  • Add Finalizer: If it's missing (e.g., on first creation), we add it using controllerutil.AddFinalizer and persist it with r.Update. The ordering matters: the finalizer must be stored in the API server before we provision anything external, so a crash later in the loop can never leave the bucket unguarded. The update also triggers a fresh reconciliation event for the object.
  • Main Logic: If the finalizer is present and the object is not being deleted, we proceed with the normal business logic of creating or updating the external resource.


Production-Grade Example: An S3 Bucket Operator

Let's flesh out the S3Bucket operator with more realistic external resource handling logic. First, our CRD definition:

    yaml
    # config/crd/bases/my-operator.dev_s3buckets.yaml
    apiVersion: apiextensions.k8s.io/v1
    kind: CustomResourceDefinition
    metadata:
      name: s3buckets.my-operator.dev
    spec:
      group: my-operator.dev
      names:
        kind: S3Bucket
        listKind: S3BucketList
        plural: s3buckets
        singular: s3bucket
      scope: Namespaced
      versions:
        - name: v1alpha1
          served: true
          storage: true
          subresources:
            status: {}
          schema:
            openAPIV3Schema:
              type: object
              properties:
                spec:
                  type: object
                  properties:
                    bucketName:
                      type: string
                    region:
                      type: string
                status:
                  type: object
                  properties:
                    url:
                      type: string
                    phase:
                      type: string
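
For reference, the Go API types backing this CRD would look roughly like the following sketch (a hypothetical api/v1alpha1/s3bucket_types.go; the field names are assumed to mirror the schema above, and generated boilerplate such as deepcopy functions is omitted):

go
package v1alpha1

import (
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// S3BucketSpec defines the desired state of the bucket.
type S3BucketSpec struct {
    BucketName string `json:"bucketName,omitempty"`
    Region     string `json:"region,omitempty"`
}

// S3BucketStatus captures what the controller has observed and reported back.
type S3BucketStatus struct {
    URL   string `json:"url,omitempty"`
    Phase string `json:"phase,omitempty"`
}

// +kubebuilder:object:root=true
// +kubebuilder:subresource:status

// S3Bucket is the Schema for the s3buckets API.
type S3Bucket struct {
    metav1.TypeMeta   `json:",inline"`
    metav1.ObjectMeta `json:"metadata,omitempty"`

    Spec   S3BucketSpec   `json:"spec,omitempty"`
    Status S3BucketStatus `json:"status,omitempty"`
}

// +kubebuilder:object:root=true

// S3BucketList contains a list of S3Bucket objects.
type S3BucketList struct {
    metav1.TypeMeta `json:",inline"`
    metav1.ListMeta `json:"metadata,omitempty"`
    Items           []S3Bucket `json:"items"`
}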

Now, let's implement the reconcileExternalResources and cleanupExternalResources methods using a mock AWS S3 client interface for clarity. In a real-world scenario, this would be the actual AWS SDK for Go v2.

    go
    // s3client_interface.go
    package main
    
    import "context"
    
    // S3ClientInterface allows for mocking the S3 client in tests.
    type S3ClientInterface interface {
        CreateBucket(ctx context.Context, bucketName, region string) (string, error)
        DeleteBucket(ctx context.Context, bucketName string) error
        BucketExists(ctx context.Context, bucketName string) (bool, error)
    }
    
    // --- In reconciler.go ---
    
    func (r *S3BucketReconciler) reconcileExternalResources(ctx context.Context, s3Bucket *myoperatorv1alpha1.S3Bucket) error {
        logger := log.FromContext(ctx)
    
        exists, err := r.S3Client.BucketExists(ctx, s3Bucket.Spec.BucketName)
        if err != nil {
            logger.Error(err, "Failed to check if S3 bucket exists")
            return err
        }
    
        if !exists {
            logger.Info("S3 bucket does not exist, creating it", "BucketName", s3Bucket.Spec.BucketName)
            bucketURL, err := r.S3Client.CreateBucket(ctx, s3Bucket.Spec.BucketName, s3Bucket.Spec.Region)
            if err != nil {
                logger.Error(err, "Failed to create S3 bucket")
                // Update status to reflect failure
                s3Bucket.Status.Phase = "Failed"
                _ = r.Status().Update(ctx, s3Bucket)
                return err
            }
            s3Bucket.Status.URL = bucketURL
            s3Bucket.Status.Phase = "Created"
        } else {
            logger.Info("S3 bucket already exists, skipping creation")
            s3Bucket.Status.Phase = "Ready"
        }
    
        // Always update the status at the end of a successful reconciliation
        if err := r.Status().Update(ctx, s3Bucket); err != nil {
            logger.Error(err, "Failed to update S3Bucket status")
            return err
        }
    
        return nil
    }
    
    func (r *S3BucketReconciler) cleanupExternalResources(ctx context.Context, s3Bucket *myoperatorv1alpha1.S3Bucket) error {
        logger := log.FromContext(ctx)
    
        logger.Info("Deleting external S3 bucket", "BucketName", s3Bucket.Spec.BucketName)
        err := r.S3Client.DeleteBucket(ctx, s3Bucket.Spec.BucketName)
        
        // This is the key part for idempotency.
        // If the bucket is already gone, AWS SDK might return a 'NoSuchBucket' error.
        // We must treat this as a success for the cleanup operation.
        if err != nil {
            // Use a helper to check for the specific error code from the cloud provider
            if IsAwsNoSuchBucketError(err) {
                logger.Info("S3 bucket already deleted, cleanup is successful")
                return nil
            }
            logger.Error(err, "Failed to delete S3 bucket")
            return err
        }
    
        logger.Info("Successfully deleted S3 bucket")
        return nil
    }
    
    // IsAwsNoSuchBucketError is a placeholder for real error inspection. With
    // the AWS SDK for Go v2, the check would look roughly like this, using the
    // smithy-go APIError interface (the exact error code can vary by operation):
    //
    //     var apiErr smithy.APIError
    //     if errors.As(err, &apiErr) {
    //         return apiErr.ErrorCode() == "NoSuchBucket"
    //     }
    //     return false
    func IsAwsNoSuchBucketError(err error) bool {
        return false // Simplified for this example
    }

The most important detail in cleanupExternalResources is handling the case where the resource is already gone. Cleanup logic must be idempotent. If a previous reconciliation attempt failed midway after deleting the bucket but before removing the finalizer, the next attempt must not fail simply because the bucket no longer exists. Checking for a NoSuchBucket error and treating it as success is a canonical example of this pattern.


Advanced Edge Cases and Error Handling

Production systems are defined by how they handle failure. A simple finalizer implementation will break under common real-world conditions.

Case 1: External Resource Cleanup Fails Persistently

Imagine the S3 bucket has a deletion policy that prevents its removal, or the IAM credentials used by the operator lack s3:DeleteBucket permissions. The r.S3Client.DeleteBucket call will fail every time.

Behavior:

  • Our controller will return an error from the Reconcile function.
  • controller-runtime will requeue the object with exponential backoff (e.g., 1s, 2s, 4s, 8s...).
  • The finalizer will never be removed.
  • The S3Bucket object will be stuck in the Terminating state indefinitely.

Solution: This is the correct behavior. The finalizer is doing its job: preventing the Kubernetes object from being deleted while its real-world counterpart still exists. The problem is not with the operator but with the external system's configuration. This state signals to a human operator that manual intervention is required. They must either fix the IAM permissions or resolve the bucket policy issue. Once fixed, the next reconciliation attempt will succeed, and the deletion will complete.
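
One practical way to make this state visible is to attach a Kubernetes Event to the stuck object when cleanup fails, so the reason shows up in kubectl describe s3bucket. A minimal sketch, assuming the reconciler is extended with an EventRecorder obtained from mgr.GetEventRecorderFor("s3bucket-controller") (the helper below is a hypothetical addition, not part of the code above):

go
import (
    corev1 "k8s.io/api/core/v1"
    "k8s.io/client-go/tools/record"

    myoperatorv1alpha1 "my-operator.dev/api/v1alpha1"
)

// recordCleanupFailure emits a warning Event against the S3Bucket object so a
// human operator can see why it is stuck in Terminating.
func recordCleanupFailure(recorder record.EventRecorder, bucket *myoperatorv1alpha1.S3Bucket, cause error) {
    recorder.Event(bucket, corev1.EventTypeWarning, "CleanupFailed",
        "external S3 bucket could not be deleted: "+cause.Error())
}

The deletion branch would call this helper just before returning the cleanup error, leaving the retry and backoff behavior unchanged.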

Case 2: The Operator is Down During Deletion

An operator pod might be evicted, crash, or be taken down for maintenance. What happens if a user deletes a CR while the operator is offline?

Behavior:

  • The user runs kubectl delete.
  • The API server sets the deletionTimestamp.
  • Since the operator is not running, no reconciliation occurs.
  • The S3Bucket object remains in the Terminating state indefinitely.

Solution: This is also a feature, not a bug. The state is durably stored in etcd. As soon as the operator pod is rescheduled and starts running again, its informer will sync, and it will receive an event for the S3Bucket object. It will immediately see the deletionTimestamp and begin its cleanup process. The finalizer guarantees that the deletion intent is not lost, even across operator restarts.

Case 3: The Stuck Finalizer and Manual Intervention

Sometimes, a bug in the controller or an unrecoverable external state (e.g., the cloud provider's API is down for an extended period) can lead to a truly stuck finalizer. An administrator may decide that the orphaned resource is acceptable and that the Kubernetes object must be deleted.

Problem: The object cannot be deleted while the finalizer is present.

Solution (for Cluster Administrators): The finalizer can be manually removed by patching the object. This is a break-glass procedure and should be used with extreme caution, as it will almost certainly lead to an orphaned external resource.

    bash
    # Find the stuck object
    $ kubectl get s3buckets -n my-namespace
    NAME                   PHASE         AGE
    my-production-bucket   Terminating   2d
    
    # Manually remove the finalizer by patching it to an empty list
    $ kubectl patch s3bucket my-production-bucket -n my-namespace --type=merge -p '{"metadata":{"finalizers":[]}}'
    s3bucket.my-operator.dev/my-production-bucket patched

Immediately after the patch is applied, the API server will see the object's finalizers list is empty and its deletionTimestamp is set, and it will proceed with the final garbage collection.

Case 4: Handling Multiple Cooperating Finalizers

The finalizers field is a slice of strings, not a single string. This allows multiple controllers to manage the same object. For example, one controller might manage the S3 bucket itself, while another controller adds a finalizer to ensure that DNS records pointing to the bucket are removed upon deletion.

    yaml
    apiVersion: "my-operator.dev/v1alpha1"
    kind: S3Bucket
    metadata:
      name: my-production-bucket
      finalizers:
      - s3bucket.my-operator.dev/finalizer  # Our operator's finalizer
      - dns.cleanup-operator.dev/finalizer  # Another operator's finalizer

Problem: If your cleanup logic naively sets s3Bucket.ObjectMeta.Finalizers = []string{}, you will incorrectly remove the other operator's finalizer, breaking its contract.

Solution: Always use the controllerutil helpers, as they correctly manipulate the slice. controllerutil.RemoveFinalizer removes only the specified string, leaving others intact. Our example code already does this correctly, highlighting the importance of using established library functions over manual slice manipulation.
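
To see this concretely, here is a small self-contained sketch (it uses a ConfigMap simply because it is a convenient built-in client.Object) showing that the helper strips only our own entry:

go
package main

import (
    "fmt"

    corev1 "k8s.io/api/core/v1"
    "sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

func main() {
    // Two cooperating controllers have registered finalizers on the same object.
    obj := &corev1.ConfigMap{}
    controllerutil.AddFinalizer(obj, "s3bucket.my-operator.dev/finalizer")
    controllerutil.AddFinalizer(obj, "dns.cleanup-operator.dev/finalizer")

    // RemoveFinalizer deletes only the matching entry; the DNS operator's
    // finalizer, and therefore its contract, stays intact.
    controllerutil.RemoveFinalizer(obj, "s3bucket.my-operator.dev/finalizer")
    fmt.Println(obj.GetFinalizers()) // [dns.cleanup-operator.dev/finalizer]
}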


Performance and Idempotency Considerations

  • Idempotency is Non-Negotiable: As shown with the NoSuchBucket error, every step of your reconciliation, both for creation and deletion, must be idempotent. If a Reconcile loop is aborted halfway through and retried, it must produce the same result. Always check if a resource exists before creating it, and always treat "not found" as success during deletion.
  • Requeue Strategy: Differentiate between returning an error and returning a ctrl.Result that explicitly requests a requeue.
    • return ctrl.Result{}, err: Use this for transient or unexpected failures (e.g., API call failed, network issue). This leverages the controller manager's built-in exponential backoff, which is ideal for retrying operations against external systems.
    • return ctrl.Result{RequeueAfter: time.Minute}: Use this when you are waiting for a long-running external process to complete and want to check its status periodically. This is more of a polling mechanism and should be used judiciously to avoid overwhelming the API server.
  • Controller Concurrency: The MaxConcurrentReconciles option, set when the controller is built (ctrl.NewControllerManagedBy(mgr).For(...).WithOptions(...)), determines how many Reconcile goroutines can run in parallel. When deleting hundreds of CRs at once, each cleanup operation might involve a slow API call. A low concurrency setting will process deletions slowly, while a high setting could lead to rate limiting from the external cloud provider. This value must be tuned based on the latency of your external dependencies and any API quotas; a sketch of the wiring follows this list.
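
For reference, that option is set when the controller is registered with the manager. A sketch of the wiring, assuming the reconciler from earlier (the concurrency value of 4 is purely illustrative):

go
import (
    ctrl "sigs.k8s.io/controller-runtime"
    "sigs.k8s.io/controller-runtime/pkg/controller"

    myoperatorv1alpha1 "my-operator.dev/api/v1alpha1"
)

// SetupWithManager registers the reconciler and caps how many Reconcile calls
// may run in parallel.
func (r *S3BucketReconciler) SetupWithManager(mgr ctrl.Manager) error {
    return ctrl.NewControllerManagedBy(mgr).
        For(&myoperatorv1alpha1.S3Bucket{}).
        WithOptions(controller.Options{MaxConcurrentReconciles: 4}).
        Complete(r)
}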


Conclusion

Finalizers are not just a feature; they are the cornerstone of any Kubernetes operator that manages stateful resources outside the cluster. They transform the operator from a simple resource provisioner into a true lifecycle manager. By correctly implementing the finalizer pattern, you provide a seamless, declarative experience for users, ensuring that a kubectl delete command results in a complete and graceful teardown of all associated infrastructure, preventing the costly and dangerous problem of orphaned resources. The complexity lies not in the finalizer itself, but in the robust, idempotent, and error-aware reconciliation logic that it enables.
