Kubernetes Finalizers: Graceful CRD Deletion in Custom Operators

13 min read
Goh Ling Yong
Technology enthusiast and software architect specializing in AI-driven development tools and modern software engineering practices. Passionate about the intersection of artificial intelligence and human creativity in building tomorrow's digital solutions.

The Operator's Dilemma: Orphaned Resources on Deletion

As a seasoned Kubernetes engineer, you understand the power of the operator pattern. By extending the Kubernetes API with Custom Resource Definitions (CRDs), we can manage complex, stateful applications and external resources using the same declarative kubectl workflow we use for Pods and Deployments. However, this power comes with a critical responsibility: lifecycle management.

Consider an operator that manages S3Bucket custom resources. When a developer applies a manifest (kubectl apply -f my-bucket.yaml), the operator's reconciliation loop is triggered. It sees the new S3Bucket CR, calls the AWS API, and provisions the actual S3 bucket. The system is in its desired state.

But what happens when the developer runs kubectl delete s3bucket my-bucket? With no finalizers present, the Kubernetes API server removes the S3Bucket object immediately. The delete event may still trigger your operator's reconciliation loop, but by the time it runs there is nothing left to read: the object, and with it the information needed to find the bucket, is already gone. The result? The S3Bucket CR is gone from Kubernetes, but the actual, costly S3 bucket remains active in your AWS account as an orphaned resource.

This is the fundamental problem that finalizers solve. They are a Kubernetes-native mechanism that allows a controller to hook into the pre-deletion lifecycle of an object, preventing its removal from the API server until specific cleanup logic has been successfully executed.

This article is not an introduction to operators. It assumes you are familiar with Go, controller-runtime, and the basic reconciliation loop. We will focus exclusively on the advanced, production-grade implementation of finalizers to build robust, self-healing operators that never leave resources behind.

The Deletion Lifecycle: Before and After Finalizers

To appreciate the role of a finalizer, let's first examine the flawed deletion process without one.

Standard Deletion: A Recipe for Leaks

A typical reconciliation loop in a controller-runtime based operator looks something like this:

```go
// controllers/myresource_controller.go
package controllers

import (
	"context"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/log"

	mygroupv1 "my-operator/api/v1"
)

// MyResourceReconciler reconciles a MyResource object.
type MyResourceReconciler struct {
	client.Client
}

func (r *MyResourceReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	logger := log.FromContext(ctx)

	// 1. Fetch the MyResource instance
	var myResource mygroupv1.MyResource
	if err := r.Get(ctx, req.NamespacedName, &myResource); err != nil {
		if apierrors.IsNotFound(err) {
			// The object is already gone from the API server.
			// No opportunity for cleanup!
			logger.Info("MyResource resource not found. Ignoring since object must be deleted")
			return ctrl.Result{}, nil
		}
		logger.Error(err, "Failed to get MyResource")
		return ctrl.Result{}, err
	}

	// 2. Main reconciliation logic: create/update the external resource
	logger.Info("Reconciling MyResource", "name", myResource.Name)
	// ... logic to create or update an external resource ...

	return ctrl.Result{}, nil
}
```

The key issue lies within the IsNotFound(err) branch. When a CR without finalizers is deleted, the API server removes it at once. The delete event may still trigger a reconcile, but the r.Get call now fails with a NotFound error. The standard pattern is to log this and return an empty ctrl.Result{} with no error, which signals to controller-runtime that the work for this object is complete. The operator moves on, and the external resource it was managing is now an orphan.

The Finalizer-Aware Deletion Process

A finalizer is simply a string key added to an object's metadata.finalizers list. When you attempt to delete an object that has finalizers, the API server does something different. Instead of deleting the object immediately, it populates the metadata.deletionTimestamp field with the current time. The object remains in the API server, but is now in a "terminating" state.

This is the crucial hook. The update to metadata.deletionTimestamp triggers a new reconciliation event for your operator. Your reconciliation loop now runs with a new piece of information: the object is being deleted, but it's not gone yet.

Deletion now becomes a two-phase process:

  1. Deletion Request: A user runs kubectl delete. The API server sees that the finalizers list is not empty and, instead of removing the object, sets metadata.deletionTimestamp.
  2. Cleanup Reconciliation: That update triggers your operator's Reconcile function. It detects that deletionTimestamp is set.
  3. Execute Cleanup: The operator performs its cleanup logic (e.g., calls the AWS API to delete the S3 bucket).
  4. Remove Finalizer: Once cleanup is verifiably complete, the operator removes its finalizer string from the metadata.finalizers list and updates the object in the API server.
  5. Final Deletion: Once the object has a deletionTimestamp and an empty finalizers list, the API server removes it permanently.

If the cleanup logic fails (step 3), the finalizer is not removed (step 4). The reconciliation is re-triggered, and the operator retries the cleanup until it succeeds. As long as nobody strips the finalizer by hand, this guarantees that the external resource is removed before the Kubernetes resource that tracks it disappears.
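To make the two phases concrete, here is a minimal, self-contained Go model of the API server's deletion rules. It is a sketch under simplifying assumptions: Object and FakeAPIServer are hypothetical stand-ins, not the real apimachinery types, and the model ignores grace periods, resourceVersions, and the garbage collector's own finalizers.

```go
package main

import (
	"fmt"
	"time"
)

// Object is a hypothetical, stripped-down stand-in for object metadata.
type Object struct {
	Name              string
	Finalizers        []string
	DeletionTimestamp *time.Time
}

// FakeAPIServer models only the deletion rules described above.
type FakeAPIServer struct {
	objects map[string]*Object
}

func NewFakeAPIServer() *FakeAPIServer {
	return &FakeAPIServer{objects: map[string]*Object{}}
}

func (s *FakeAPIServer) Create(o *Object) { s.objects[o.Name] = o }

func (s *FakeAPIServer) Get(name string) (*Object, bool) {
	o, ok := s.objects[name]
	return o, ok
}

// Delete removes an object with no finalizers at once. Otherwise it only
// marks the object as terminating by setting DeletionTimestamp (phase 1).
func (s *FakeAPIServer) Delete(name string) {
	o, ok := s.objects[name]
	if !ok {
		return
	}
	if len(o.Finalizers) == 0 {
		delete(s.objects, name)
		return
	}
	if o.DeletionTimestamp == nil {
		now := time.Now()
		o.DeletionTimestamp = &now
	}
}

// Update persists the object. A terminating object whose last finalizer
// has been removed is deleted for good (phase 2).
func (s *FakeAPIServer) Update(o *Object) {
	if o.DeletionTimestamp != nil && len(o.Finalizers) == 0 {
		delete(s.objects, o.Name)
		return
	}
	s.objects[o.Name] = o
}

func main() {
	const fin = "cache.my.domain/finalizer"
	api := NewFakeAPIServer()

	// Without a finalizer the object disappears immediately,
	// leaving the controller nothing to read.
	api.Create(&Object{Name: "no-finalizer"})
	api.Delete("no-finalizer")
	_, exists := api.Get("no-finalizer")
	fmt.Println("no-finalizer still exists:", exists) // false

	// With a finalizer the object lingers in a terminating state.
	api.Create(&Object{Name: "with-finalizer", Finalizers: []string{fin}})
	api.Delete("with-finalizer")
	o, _ := api.Get("with-finalizer")
	fmt.Println("terminating:", o.DeletionTimestamp != nil) // true

	// After its cleanup succeeds, the controller removes the finalizer.
	o.Finalizers = nil
	api.Update(o)
	_, exists = api.Get("with-finalizer")
	fmt.Println("with-finalizer still exists:", exists) // false
}
```

The key design point the model captures is that the finalizer turns a single destructive call into a flag (the timestamp) plus a separate, controller-driven release step.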

Production-Grade Implementation with `controller-runtime`

Let's build a robust finalizer implementation for an operator that manages a MessageQueue CRD, which corresponds to an external (mocked) queueing service.

Project Structure:

```text
├── api
│   └── v1
│       ├── groupversion_info.go
│       ├── messagequeue_types.go
│       └── zz_generated.deepcopy.go
├── config
│   └── ...
├── controllers
│   ├── messagequeue_controller.go
│   └── suite_test.go
├── internal
│   └── externalservice
│       └── mock_queue_service.go
└── main.go
```

CRD Definition (api/v1/messagequeue_types.go):

```go
// api/v1/messagequeue_types.go
package v1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// MessageQueueSpec defines the desired state of MessageQueue
type MessageQueueSpec struct {
	// Name of the queue to be created in the external service.
	QueueName string `json:"queueName"`
	// Number of partitions for the queue.
	Partitions int `json:"partitions,omitempty"`
}

// MessageQueueStatus defines the observed state of MessageQueue
type MessageQueueStatus struct {
	// The URL of the created queue.
	QueueURL string `json:"queueURL,omitempty"`
	// A simple status field indicating the state.
	State string `json:"state,omitempty"`
}

//+kubebuilder:object:root=true
//+kubebuilder:subresource:status

// MessageQueue is the Schema for the messagequeues API
type MessageQueue struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   MessageQueueSpec   `json:"spec,omitempty"`
	Status MessageQueueStatus `json:"status,omitempty"`
}

//+kubebuilder:object:root=true

// MessageQueueList contains a list of MessageQueue
type MessageQueueList struct {
	metav1.TypeMeta `json:",inline"`
	metav1.ListMeta `json:"metadata,omitempty"`
	Items           []MessageQueue `json:"items"`
}

func init() {
	SchemeBuilder.Register(&MessageQueue{}, &MessageQueueList{})
}
```

The Reconciler (controllers/messagequeue_controller.go):

This is where the core logic resides. We'll define a finalizer name and structure our Reconcile function to handle the two distinct states: normal reconciliation and deletion reconciliation.

```go
// controllers/messagequeue_controller.go
package controllers

import (
	"context"
	"errors"

	"k8s.io/apimachinery/pkg/runtime"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
	"sigs.k8s.io/controller-runtime/pkg/log"

	cachev1 "my-operator/api/v1"
	"my-operator/internal/externalservice"
)

// Define the finalizer name. A domain-qualified name avoids collisions
// with finalizers owned by other controllers.
const messageQueueFinalizer = "cache.my.domain/finalizer"

// MessageQueueReconciler reconciles a MessageQueue object
type MessageQueueReconciler struct {
	client.Client
	Scheme *runtime.Scheme
	// A mock client for our external service
	QueueService *externalservice.MockQueueService
}

//+kubebuilder:rbac:groups=cache.my.domain,resources=messagequeues,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups=cache.my.domain,resources=messagequeues/status,verbs=get;update;patch
//+kubebuilder:rbac:groups=cache.my.domain,resources=messagequeues/finalizers,verbs=update

func (r *MessageQueueReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	logger := log.FromContext(ctx)

	// 1. Fetch the MessageQueue instance
	instance := &cachev1.MessageQueue{}
	if err := r.Get(ctx, req.NamespacedName, instance); err != nil {
		if client.IgnoreNotFound(err) != nil {
			logger.Error(err, "unable to fetch MessageQueue")
			return ctrl.Result{}, err
		}
		logger.Info("MessageQueue resource not found. Ignoring since object must be deleted.")
		return ctrl.Result{}, nil
	}

	// 2. Examine the deletion timestamp to determine if the object is under deletion.
	if instance.ObjectMeta.DeletionTimestamp.IsZero() {
		// The object is not being deleted, so we add our finalizer if it doesn't exist.
		if !controllerutil.ContainsFinalizer(instance, messageQueueFinalizer) {
			logger.Info("Adding Finalizer for the MessageQueue")
			controllerutil.AddFinalizer(instance, messageQueueFinalizer)
			if err := r.Update(ctx, instance); err != nil {
				logger.Error(err, "Failed to update MessageQueue with finalizer")
				return ctrl.Result{}, err
			}
			// The Update triggers another reconcile; the main logic runs on that
			// pass, so no external queue is created before the finalizer exists.
			return ctrl.Result{}, nil
		}
	} else {
		// The object is being deleted.
		if controllerutil.ContainsFinalizer(instance, messageQueueFinalizer) {
			logger.Info("Performing Finalizer Operations for MessageQueue before deletion")

			// Perform our cleanup logic.
			// This is the most important part of the finalizer.
			if err := r.cleanupExternalResources(ctx, instance); err != nil {
				// If cleanup fails, we don't remove the finalizer.
				// Returning the error makes controller-runtime retry with backoff.
				logger.Error(err, "Failed to cleanup external resources")
				return ctrl.Result{}, err
			}

			// Cleanup was successful. Remove the finalizer.
			logger.Info("Removing Finalizer for MessageQueue after successful cleanup")
			controllerutil.RemoveFinalizer(instance, messageQueueFinalizer)
			if err := r.Update(ctx, instance); err != nil {
				logger.Error(err, "Failed to remove finalizer from MessageQueue")
				return ctrl.Result{}, err
			}
		}
		// Stop reconciliation as the item is being deleted
		return ctrl.Result{}, nil
	}

	// 3. Main reconciliation logic: Create or update the external queue
	queueURL, err := r.QueueService.GetQueue(instance.Spec.QueueName)
	if err != nil {
		if errors.Is(err, externalservice.ErrQueueNotFound) {
			logger.Info("Queue not found, creating a new one.")
			newQueueURL, createErr := r.QueueService.CreateQueue(instance.Spec.QueueName, instance.Spec.Partitions)
			if createErr != nil {
				logger.Error(createErr, "Failed to create external queue")
				// Update status to reflect failure; don't silently drop that error.
				instance.Status.State = "ErrorCreating"
				if statusErr := r.Status().Update(ctx, instance); statusErr != nil {
					logger.Error(statusErr, "Failed to update MessageQueue status")
				}
				return ctrl.Result{}, createErr
			}
			instance.Status.QueueURL = newQueueURL
			instance.Status.State = "Created"
		} else {
			logger.Error(err, "Failed to get queue status from external service")
			return ctrl.Result{}, err
		}
	} else {
		logger.Info("Queue already exists, ensuring state is correct.")
		instance.Status.QueueURL = queueURL
		instance.Status.State = "Available"
	}

	// Update the status of the CR
	if err := r.Status().Update(ctx, instance); err != nil {
		logger.Error(err, "Failed to update MessageQueue status")
		return ctrl.Result{}, err
	}

	return ctrl.Result{}, nil
}

// cleanupExternalResources performs the actual cleanup logic.
// This function MUST be idempotent: it may run again after a failure or a crash.
func (r *MessageQueueReconciler) cleanupExternalResources(ctx context.Context, instance *cachev1.MessageQueue) error {
	log.FromContext(ctx).Info("Deleting external queue", "QueueName", instance.Spec.QueueName)
	// In a real-world scenario, you would call your cloud provider's API here.
	// For our example, we call our mock service.
	err := r.QueueService.DeleteQueue(instance.Spec.QueueName)
	// If the queue is already gone for some reason, we consider it a success.
	if err != nil && !errors.Is(err, externalservice.ErrQueueNotFound) {
		return err
	}
	return nil
}

func (r *MessageQueueReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&cachev1.MessageQueue{}).
		Complete(r)
}
```
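The reconciler depends on internal/externalservice, which the article does not show. The sketch below is one hypothetical version of mock_queue_service.go, assuming only what the reconciler uses: GetQueue, CreateQueue, and DeleteQueue methods plus an ErrQueueNotFound sentinel error. It is written as package main so it runs on its own; in the project layout above it would live in package externalservice, and the URL format is invented for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrQueueNotFound is returned when the named queue does not exist.
var ErrQueueNotFound = errors.New("queue not found")

// MockQueueService is an in-memory stand-in for an external queueing API.
// The mutex makes it safe when several reconciles call it concurrently.
type MockQueueService struct {
	mu     sync.Mutex
	queues map[string]int // queue name -> partitions
}

func NewMockQueueService() *MockQueueService {
	return &MockQueueService{queues: map[string]int{}}
}

// queueURL builds a made-up URL for illustration only.
func queueURL(name string) string {
	return "https://queues.example.com/" + name
}

// GetQueue returns the queue URL, or ErrQueueNotFound.
func (s *MockQueueService) GetQueue(name string) (string, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, ok := s.queues[name]; !ok {
		return "", ErrQueueNotFound
	}
	return queueURL(name), nil
}

// CreateQueue creates the queue and returns its URL. An unset
// partition count is treated as 1 (an assumption of this mock).
func (s *MockQueueService) CreateQueue(name string, partitions int) (string, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if partitions <= 0 {
		partitions = 1
	}
	s.queues[name] = partitions
	return queueURL(name), nil
}

// DeleteQueue deletes the queue, returning ErrQueueNotFound if it is
// already gone so callers can treat that case as success.
func (s *MockQueueService) DeleteQueue(name string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, ok := s.queues[name]; !ok {
		return ErrQueueNotFound
	}
	delete(s.queues, name)
	return nil
}

func main() {
	svc := NewMockQueueService()
	url, _ := svc.CreateQueue("orders", 3)
	fmt.Println(url)                                                    // https://queues.example.com/orders
	fmt.Println(svc.DeleteQueue("orders"))                              // <nil>
	fmt.Println(errors.Is(svc.DeleteQueue("orders"), ErrQueueNotFound)) // true
}
```

Returning a distinct sentinel for "already gone" is what lets cleanupExternalResources stay idempotent: a second delete is distinguishable from a real failure.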

Dissecting the `Reconcile` Function

  • Check for DeletionTimestamp: The first major branch in our logic is if instance.ObjectMeta.DeletionTimestamp.IsZero(). This is the single most important check. If it's zero (IsZero also returns true for a nil timestamp, which is the normal case), we are in a normal reconciliation. If it's set, the object is being deleted.
  • Adding the Finalizer: In the "normal" path, we immediately check if our finalizer exists. If not, we use controllerutil.AddFinalizer, update the object, and return. Because the controller watches its own MessageQueue objects, this Update call triggers another reconciliation. This is normal and expected. On the next pass, the finalizer will be present, and the code will proceed to the main logic (step 3).
  • Handling Deletion: In the else block (when DeletionTimestamp is set), we check whether our finalizer is still present. This is a safeguard: if another controller's finalizer keeps the object around after ours has been removed, later reconciles must not run the cleanup again. If our finalizer is present, we call our cleanup function cleanupExternalResources.
  • Robust Cleanup: cleanupExternalResources is where you delete the S3 bucket, the database instance, or the DNS record. Crucially, if this function returns an error, we do not remove the finalizer. We return the error to controller-runtime, which will requeue the request and try the cleanup again after a backoff period. This guarantees that temporary network issues or API failures don't lead to orphaned resources.
  • Idempotency is Key: The cleanup function must be idempotent: calling it several times must have the same outcome as calling it once. Notice how cleanupExternalResources checks for externalservice.ErrQueueNotFound. If the queue is already gone, it treats this as a success, not an error. This prevents the operator from getting stuck if the external resource was deleted manually.
  • Removing the Finalizer: Only after cleanupExternalResources returns nil do we proceed to controllerutil.RemoveFinalizer and update the CR. This is the signal to Kubernetes that our operator's work is done; once no finalizers remain, the API server deletes the object.
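The retry-until-success behavior described in these points can be modeled without a cluster. In this sketch, Resource, reconcileDeletion, and runWithBackoff are hypothetical stand-ins: the two helpers mirror what controllerutil.ContainsFinalizer and controllerutil.RemoveFinalizer do to the finalizers slice, and the doubling delay only loosely imitates the rate-limited workqueue controller-runtime uses when Reconcile returns an error (the real default limiter also caps the delay and adds an overall rate limit).

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

const finalizer = "cache.my.domain/finalizer"

// errTransient stands in for a temporary failure of the external API.
var errTransient = errors.New("transient API error")

// Resource is a hypothetical stand-in for a custom resource.
type Resource struct {
	Finalizers  []string
	Terminating bool // true once deletionTimestamp would be set
}

// containsFinalizer and removeFinalizer mirror what the controllerutil
// helpers do to metadata.finalizers.
func containsFinalizer(r *Resource, f string) bool {
	for _, x := range r.Finalizers {
		if x == f {
			return true
		}
	}
	return false
}

func removeFinalizer(r *Resource, f string) {
	kept := r.Finalizers[:0]
	for _, x := range r.Finalizers {
		if x != f {
			kept = append(kept, x)
		}
	}
	r.Finalizers = kept
}

// reconcileDeletion mirrors the deletion branch of Reconcile: run cleanup,
// and drop our finalizer only if cleanup returned nil.
func reconcileDeletion(r *Resource, cleanup func() error) error {
	if !r.Terminating || !containsFinalizer(r, finalizer) {
		return nil
	}
	if err := cleanup(); err != nil {
		return err // finalizer stays in place; the caller retries
	}
	removeFinalizer(r, finalizer)
	return nil
}

// runWithBackoff calls reconcile until it succeeds, doubling the delay after
// each error (no cap or jitter, unlike a production rate limiter).
func runWithBackoff(reconcile func() error, base time.Duration, maxAttempts int) (int, error) {
	delay := base
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = reconcile(); err == nil {
			return attempt, nil
		}
		time.Sleep(delay)
		delay *= 2
	}
	return maxAttempts, err
}

func main() {
	res := &Resource{Finalizers: []string{finalizer}, Terminating: true}

	// Hypothetical flaky cleanup: fails twice, then succeeds.
	failuresLeft := 2
	cleanup := func() error {
		if failuresLeft > 0 {
			failuresLeft--
			return errTransient
		}
		return nil
	}

	attempts, err := runWithBackoff(func() error { return reconcileDeletion(res, cleanup) }, time.Millisecond, 5)
	fmt.Println(attempts, err, containsFinalizer(res, finalizer)) // 3 <nil> false
}
```

Note that removeFinalizer filters out only our entry, so finalizers owned by other controllers survive, just as they do with the real helper.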
Advanced Edge Cases and Operational Realities

Implementing the basic pattern is only half the battle. In production, you will encounter complex edge cases.

Edge Case 1: The Stuck Finalizer

Problem: Your operator has a bug, or the external API it calls is permanently down. The cleanup logic consistently fails, and the finalizer is never removed. Your CR is now stuck in a Terminating state indefinitely. kubectl delete commands will hang, and users will be unable to remove the resource.

Diagnosis:

```bash
$ kubectl get messagequeue my-stuck-queue -o yaml

apiVersion: cache.my.domain/v1
kind: MessageQueue
metadata:
  creationTimestamp: "2023-10-27T10:00:00Z"
  deletionGracePeriodSeconds: 0
  deletionTimestamp: "2023-10-27T10:05:00Z" # This is set!
  finalizers:
  - cache.my.domain/finalizer # This is the culprit!
  name: my-stuck-queue
  namespace: default
  ...
```

Solution (Manual Intervention):

When all else fails, a cluster administrator must intervene. You can manually remove the finalizer by patching the resource. This is a powerful and potentially dangerous operation: Kubernetes will stop waiting for any cleanup, so you should first confirm by hand that the external resource has been cleaned up. Note also that the patch below removes the entire finalizers list, not only your operator's entry.

```bash
# First, verify the external resource is actually gone!
# (e.g., aws s3 ls | grep 'my-stuck-queue-bucket')

# Then, patch the CR to remove the finalizer
kubectl patch messagequeue my-stuck-queue --type 'json' -p '[{"op": "remove", "path": "/metadata/finalizers"}]'
```

Once the patch is applied, the object has no finalizers left. Because its deletionTimestamp is still set, the API server deletes it immediately.
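When the object also carries finalizers from other controllers, a more surgical option is a JSON Patch (RFC 6902) that removes a single entry by index, guarded by a "test" operation so the patch is rejected if the list changed between reading and patching. The sketch below uses a hypothetical removeFinalizerPatch helper and only the Go standard library; it assumes you have already fetched the current finalizers, for example with kubectl get messagequeue my-stuck-queue -o jsonpath='{.metadata.finalizers}'.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// patchOp is one RFC 6902 JSON Patch operation.
type patchOp struct {
	Op    string `json:"op"`
	Path  string `json:"path"`
	Value string `json:"value,omitempty"`
}

// removeFinalizerPatch builds a JSON Patch that removes only the named
// finalizer. The "test" op makes the whole patch fail, instead of deleting
// the wrong entry, if the list changed since it was read.
func removeFinalizerPatch(finalizers []string, name string) ([]byte, error) {
	for i, f := range finalizers {
		if f == name {
			path := fmt.Sprintf("/metadata/finalizers/%d", i)
			return json.Marshal([]patchOp{
				{Op: "test", Path: path, Value: name},
				{Op: "remove", Path: path},
			})
		}
	}
	return nil, fmt.Errorf("finalizer %q not present", name)
}

func main() {
	current := []string{"other.example.com/protect", "cache.my.domain/finalizer"}
	patch, err := removeFinalizerPatch(current, "cache.my.domain/finalizer")
	if err != nil {
		panic(err)
	}
	// Prints a ready-to-run command using the generated patch body.
	fmt.Printf("kubectl patch messagequeue my-stuck-queue --type json -p '%s'\n", patch)
}
```

With the example input, the generated body tests and then removes /metadata/finalizers/1, leaving other.example.com/protect untouched.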

Edge Case 2: Controller Concurrency and Race Conditions

Problem: For throughput, you raise MaxConcurrentReconciles for your controller above 1. Can two reconciliations of the same object now run at once, with one adding a finalizer while another, working from an older copy, updates the status?

Solution: Not within a single controller: controller-runtime's workqueue never hands the same key to two workers at the same time, so MaxConcurrentReconciles only parallelizes work across different objects. Conflicting writes still happen, though, because the cache your reconciler reads from can lag behind the API server, and because other actors (users, CI pipelines, other controllers) modify the same object. These cases are handled by the Kubernetes API's optimistic concurrency. Every Kubernetes object has a metadata.resourceVersion. When you Update() an object, the client sends the resourceVersion it last read. If the resourceVersion on the server is different (meaning something else updated it), the Update() call fails with a Conflict error. Returning that error from Reconcile makes controller-runtime requeue the request with backoff; on the next pass your reconciler re-fetches the latest version of the object and retries its logic.

While this is handled for you, it's critical to understand why it works and to ensure your logic is stateless and idempotent. Your Reconcile function should always be able to re-run from scratch using the latest version of the CR without causing duplicate side effects.
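The optimistic-concurrency check itself fits in a few lines. This is a toy sketch: Store, Obj, ErrConflict, and updateWithRetry are hypothetical. In a real operator you would detect conflicts with apierrors.IsConflict from k8s.io/apimachinery/pkg/api/errors, and for writes outside Reconcile, client-go's retry.RetryOnConflict (k8s.io/client-go/util/retry) implements the same re-read-and-retry loop that updateWithRetry imitates.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrConflict mimics the API server's 409 Conflict on a stale resourceVersion.
var ErrConflict = errors.New("conflict: object has been modified")

// Obj is a hypothetical object carrying a resourceVersion.
type Obj struct {
	ResourceVersion int
	Finalizers      []string
	State           string
}

// Store is a toy API server that enforces optimistic concurrency.
type Store struct {
	mu  sync.Mutex
	obj Obj
}

// Get returns a copy of the object, as a client read would.
func (s *Store) Get() Obj {
	s.mu.Lock()
	defer s.mu.Unlock()
	o := s.obj
	o.Finalizers = append([]string(nil), s.obj.Finalizers...)
	return o
}

// Update succeeds only if the caller read the latest version,
// and bumps the version on every successful write.
func (s *Store) Update(o Obj) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if o.ResourceVersion != s.obj.ResourceVersion {
		return ErrConflict
	}
	o.ResourceVersion++
	s.obj = o
	return nil
}

// updateWithRetry re-reads the object and re-applies mutate after each conflict.
func updateWithRetry(s *Store, mutate func(*Obj), attempts int) error {
	var err error
	for i := 0; i < attempts; i++ {
		o := s.Get()
		mutate(&o)
		if err = s.Update(o); !errors.Is(err, ErrConflict) {
			return err
		}
	}
	return err
}

func main() {
	s := &Store{obj: Obj{ResourceVersion: 1}}

	// Two writers read the same version; the second write is rejected.
	a, b := s.Get(), s.Get()
	a.Finalizers = append(a.Finalizers, "cache.my.domain/finalizer")
	b.State = "Available"
	fmt.Println(s.Update(a)) // <nil>
	fmt.Println(s.Update(b)) // conflict: object has been modified

	// Re-reading before writing preserves both changes.
	err := updateWithRetry(s, func(o *Obj) { o.State = "Available" }, 3)
	final := s.Get()
	fmt.Println(err, final.State, final.Finalizers) // <nil> Available [cache.my.domain/finalizer]
}
```

The rejected write is the point: without the version check, writer b would have silently erased the finalizer that writer a just added.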

Edge Case 3: Finalizers on Operator Shutdown

Problem: Your operator pod is forcefully terminated (SIGKILL) or crashes while it's in the middle of a cleanup operation. The external resource might have been deleted, but the operator didn't get a chance to remove the finalizer.

Solution: This is the beauty of the reconciliation pattern. When the operator pod restarts, its controllers will re-list all the resources they manage. It will see the MessageQueue object still in its Terminating state with the finalizer present. It will then re-run the cleanupExternalResources function. Because we made the function idempotent, it will see the resource is already gone, return a success, and the new reconciliation loop will successfully remove the finalizer. The system self-heals.
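The crash scenario can be reproduced in miniature. In this hypothetical sketch, External, CR, and the crashBeforeRemoval flag are stand-ins: the flag simulates the pod dying after the external delete succeeded but before the finalizer update reached the API server, and the second call plays the role of the reconcile that runs after the restarted operator re-lists its objects.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotFound is returned when the external queue no longer exists.
var ErrNotFound = errors.New("queue not found")

// External is a toy external service holding queues by name.
type External struct{ queues map[string]bool }

func (e *External) Delete(name string) error {
	if !e.queues[name] {
		return ErrNotFound
	}
	delete(e.queues, name)
	return nil
}

// cleanup is idempotent: "already gone" counts as success.
func cleanup(e *External, name string) error {
	if err := e.Delete(name); err != nil && !errors.Is(err, ErrNotFound) {
		return err
	}
	return nil
}

// CR is a hypothetical terminating custom resource.
type CR struct {
	QueueName    string
	HasFinalizer bool
}

// reconcileDeletion runs cleanup and then drops the finalizer.
// crashBeforeRemoval simulates the process dying between those two steps.
func reconcileDeletion(e *External, cr *CR, crashBeforeRemoval bool) error {
	if !cr.HasFinalizer {
		return nil
	}
	if err := cleanup(e, cr.QueueName); err != nil {
		return err
	}
	if crashBeforeRemoval {
		return errors.New("process killed")
	}
	cr.HasFinalizer = false
	return nil
}

func main() {
	ext := &External{queues: map[string]bool{"orders": true}}
	cr := &CR{QueueName: "orders", HasFinalizer: true}

	// First attempt: the queue is deleted, but the finalizer is never removed.
	fmt.Println(reconcileDeletion(ext, cr, true), cr.HasFinalizer) // process killed true
	// After a restart, cleanup finds the queue already gone and succeeds.
	fmt.Println(reconcileDeletion(ext, cr, false), cr.HasFinalizer) // <nil> false
}
```

If cleanup instead returned ErrNotFound as a failure, the second pass would fail forever and the object would end up as a stuck finalizer (Edge Case 1).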

Performance and API Load Considerations

Adding a finalizer requires an extra UPDATE API call at the beginning of an object's life. Removing it requires another UPDATE call at the end. For an operator managing tens of thousands of resources with high churn, this can contribute to API server load.

  • Is it worth it? Absolutely. The cost of a few extra API calls is minuscule compared to the cost of orphaned cloud resources or the engineering time spent manually cleaning up a production environment.
  • Batching Operations: Even in extremely high-throughput scenarios, the finalizer pattern itself stays the same. Performance tuning belongs in the client that talks to the external service (for example, batching or rate-limiting its calls), not in the finalizer logic. The finalizer pattern is about correctness, not raw throughput.
  • Owner References vs. Finalizers: It's important to distinguish finalizers from OwnerReferences. OwnerReferences drive garbage collection of in-cluster resources. For example, each ReplicaSet created by a Deployment carries an OwnerReference pointing back to that Deployment. When you delete the Deployment, Kubernetes automatically deletes the owned ReplicaSets. This works because both owner and dependent are API objects the garbage collector can see. It does not work for external resources like an S3 bucket, which is why finalizers are the necessary and correct tool for that job.

Conclusion: The Mark of a Production-Ready Operator

Mastering the finalizer pattern is a non-negotiable skill for any engineer building production-grade Kubernetes operators. It elevates an operator from a simple proof-of-concept that can create resources to a robust, lifecycle-aware system that manages them safely and reliably.

The core principles are simple but require disciplined implementation:

  • Use deletionTimestamp to branch your reconciliation logic into "normal" and "deleting" states.
  • Add your finalizer before the operator creates any external resource.
  • Perform cleanup logic only when the object is marked for deletion.
  • Make your cleanup logic idempotent to handle retries and crashes gracefully.
  • Remove the finalizer only after cleanup has verifiably succeeded.

By adhering to this pattern, you build controllers that are not only powerful but also trustworthy, preventing resource leaks and ensuring that the declarative state in your cluster accurately reflects the state of the world it manages.
