Advanced Finalizer Patterns for Kubernetes Operator State Management

Goh Ling Yong

The Finalizer's True Purpose: Beyond Simple Cleanup

In the world of Kubernetes operators, the reconciliation loop is king. It's the engine that drives the system towards the desired state. While most of our effort focuses on the creation and update paths, the deletion path—governed by finalizers—is where production-grade operators distinguish themselves. A mishandled deletion process can lead to orphaned cloud resources, dangling network policies, or inconsistent state, resulting in security vulnerabilities and unnecessary costs.

A finalizer is a simple concept: a string in a resource's metadata that tells the Kubernetes API server to prevent garbage collection until that string is removed. This mechanism transforms a resource's deletion from a synchronous DELETE API call into an asynchronous process. When a user runs kubectl delete my-crd, the API server simply sets the metadata.deletionTimestamp field. It's now the controller's responsibility to perform cleanup and then, and only then, remove its finalizer, allowing the API server to complete the deletion.

This article assumes you've already implemented a basic finalizer. We won't cover the introductory if !controllerutil.ContainsFinalizer(...) boilerplate. Instead, we will dive into the complex scenarios that arise when your operator manages more than just Kubernetes-native resources. We'll explore advanced, stateful patterns for orchestrating the teardown of external dependencies, handling multi-stage cleanup operations, and managing complex object graphs during deletion.


Pattern 1: Idempotent Finalization for a Single External Resource

Let's start with the foundational pattern: managing a single external resource, such as a database in a cloud provider. The core challenge is ensuring the cleanup logic is idempotent. The reconciliation loop can be triggered multiple times while the deletionTimestamp is set, especially if the controller restarts or a previous attempt fails. Your cleanup logic must be safe to re-execute.

Scenario: Our operator manages a ManagedDatabase custom resource, which provisions a PostgreSQL database on a fictional cloud provider CloudCorp.

First, our CRD definition:

go
// api/v1alpha1/manageddatabase_types.go
package v1alpha1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// ManagedDatabaseSpec defines the desired state of ManagedDatabase
type ManagedDatabaseSpec struct {
	DBName string `json:"dbName"`
	Region string `json:"region"`
}

// ManagedDatabaseStatus defines the observed state of ManagedDatabase
type ManagedDatabaseStatus struct {
	// The ID of the database in the external cloud provider
	ProviderID string `json:"providerId,omitempty"`
	// Current state of the database
	State string `json:"state,omitempty"`
}

//+kubebuilder:object:root=true
//+kubebuilder:subresource:status

type ManagedDatabase struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   ManagedDatabaseSpec   `json:"spec,omitempty"`
	Status ManagedDatabaseStatus `json:"status,omitempty"`
}

The key is the ProviderID in the status. This is the link between our Kubernetes object and the real-world resource. We must have this to perform a delete.
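For reference, the normal (non-deletion) path should persist ProviderID as soon as the external resource is created. The sketch below assumes the fictional CloudCorp client used later in this article, plus a hypothetical CreateDatabase call; the exact API is illustrative, not prescriptive.

go
// A minimal sketch of the create path (CreateDatabase is an assumed method on
// the fictional client). Recording ProviderID in status before doing anything
// else is what makes later cleanup possible.
if db.Status.ProviderID == "" {
	cloudClient := cloudcorp.NewClient(r.CloudCorpCredentials)
	providerID, err := cloudClient.CreateDatabase(ctx, db.Spec.DBName, db.Spec.Region)
	if err != nil {
		return ctrl.Result{}, err
	}
	db.Status.ProviderID = providerID
	db.Status.State = "Provisioning"
	if err := r.Status().Update(ctx, db); err != nil {
		// If this update fails after the external create succeeded, the database
		// may be orphaned; idempotent creation (deterministic names or tags)
		// mitigates this.
		return ctrl.Result{}, err
	}
}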

Our controller's Reconcile method will contain the finalizer logic. Let's define our finalizer name.

go
// internal/controller/manageddatabase_controller.go
const managedDatabaseFinalizer = "database.example.com/finalizer"

Now, the core reconciliation logic for deletion:

go
// internal/controller/manageddatabase_controller.go
import (
	"context"
	"fmt"

	// ... other imports
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
	"sigs.k8s.io/controller-runtime/pkg/log"

	// ... local API import
)

func (r *ManagedDatabaseReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	logger := log.FromContext(ctx)

	// 1. Fetch the ManagedDatabase instance
	db := &databasev1alpha1.ManagedDatabase{}
	if err := r.Get(ctx, req.NamespacedName, db); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	// 2. Examine DeletionTimestamp to determine if the object is being deleted.
	if db.ObjectMeta.DeletionTimestamp.IsZero() {
		// The object is not being deleted, so we add our finalizer if it does not exist.
		if !controllerutil.ContainsFinalizer(db, managedDatabaseFinalizer) {
			logger.Info("Adding finalizer for ManagedDatabase")
			controllerutil.AddFinalizer(db, managedDatabaseFinalizer)
			if err := r.Update(ctx, db); err != nil {
				return ctrl.Result{}, err
			}
		}
		// ... Normal reconciliation logic for create/update ...

	} else {
		// The object is being deleted.
		if controllerutil.ContainsFinalizer(db, managedDatabaseFinalizer) {
			logger.Info("Performing finalizer logic for ManagedDatabase")

			// Our custom finalizer logic
			if err := r.finalizeManagedDatabase(ctx, db); err != nil {
				// If the cleanup fails, we don't remove the finalizer.
				// The reconciliation will be retried.
				logger.Error(err, "Failed to finalize ManagedDatabase")
				return ctrl.Result{}, err
			}

			// Cleanup was successful. Remove the finalizer.
			logger.Info("ManagedDatabase finalized successfully. Removing finalizer.")
			controllerutil.RemoveFinalizer(db, managedDatabaseFinalizer)
			if err := r.Update(ctx, db); err != nil {
				return ctrl.Result{}, err
			}
		}

		// Stop reconciliation as the item is being deleted
		return ctrl.Result{}, nil
	}

	return ctrl.Result{}, nil
}

// finalizeManagedDatabase performs the actual cleanup.
func (r *ManagedDatabaseReconciler) finalizeManagedDatabase(ctx context.Context, db *databasev1alpha1.ManagedDatabase) error {
	logger := log.FromContext(ctx)

	// Check if the external resource ID exists. If not, it may have been deleted already
	// or was never created. In either case, we can consider cleanup successful.
	if db.Status.ProviderID == "" {
		logger.Info("External database ProviderID is missing. Assuming it was never created or already cleaned up.")
		return nil
	}

	logger.Info("Deleting external database", "ProviderID", db.Status.ProviderID)

	// Fictional cloud client
	cloudClient := cloudcorp.NewClient(r.CloudCorpCredentials)
	exists, err := cloudClient.DatabaseExists(ctx, db.Status.ProviderID)
	if err != nil {
		return fmt.Errorf("failed to check existence of external database %s: %w", db.Status.ProviderID, err)
	}

	// Idempotency Check: If the resource is already gone, we're done.
	if !exists {
		logger.Info("External database not found. Cleanup is complete.")
		return nil
	}

	// Issue the delete call.
	if err := cloudClient.DeleteDatabase(ctx, db.Status.ProviderID); err != nil {
		// This could be a transient error. We return the error to trigger a retry.
		return fmt.Errorf("failed to delete external database %s: %w", db.Status.ProviderID, err)
	}

	logger.Info("Successfully initiated deletion of external database", "ProviderID", db.Status.ProviderID)
	return nil
}

Key Production Considerations:

  • Idempotency: The finalizeManagedDatabase function first checks if a ProviderID exists. If not, it assumes success. Then, it checks whether the external resource actually exists via the cloud API. If it's already gone, it returns nil, preventing repeated DELETE calls that might error on a non-existent resource. This is crucial for recovery after a partial failure. (A race-free variant that folds the existence check into the delete call is sketched after this list.)
  • State Separation: The desired state is in Spec, the observed state is in Status. The ProviderID is the critical piece of observed state that links the abstract Kubernetes resource to the concrete external one. Without it, cleanup is impossible.
  • Error Handling: Returning an error from finalizeManagedDatabase prevents the finalizer's removal and triggers requeueing with exponential backoff (the default controller-runtime behavior). This is correct for transient network or API errors.
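
If the cloud API can distinguish "not found" from other failures, the existence check and the delete can be collapsed into a single call, which also avoids a race between the check and the delete. A sketch, assuming the fictional client exposes an IsNotFound helper:

go
// Race-free idempotent delete (cloudcorp.IsNotFound is an assumed helper).
if err := cloudClient.DeleteDatabase(ctx, db.Status.ProviderID); err != nil {
	if cloudcorp.IsNotFound(err) {
		// Already gone, possibly deleted by a previous attempt: cleanup is complete.
		return nil
	}
	return fmt.Errorf("failed to delete external database %s: %w", db.Status.ProviderID, err)
}
return nil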

Pattern 2: Stateful Finalizers for Multi-Stage Cleanup

What if deleting an external resource isn't a single API call? Consider deleting a production database: you might need to quiesce it, take a final snapshot, wait for the snapshot to complete, and then issue the delete command. This is a state machine, and our finalizer logic must reflect that.

Scenario: Our ManagedDatabase now requires a final snapshot before deletion. This process involves two asynchronous operations: CreateSnapshot and DeleteDatabase.

We'll enhance our ManagedDatabaseStatus to track the cleanup progress.

go
// api/v1alpha1/manageddatabase_types.go

type DeletionPhase string

const (
	DeletionPhaseNone              DeletionPhase = ""
	DeletionPhaseSnapshotting      DeletionPhase = "Snapshotting"
	DeletionPhaseSnapshotCompleted DeletionPhase = "SnapshotCompleted"
	DeletionPhaseDeleting          DeletionPhase = "Deleting"
)

type ManagedDatabaseStatus struct {
	ProviderID string `json:"providerId,omitempty"`
	State      string `json:"state,omitempty"`
	// New fields for stateful deletion
	DeletionPhase DeletionPhase `json:"deletionPhase,omitempty"`
	SnapshotID    string        `json:"snapshotId,omitempty"`
}

Our finalizeManagedDatabase function now becomes a state machine dispatcher.

go
// internal/controller/manageddatabase_controller.go

func (r *ManagedDatabaseReconciler) finalizeManagedDatabase(ctx context.Context, db *databasev1alpha1.ManagedDatabase) error {
	logger := log.FromContext(ctx)

	if db.Status.ProviderID == "" {
		logger.Info("External database ProviderID is missing, skipping finalization.")
		return nil
	}

	switch db.Status.DeletionPhase {
	case databasev1alpha1.DeletionPhaseNone:
		return r.handleDeletionPhaseNone(ctx, db)
	case databasev1alpha1.DeletionPhaseSnapshotting:
		return r.handleDeletionPhaseSnapshotting(ctx, db)
	case databasev1alpha1.DeletionPhaseSnapshotCompleted:
		return r.handleDeletionPhaseSnapshotCompleted(ctx, db)
	case databasev1alpha1.DeletionPhaseDeleting:
		return r.handleDeletionPhaseDeleting(ctx, db)
	default:
		// If the phase is empty or unknown, start from the beginning.
		return r.handleDeletionPhaseNone(ctx, db)
	}
}

func (r *ManagedDatabaseReconciler) handleDeletionPhaseNone(ctx context.Context, db *databasev1alpha1.ManagedDatabase) error {
	logger := log.FromContext(ctx)
	logger.Info("Starting finalization: snapshotting phase")

	cloudClient := cloudcorp.NewClient(r.CloudCorpCredentials)
	snapshotID, err := cloudClient.CreateSnapshot(ctx, db.Status.ProviderID)
	if err != nil {
		return fmt.Errorf("failed to create snapshot: %w", err)
	}

	// Update status to reflect the new phase and store the snapshot ID.
	db.Status.DeletionPhase = databasev1alpha1.DeletionPhaseSnapshotting
	db.Status.SnapshotID = snapshotID
	return r.Status().Update(ctx, db)
}
    
func (r *ManagedDatabaseReconciler) handleDeletionPhaseSnapshotting(ctx context.Context, db *databasev1alpha1.ManagedDatabase) error {
	logger := log.FromContext(ctx)
	logger.Info("Checking snapshot status", "SnapshotID", db.Status.SnapshotID)

	cloudClient := cloudcorp.NewClient(r.CloudCorpCredentials)
	isComplete, err := cloudClient.IsSnapshotComplete(ctx, db.Status.SnapshotID)
	if err != nil {
		return fmt.Errorf("failed to check snapshot status: %w", err)
	}

	if !isComplete {
		logger.Info("Snapshot is not yet complete, requeueing")
		// Waiting is not an error, so we don't return one here. Returning nil
		// leaves requeueing to the main Reconcile loop; a cleaner approach is to
		// return a custom error type that Reconcile translates into
		// ctrl.Result{RequeueAfter: ...} (see the requeue sketch after this section).
		return nil
	}

	logger.Info("Snapshot complete. Moving to deletion phase.")
	db.Status.DeletionPhase = databasev1alpha1.DeletionPhaseSnapshotCompleted
	return r.Status().Update(ctx, db)
}
    
func (r *ManagedDatabaseReconciler) handleDeletionPhaseSnapshotCompleted(ctx context.Context, db *databasev1alpha1.ManagedDatabase) error {
	logger := log.FromContext(ctx)
	logger.Info("Deleting external database", "ProviderID", db.Status.ProviderID)

	cloudClient := cloudcorp.NewClient(r.CloudCorpCredentials)
	if err := cloudClient.DeleteDatabase(ctx, db.Status.ProviderID); err != nil {
		return fmt.Errorf("failed to delete external database: %w", err)
	}

	db.Status.DeletionPhase = databasev1alpha1.DeletionPhaseDeleting
	return r.Status().Update(ctx, db)
}

func (r *ManagedDatabaseReconciler) handleDeletionPhaseDeleting(ctx context.Context, db *databasev1alpha1.ManagedDatabase) error {
	logger := log.FromContext(ctx)
	logger.Info("Checking if external database is deleted", "ProviderID", db.Status.ProviderID)

	cloudClient := cloudcorp.NewClient(r.CloudCorpCredentials)
	exists, err := cloudClient.DatabaseExists(ctx, db.Status.ProviderID)
	if err != nil {
		return fmt.Errorf("failed to check existence of external database: %w", err)
	}

	if exists {
		logger.Info("External database still exists, requeueing")
		return nil // Requeue and check again later.
	}

	logger.Info("External database successfully deleted.")
	// This is the final step. We don't update status here; the main loop will
	// remove the finalizer. Returning nil signals that finalization is complete.
	return nil
}

Key Production Considerations:

  • Transactional Updates: Each state transition is persisted to the Status subresource. If the controller crashes between phases, it can resume exactly where it left off on the next reconciliation.
  • Polling vs. Events: The example uses polling (IsSnapshotComplete, DatabaseExists). In a real-world scenario, if the cloud provider supports webhooks or an eventing system (e.g., AWS EventBridge), you could build a more efficient, event-driven operator that reacts to external state changes instead of polling. This reduces latency and API calls.
  • Requeue Logic: In handleDeletionPhaseSnapshotting, we return nil to avoid exponential backoff for a non-error condition (waiting). The main Reconcile function should inspect the error type or use ctrl.Result{RequeueAfter: ...} to implement a controlled polling interval (e.g., requeue every 30 seconds); a sketch of this follows the list.
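
One way to wire this up is a small requeue-signalling error type that the phase handlers return while an external operation is still in progress. This is a sketch under the assumption that you control both the handlers and the Reconcile wiring; requeueAfterError is not a controller-runtime type.

go
// Requires "errors", "fmt", and "time" in the imports.

// requeueAfterError signals "poll again later" without counting as a failure.
type requeueAfterError struct {
	After time.Duration
}

func (e *requeueAfterError) Error() string {
	return fmt.Sprintf("requeue requested after %s", e.After)
}

// In handleDeletionPhaseSnapshotting, instead of returning nil while waiting:
//     return &requeueAfterError{After: 30 * time.Second}

// In the deletion branch of Reconcile:
if err := r.finalizeManagedDatabase(ctx, db); err != nil {
	var requeue *requeueAfterError
	if errors.As(err, &requeue) {
		// Not a failure: the external operation is still running.
		return ctrl.Result{RequeueAfter: requeue.After}, nil
	}
	// A genuine error: let controller-runtime retry with exponential backoff.
	return ctrl.Result{}, err
}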

Pattern 3: Orchestrating Cleanup with Owner References and Finalizers

Operators often manage a graph of objects. A top-level CR might create other CRs, Deployments, Services, and Secrets. Kubernetes garbage collection via OwnerReferences is powerful but can be insufficient: if a child resource needs its own complex cleanup (i.e., it has its own finalizer), the parent must wait for the child's finalization to complete before proceeding.

Scenario: We introduce a DatabaseUser CR. Our ManagedDatabase controller now also creates a DatabaseUser resource for the application. The DatabaseUser has its own finalizer to remove the user from the database before its own deletion. The ManagedDatabase must not be deleted until all of its associated DatabaseUser objects are gone.

First, we ensure the ManagedDatabase controller sets an OwnerReference on the DatabaseUser it creates.

go
// During the normal (non-deletion) reconcile loop for ManagedDatabase
user := &databasev1alpha1.DatabaseUser{
	// ... spec ...
}
// Set the ManagedDatabase as the owner and controller of the DatabaseUser.
if err := controllerutil.SetControllerReference(db, user, r.Scheme); err != nil {
	return ctrl.Result{}, err
}
// (AlreadyExists handling omitted for brevity.)
if err := r.Create(ctx, user); err != nil {
	return ctrl.Result{}, err
}

Now, the ManagedDatabase finalizer logic must be augmented to check for dependent DatabaseUser objects.

go
// internal/controller/manageddatabase_controller.go

// databaseUserFinalizer is the finalizer used by the DatabaseUser controller.
const databaseUserFinalizer = "database.example.com/user-finalizer"

// In the main Reconcile function, inside the finalizer block:
if controllerutil.ContainsFinalizer(db, managedDatabaseFinalizer) {
	// 1. Check if dependent resources are cleaned up first.
	if r.hasDependentUsers(ctx, db) {
		logger.Info("Waiting for dependent DatabaseUser objects to be finalized.")
		// Requeue to wait for children to be deleted.
		return ctrl.Result{RequeueAfter: 10 * time.Second}, nil
	}

	// 2. Proceed with our own finalizer logic (e.g., the state machine from Pattern 2).
	if err := r.finalizeManagedDatabase(ctx, db); err != nil {
		return ctrl.Result{}, err
	}

	// ... remove finalizer ...
}

// hasDependentUsers checks whether any DatabaseUser objects owned by this ManagedDatabase still exist.
func (r *ManagedDatabaseReconciler) hasDependentUsers(ctx context.Context, db *databasev1alpha1.ManagedDatabase) bool {
	userList := &databasev1alpha1.DatabaseUserList{}
	// List users in the same namespace, using a field index to find those owned
	// by this ManagedDatabase instance.
	if err := r.List(ctx, userList, client.InNamespace(db.Namespace), client.MatchingFields{".metadata.controller": db.Name}); err != nil {
		// Log the error but assume dependents exist to be safe.
		log.FromContext(ctx).Error(err, "Failed to list dependent DatabaseUsers, assuming they still exist")
		return true
	}

	return len(userList.Items) > 0
}

// The field index must be registered with the manager (e.g., in SetupWithManager or main.go):
if err := mgr.GetFieldIndexer().IndexField(context.Background(), &databasev1alpha1.DatabaseUser{}, ".metadata.controller", func(rawObj client.Object) []string {
	user := rawObj.(*databasev1alpha1.DatabaseUser)
	owner := metav1.GetControllerOf(user)
	if owner == nil || owner.APIVersion != apiGVStr || owner.Kind != "ManagedDatabase" {
		return nil
	}
	return []string{owner.Name}
}); err != nil {
	return err
}

How this orchestration works:

  • User runs kubectl delete manageddatabase my-db.
  • The deletionTimestamp is set on my-db. The Kubernetes garbage collector sees this and sends DELETE requests to all objects with an OwnerReference pointing to my-db, including our DatabaseUser objects.
  • The DatabaseUser objects get their own deletionTimestamp set. Their controller's finalizer logic kicks in to remove the user from the database.
  • Meanwhile, the ManagedDatabase controller's reconciliation loop is running for my-db. Its hasDependentUsers check finds that the DatabaseUser objects still exist (because their finalizers are blocking their deletion). It requeues and waits.
  • Once the DatabaseUser controller successfully removes the user from the database, it removes its finalizer from the DatabaseUser object. The API server then garbage collects the object.
  • On a subsequent reconciliation, the ManagedDatabase controller's hasDependentUsers check finds no remaining DatabaseUser objects. It then proceeds with its own multi-stage cleanup (snapshotting, etc.).
This pattern creates a robust, ordered teardown process, ensuring that you don't delete a database while active users are still defined for it. Instead of polling with RequeueAfter, the parent controller can also watch the objects it owns so a child's deletion immediately triggers a parent reconcile; a sketch follows.
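
A minimal setup sketch using controller-runtime's builder:

go
// SetupWithManager wires a watch on owned DatabaseUser objects. When a child is
// finalized and garbage collected, the owning ManagedDatabase is requeued
// immediately instead of waiting for a timed requeue.
func (r *ManagedDatabaseReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&databasev1alpha1.ManagedDatabase{}).
		Owns(&databasev1alpha1.DatabaseUser{}).
		Complete(r)
}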


Edge Cases and Performance Considerations

Stuck Finalizers:

The biggest operational risk with finalizers is getting stuck: if the finalizer logic repeatedly fails or reaches a state where it can no longer make progress, the resource becomes undeletable via kubectl.

  • Cause: A bug in the controller, permanent failure of an external API, or loss of credentials.
  • Mitigations:
    • Metrics & Alerts: Your operator should expose Prometheus metrics on finalization duration and failure counts. Set up alerts for finalizers that have been pending for an unreasonable amount of time (e.g., > 1 hour). A minimal metrics sketch follows this list.
    • Timeouts: Implement a timeout within your finalization logic. If cleanup doesn't complete within a certain period, update the resource's status with a Failed condition and stop retrying, requiring manual intervention.
    • Manual Intervention: The only way to unblock a truly stuck finalizer is to manually patch the resource to remove it: kubectl patch manageddatabase my-db --type json -p='[{"op": "remove", "path": "/metadata/finalizers"}]'. This is a break-glass procedure and can leave orphaned external resources behind if the cleanup was not actually performed.
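
A minimal metrics sketch using controller-runtime's Prometheus registry; the metric names are illustrative, and a real implementation would also observe the duration when the finalizer is removed:

go
// internal/controller/metrics.go
package controller

import (
	"github.com/prometheus/client_golang/prometheus"
	"sigs.k8s.io/controller-runtime/pkg/metrics"
)

var (
	// Illustrative metric names; pick names that fit your own conventions.
	finalizationFailures = prometheus.NewCounterVec(
		prometheus.CounterOpts{
			Name: "manageddatabase_finalization_failures_total",
			Help: "Number of failed finalization attempts per resource.",
		},
		[]string{"namespace", "name"},
	)
	finalizationDuration = prometheus.NewHistogram(
		prometheus.HistogramOpts{
			Name:    "manageddatabase_finalization_duration_seconds",
			Help:    "Time from deletionTimestamp to finalizer removal.",
			Buckets: prometheus.ExponentialBuckets(1, 2, 12),
		},
	)
)

func init() {
	// metrics.Registry is served on the controller manager's /metrics endpoint.
	metrics.Registry.MustRegister(finalizationFailures, finalizationDuration)
}

For the timeout mitigation, a simple check against the deletion timestamp inside the finalizer is usually enough: if time.Since(db.DeletionTimestamp.Time) exceeds your budget, set a Failed condition, emit an event, and stop retrying automatically.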

Controller Starvation and Concurrency:

By default, a controller reconciles one resource at a time per worker. If your finalizer logic involves a long-running, blocking call (e.g., waiting 10 minutes for a snapshot), it ties up a worker and prevents it from reconciling other resources.

  • Problem: A single slow deletion can halt all other operations for that controller.
  • Solutions:
    • Asynchronous Offloading: For long-running tasks, the controller should create a Kubernetes Job to perform the work and then track progress via the Job's completion status. The finalizer logic becomes: 1. Create the Job. 2. Update status to DeletionJobCreated. 3. Requeue and wait. 4. On the next reconcile, check the Job status. This frees the controller worker immediately; a sketch follows this list.
    • Increase Worker Count: You can raise the controller's MaxConcurrentReconciles option to allow more reconciliations to run in parallel. This is a blunt instrument and can increase pressure on the Kubernetes API server and external systems.
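
A sketch of the Job-offloading approach, assuming the snapshot work is packaged into a container image (the image name and arguments are placeholders):

go
// Additional imports alongside those already in the controller file:
import (
	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// createSnapshotJob offloads the long-running snapshot to a Job so the
// reconcile worker is freed immediately; the finalizer then only checks the
// Job's completion status on subsequent reconciles.
func (r *ManagedDatabaseReconciler) createSnapshotJob(ctx context.Context, db *databasev1alpha1.ManagedDatabase) error {
	job := &batchv1.Job{
		ObjectMeta: metav1.ObjectMeta{
			Name:      db.Name + "-final-snapshot",
			Namespace: db.Namespace,
		},
		Spec: batchv1.JobSpec{
			Template: corev1.PodTemplateSpec{
				Spec: corev1.PodSpec{
					RestartPolicy: corev1.RestartPolicyOnFailure,
					Containers: []corev1.Container{{
						Name:  "snapshot",
						Image: "example.com/cloudcorp-snapshotter:latest", // placeholder image
						Args:  []string{"--provider-id", db.Status.ProviderID},
					}},
				},
			},
		},
	}
	// Own the Job so it is garbage collected along with the ManagedDatabase.
	if err := controllerutil.SetControllerReference(db, job, r.Scheme); err != nil {
		return err
	}
	if err := r.Create(ctx, job); err != nil && !apierrors.IsAlreadyExists(err) {
		return err
	}
	return nil
}

If you raise concurrency instead, MaxConcurrentReconciles is set per controller via controller.Options when building it, for example .WithOptions(controller.Options{MaxConcurrentReconciles: 4}) using sigs.k8s.io/controller-runtime/pkg/controller.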

API Server Pressure:

Each r.Status().Update(ctx, db) or r.Update(ctx, db) is an API call. In a multi-stage finalizer, frequent status updates can add significant load.

  • Problem: A chatty finalizer can contribute to API server throttling, especially in a large cluster.
  • Solutions:
    • Batch Status Updates: If a few steps in your state machine can be executed quickly and synchronously, perform them all and then issue a single status update (or patch) with the final state of that batch; a sketch follows this list.
    • Smart Requeueing: Use ctrl.Result{RequeueAfter: ...} with sensible delays. Don't poll an external API every second; match your polling interval to the expected completion time of the external operation.
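
A sketch of batching several status mutations into one call using a merge patch against a deep copy taken before the changes; snapshotID stands in for the value returned by the snapshot call:

go
// Apply several status changes locally, then send a single PATCH instead of
// one Update per field.
base := db.DeepCopy()
db.Status.DeletionPhase = databasev1alpha1.DeletionPhaseSnapshotting
db.Status.SnapshotID = snapshotID
if err := r.Status().Patch(ctx, db, client.MergeFrom(base)); err != nil {
	return err
}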

Conclusion: Finalizers as a Mark of Maturity

Implementing finalizers correctly is a rite of passage for any Kubernetes operator developer. Moving beyond the basic pattern to embrace stateful, multi-stage, and dependency-aware finalization logic is what separates a proof-of-concept from a resilient, production-ready system. By treating the deletion path with the same rigor as the creation and update paths, you build controllers that are not only powerful in what they create but also safe and reliable in what they destroy.

The patterns discussed here—idempotent external calls, status-driven state machines, and orchestrated cleanup using owner references—provide a robust framework for managing the complete lifecycle of your custom resources and their real-world counterparts. They ensure that even in the face of failures, restarts, and complex dependencies, your operator remains a predictable and trustworthy steward of your infrastructure.
