Idempotent K8s Operator Reconciliation with Finalizers

Goh Ling Yong

The Fragility of a Naive Reconciliation Loop

When you first scaffold a Kubernetes operator using a framework like Kubebuilder or the Operator SDK, you're presented with a Reconcile function. The initial temptation is to treat it as a simple script: fetch the Custom Resource (CR), and if the desired child resource (e.g., a Deployment) doesn't exist, create it.

This approach is dangerously flawed and will fail in production. Consider this naive implementation:

go
// DO NOT USE THIS IN PRODUCTION
func (r *MyAppReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	log := log.FromContext(ctx)
	var myApp mygroupv1.MyApp
	if err := r.Get(ctx, req.NamespacedName, &myApp); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	// Check if the deployment already exists
	found := &appsv1.Deployment{}
	err := r.Get(ctx, types.NamespacedName{Name: myApp.Name, Namespace: myApp.Namespace}, found)
	if err != nil && errors.IsNotFound(err) {
		// Define a new deployment
		depl := r.deploymentForMyApp(&myApp)
		log.Info("Creating a new Deployment", "Deployment.Namespace", depl.Namespace, "Deployment.Name", depl.Name)
		if err := r.Create(ctx, depl); err != nil {
			log.Error(err, "Failed to create new Deployment")
			return ctrl.Result{}, err
		}
		// Deployment created successfully - return and requeue
		return ctrl.Result{Requeue: true}, nil
	}

	return ctrl.Result{}, nil
}

This code has two critical failures:

  • Lack of Idempotency: This loop only knows how to create. If myApp.Spec later changes (say, a new image), the existing Deployment is never reconciled, because the errors.IsNotFound(err) branch is skipped and the function silently returns. The create path is fragile too: if the Create succeeds but the operator crashes before it observes the result (or the cached client hasn't caught up yet), the next reconcile can see NotFound again, retry the Create, and hit an AlreadyExists error that, left unhandled, produces an error-requeue loop.
  • Orphaned Resources: When a user runs kubectl delete myapp my-app-instance, the MyApp CR is deleted, but because we never set an owner reference, the Kubernetes garbage collector has no knowledge of the Deployment this operator created. It becomes an orphaned resource, consuming cluster capacity indefinitely.
To build a production-grade operator, we must solve these two problems by implementing an idempotent reconciliation loop and a robust cleanup mechanism using finalizers.

    Achieving Idempotency: The "Converge State" Pattern

    Idempotency, in the context of a Kubernetes controller, means that a reconciliation operation can be executed multiple times with the same input state and will produce the same output state without causing errors or side effects. The goal is not to perform an action, but to ensure the state of the world converges to the desired state.

    The core pattern is Check -> Differentiate -> Act.

  • Check: Fetch the CR and the owned resource(s).
  • Differentiate: Compare the actual state of the owned resource with the desired state derived from the CR's spec.
  • Act: If there's a difference, perform the necessary Create or Update operation to align the actual state with the desired state.
    Let's refactor our Reconcile function to be idempotent for creates and updates.

    go
    // A more robust, idempotent Reconcile for Create/Update
    func (r *MyAppReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    	log := log.FromContext(ctx)
    	var myApp mygroupv1.MyApp
    	if err := r.Get(ctx, req.NamespacedName, &myApp); err != nil {
    		return ctrl.Result{}, client.IgnoreNotFound(err)
    	}
    
    	// Define the desired Deployment object
    	desiredDepl := r.deploymentForMyApp(&myApp)
    
    	// Set MyApp instance as the owner and controller
    	if err := controllerutil.SetControllerReference(&myApp, desiredDepl, r.Scheme); err != nil {
    		return ctrl.Result{}, err
    	}
    
    	// Check if this Deployment already exists
    	foundDepl := &appsv1.Deployment{}
    	err := r.Get(ctx, types.NamespacedName{Name: desiredDepl.Name, Namespace: desiredDepl.Namespace}, foundDepl)
    	if err != nil && errors.IsNotFound(err) {
    		log.Info("Creating a new Deployment", "Deployment.Namespace", desiredDepl.Namespace, "Deployment.Name", desiredDepl.Name)
    		if err := r.Create(ctx, desiredDepl); err != nil {
    			log.Error(err, "Failed to create new Deployment")
    			return ctrl.Result{}, err
    		}
    		// Deployment created successfully, no need to requeue immediately
    		return ctrl.Result{}, nil
    	} else if err != nil {
    		return ctrl.Result{}, err
    	}
    
    	// Deployment already exists - check for updates
    	// A simple deep equality check on specs is often sufficient
    	if !reflect.DeepEqual(desiredDepl.Spec, foundDepl.Spec) {
    		log.Info("Updating existing Deployment")
    		// NOTE: In production, you'd want a more sophisticated merge/patch strategy
    		// to avoid clobbering fields set by other controllers (e.g., HPA).
    		// For this example, we'll just overwrite the spec.
    		foundDepl.Spec = desiredDepl.Spec
    		if err := r.Update(ctx, foundDepl); err != nil {
    			log.Error(err, "Failed to update Deployment")
    			return ctrl.Result{}, err
    		}
    	}
    
    	// Finally, update the status of the MyApp resource,
    	// but only issue the write when something actually changed.
    	if myApp.Status.AvailableReplicas != foundDepl.Status.AvailableReplicas {
    		myApp.Status.AvailableReplicas = foundDepl.Status.AvailableReplicas
    		if err := r.Status().Update(ctx, &myApp); err != nil {
    			log.Error(err, "Failed to update MyApp status")
    			return ctrl.Result{}, err
    		}
    	}
    
    	return ctrl.Result{}, nil
    }
    
    // deploymentForMyApp returns a Deployment object for the given MyApp
    func (r *MyAppReconciler) deploymentForMyApp(m *mygroupv1.MyApp) *appsv1.Deployment {
    	// ... implementation to build the deployment spec from the MyApp spec
    }

    This is much better. We now handle both creation and updates. The call to controllerutil.SetControllerReference also stamps an owner reference onto the Deployment, so when the MyApp CR is deleted, Kubernetes's built-in garbage collection removes the Deployment for us. What owner references cannot do is clean up anything the garbage collector can't see, such as external resources (databases, object-storage buckets), or guarantee that custom cleanup logic runs before the CR disappears. That is the deletion problem finalizers solve.
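
    Before tackling deletion, a brief aside: controller-runtime also ships a helper, controllerutil.CreateOrUpdate, that folds the Check -> Differentiate -> Act sequence into one call. You hand it an object carrying only its name and namespace plus a mutate function that sets the fields you own, and it decides whether a Create, an Update, or no write at all is needed; fields you never touch in the mutate function are left alone, which helps with the merge/patch caveat noted in the comment above. Here is a minimal sketch; the MyApp spec fields it reads (Replicas, Image) are assumptions for illustration, not part of the original example.

    go
    // Sketch only: reconcile the Deployment via controllerutil.CreateOrUpdate.
    // Assumes myApp.Spec.Replicas (int32) and myApp.Spec.Image (string) exist on the CRD.
    labels := map[string]string{"app": myApp.Name}
    depl := &appsv1.Deployment{
    	ObjectMeta: metav1.ObjectMeta{Name: myApp.Name, Namespace: myApp.Namespace},
    }
    op, err := controllerutil.CreateOrUpdate(ctx, r.Client, depl, func() error {
    	// Mutate only the fields this controller owns; anything else is left untouched.
    	replicas := myApp.Spec.Replicas
    	depl.Spec.Replicas = &replicas
    	depl.Spec.Selector = &metav1.LabelSelector{MatchLabels: labels}
    	depl.Spec.Template.ObjectMeta.Labels = labels
    	if len(depl.Spec.Template.Spec.Containers) == 0 {
    		depl.Spec.Template.Spec.Containers = []corev1.Container{{Name: "myapp"}}
    	}
    	depl.Spec.Template.Spec.Containers[0].Image = myApp.Spec.Image
    	// The owner reference lets built-in garbage collection remove the Deployment with the CR.
    	return controllerutil.SetControllerReference(&myApp, depl, r.Scheme)
    })
    if err != nil {
    	return ctrl.Result{}, err
    }
    log.Info("Deployment reconciled", "operation", op)

    CreateOrUpdate compares the object before and after the mutate function runs and skips the write entirely when nothing changed, which also avoids the "DeepEqual against a server-defaulted spec" trap.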

    The Deletion Problem and the Finalizer Pattern

    When a user runs kubectl delete myapp <name>, the Kubernetes API server doesn't immediately delete the object. Instead, it does two things:

  • It sets a metadata.deletionTimestamp on the object.
  • It leaves the object in etcd.

    The object is only truly removed from etcd when its metadata.finalizers list is empty.

    This is our hook. A finalizer is a string key that signals to the controller that there is pre-delete cleanup work to be done. Our operator can add a finalizer to the CR. When the CR is deleted, the deletionTimestamp gets set, and our operator receives a reconcile event. Inside the Reconcile loop, we can detect this state, perform our cleanup, and only then remove our finalizer from the list. Once we update the CR with our finalizer removed (and no other finalizers remain), the API server is free to complete the deletion.

    This creates a two-phase deletion process that guarantees our cleanup logic runs to completion.

    The State Machine

    Our Reconcile function is no longer a simple script; it's a state machine handler. For any given CR, it can be in one of several states:

  • Creating: The CR is new, has no deletionTimestamp, and needs its finalizer added and its child resources created.
  • Updating: The CR exists, has a finalizer, and its spec may have changed, requiring updates to child resources.
  • Deleting: The CR has a deletionTimestamp and our finalizer is present. We must execute cleanup logic.
  • Terminated: The CR has a deletionTimestamp but our finalizer is gone. We do nothing, as cleanup is complete.

    Production-Grade Implementation: Combining Idempotency and Finalizers

    Let's build the complete, production-ready Reconcile function. We'll define a unique finalizer for our controller to avoid conflicting with other controllers that might also be managing this CR.

    go
    package controllers
    
    import (
    	// ... other imports
    	"context"
    	"reflect"
    
    	appsv1 "k8s.io/api/apps/v1"
    	corev1 "k8s.io/api/core/v1"
    	"k8s.io/apimachinery/pkg/api/errors"
    	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    	"k8s.io/apimachinery/pkg/types"
    	ctrl "sigs.k8s.io/controller-runtime"
    	"sigs.k8s.io/controller-runtime/pkg/client"
    	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
    	"sigs.k8s.io/controller-runtime/pkg/log"
    
    	mygroupv1 "my-operator/api/v1"
    )
    
    const myAppFinalizer = "mygroup.example.com/finalizer"
    
    // MyAppReconciler reconciles a MyApp object
    type MyAppReconciler struct {
    	client.Client
    	Scheme *runtime.Scheme
    }
    
    func (r *MyAppReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    	log := log.FromContext(ctx)
    
    	// 1. Fetch the MyApp instance
    	myApp := &mygroupv1.MyApp{}
    	if err := r.Get(ctx, req.NamespacedName, myApp); err != nil {
    		if errors.IsNotFound(err) {
    			// Request object not found, could have been deleted after reconcile request.
    			// Owned objects are automatically garbage collected. For additional cleanup logic use finalizers.
    			// Return and don't requeue
    			log.Info("MyApp resource not found. Ignoring since object must be deleted")
    			return ctrl.Result{}, nil
    		}
    		// Error reading the object - requeue the request.
    		log.Error(err, "Failed to get MyApp")
    		return ctrl.Result{}, err
    	}
    
    	// 2. Check if the instance is being deleted
    	isMyAppMarkedToBeDeleted := myApp.GetDeletionTimestamp() != nil
    	if isMyAppMarkedToBeDeleted {
    		if controllerutil.ContainsFinalizer(myApp, myAppFinalizer) {
    			// Run finalization logic. If the finalization logic fails,
    			// don't remove the finalizer so that we can retry during the next reconciliation.
    			if err := r.finalizeMyApp(ctx, myApp); err != nil {
    				log.Error(err, "Failed to finalize MyApp")
    				return ctrl.Result{}, err
    			}
    
    			// Remove finalizer. Once all finalizers have been
    			// removed, the object will be deleted.
    			log.Info("Removing finalizer after successful cleanup")
    			controllerutil.RemoveFinalizer(myApp, myAppFinalizer)
    			err := r.Update(ctx, myApp)
    			if err != nil {
    				return ctrl.Result{}, err
    			}
    		}
    		return ctrl.Result{}, nil
    	}
    
    	// 3. Add finalizer for this CR if it doesn't exist
    	if !controllerutil.ContainsFinalizer(myApp, myAppFinalizer) {
    		log.Info("Adding finalizer for the MyApp")
    		controllerutil.AddFinalizer(myApp, myAppFinalizer)
    		if err := r.Update(ctx, myApp); err != nil {
    			return ctrl.Result{}, err
    		}
    	}
    
    	// 4. Run the main reconciliation logic to converge the state
    	// This is where we create/update the Deployment, ConfigMap, etc.
    	
    	// --- Reconcile Deployment ---
    	foundDepl := &appsv1.Deployment{}
    	err := r.Get(ctx, types.NamespacedName{Name: myApp.Name, Namespace: myApp.Namespace}, foundDepl)
    	if err != nil && errors.IsNotFound(err) {
    		// Define and create a new deployment
    		depl := r.deploymentForMyApp(myApp)
    		if err := controllerutil.SetControllerReference(myApp, depl, r.Scheme); err != nil {
    			return ctrl.Result{}, err
    		}
    		log.Info("Creating a new Deployment", "Deployment.Namespace", depl.Namespace, "Deployment.Name", depl.Name)
    		if err := r.Create(ctx, depl); err != nil {
    			return ctrl.Result{}, err
    		}
    		// Requeue to update status after deployment is ready
    		return ctrl.Result{Requeue: true}, nil 
    	} else if err != nil {
    		return ctrl.Result{}, err
    	}
    
    	// Ensure the deployment spec is up to date
    	desiredDepl := r.deploymentForMyApp(myApp)
    	if !reflect.DeepEqual(foundDepl.Spec, desiredDepl.Spec) {
    		log.Info("Deployment spec out of sync, updating...")
    		foundDepl.Spec = desiredDepl.Spec
    		if err := r.Update(ctx, foundDepl); err != nil {
    			return ctrl.Result{}, err
    		}
    	}
    
    	// 5. Update the MyApp status with the current state
    	if myApp.Status.AvailableReplicas != foundDepl.Status.AvailableReplicas {
    		myApp.Status.AvailableReplicas = foundDepl.Status.AvailableReplicas
    		if err := r.Status().Update(ctx, myApp); err != nil {
    			log.Error(err, "Failed to update MyApp status")
    			return ctrl.Result{}, err
    		}
    	}
    
    	return ctrl.Result{}, nil
    }
    
    // finalizeMyApp performs cleanup actions before the CR is deleted.
    // This could include deleting external resources like S3 buckets or database records.
    func (r *MyAppReconciler) finalizeMyApp(ctx context.Context, m *mygroupv1.MyApp) error {
    	log := log.FromContext(ctx)
    	// In a real-world scenario, you would perform cleanup here.
    	// For example, if your operator manages an S3 bucket, you would delete it.
    	// For this example, we'll just log that we are finalizing.
    	log.Info("Successfully finalized MyApp resources")
    	// IMPORTANT: This cleanup logic must also be idempotent.
    	// For instance, deleting an external resource should not fail if it's already gone.
    	return nil
    }
    
    // deploymentForMyApp is a helper to build the desired Deployment state
    func (r *MyAppReconciler) deploymentForMyApp(m *mygroupv1.MyApp) *appsv1.Deployment {
        // ... implementation
    }
    
    // SetupWithManager sets up the controller with the Manager.
    func (r *MyAppReconciler) SetupWithManager(mgr ctrl.Manager) error {
    	return ctrl.NewControllerManagedBy(mgr).
    		For(&mygroupv1.MyApp{}).
    		Owns(&appsv1.Deployment{}). // This tells the controller to watch Deployments and trigger Reconcile for the owner MyApp
    		Complete(r)
    }
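
    The deploymentForMyApp helper is left as a stub above. For completeness, here is a minimal sketch of what it might look like; the Spec.Size and Spec.Image fields are assumptions about the MyApp CRD, so substitute whatever your API actually defines.

    go
    // deploymentForMyApp builds the desired Deployment for a MyApp instance.
    // Spec.Size (int32) and Spec.Image (string) are assumed CRD fields.
    func (r *MyAppReconciler) deploymentForMyApp(m *mygroupv1.MyApp) *appsv1.Deployment {
    	labels := map[string]string{"app": m.Name}
    	replicas := m.Spec.Size

    	return &appsv1.Deployment{
    		ObjectMeta: metav1.ObjectMeta{
    			Name:      m.Name,
    			Namespace: m.Namespace,
    			Labels:    labels,
    		},
    		Spec: appsv1.DeploymentSpec{
    			Replicas: &replicas,
    			Selector: &metav1.LabelSelector{MatchLabels: labels},
    			Template: corev1.PodTemplateSpec{
    				ObjectMeta: metav1.ObjectMeta{Labels: labels},
    				Spec: corev1.PodSpec{
    					Containers: []corev1.Container{{
    						Name:  "myapp",
    						Image: m.Spec.Image,
    					}},
    				},
    			},
    		},
    	}
    }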
    

    Dissecting the Logic

  • Fetch Instance: Standard boilerplate. We ignore NotFound errors, since they mean the object was already deleted.
  • Deletion Check: This is the entry point to our deletion state machine. GetDeletionTimestamp() != nil is the canonical way to check whether an object is being terminated. If it is, we look for our specific finalizer, which is crucial for cooperating with other controllers that may hold finalizers of their own.
      • If our finalizer exists, we call finalizeMyApp(). This function contains the business logic for cleanup.
      • Crucially, if finalizeMyApp() returns an error, we immediately return that error. This triggers a requeue, and we will retry the cleanup later. We do not remove the finalizer.
      • Only upon successful cleanup do we call controllerutil.RemoveFinalizer() and r.Update(). This is the signal to Kubernetes that our work is done.
  • Add Finalizer: If the object is not being deleted, we ensure our finalizer is present. This is an idempotent check: if it's missing, we add it and update the object. Doing this early prevents a race in which a user deletes the CR moments after creating it, before our operator's first reconciliation has had a chance to add the finalizer.
  • Main Reconciliation: This is the idempotent Check -> Differentiate -> Act logic we developed earlier. It runs only when the object is in a normal, non-deleting state.
  • Status Update: Keeping the CR's .status subresource up to date provides visibility to users and other tools; see the conditions sketch below for a richer version.
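
    Many production operators go beyond a single replica count and publish Kubernetes-style conditions in .status. Below is a minimal fragment (not a standalone file) using the apimachinery helper meta.SetStatusCondition; it assumes the MyApp status struct declares a Conditions []metav1.Condition field, which the example above does not.

    go
    // Fragment from Reconcile: record an "Available" condition on the CR.
    // Requires k8s.io/apimachinery/pkg/api/meta and a Conditions field on MyAppStatus.
    meta.SetStatusCondition(&myApp.Status.Conditions, metav1.Condition{
    	Type:               "Available",
    	Status:             metav1.ConditionTrue,
    	Reason:             "DeploymentReady",
    	Message:            "Deployment has the desired number of available replicas",
    	ObservedGeneration: myApp.Generation,
    })
    if err := r.Status().Update(ctx, myApp); err != nil {
    	return ctrl.Result{}, err
    }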

    Advanced Edge Cases and Performance Considerations

    This pattern is robust, but senior engineers must consider the failure modes.

    Edge Case: Partial Failure During Finalization

    Imagine your finalizeMyApp function needs to delete two external resources: a record in a SQL database and a bucket in S3.

    go
    func (r *MyAppReconciler) finalizeMyApp(ctx context.Context, m *mygroupv1.MyApp) error {
    	if err := r.deleteSQLRecord(m.Spec.DatabaseID); err != nil {
    		return err // Requeue
    	}
    
    	// Operator crashes here!
    
    	if err := r.deleteS3Bucket(m.Spec.BucketName); err != nil {
    		return err // Requeue
    	}
    	return nil
    }

    If the operator crashes after deleting the SQL record but before deleting the S3 bucket, what happens?

  • The MyApp CR still exists in Kubernetes with the deletionTimestamp and the finalizer.
  • When the operator restarts, it will receive a reconcile event for this MyApp.
  • The Reconcile loop will enter the deletion logic again and call finalizeMyApp.
  • The call to r.deleteSQLRecord() will be made again.

    This is why your finalization logic must be idempotent. deleteSQLRecord should not fail if the record with that ID is already gone; it should return nil in that case. Similarly, deleteS3Bucket should gracefully handle a "bucket not found" error. A sketch of this tolerate-absence pattern follows.
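
    Here is a minimal sketch of that idempotent cleanup, reusing the hypothetical deleteSQLRecord / deleteS3Bucket helpers from the example above and assuming each client exposes an IsNotFound-style check (sqlclient.IsNotFound and s3client.IsNotFound are illustrative stand-ins, not real packages):

    go
    // Sketch: every step treats "already gone" as success, so a retry after a crash
    // between the two deletes converges instead of failing forever.
    func (r *MyAppReconciler) finalizeMyApp(ctx context.Context, m *mygroupv1.MyApp) error {
    	if err := r.deleteSQLRecord(m.Spec.DatabaseID); err != nil && !sqlclient.IsNotFound(err) {
    		return fmt.Errorf("deleting SQL record: %w", err) // transient failure: requeue with backoff
    	}
    	if err := r.deleteS3Bucket(m.Spec.BucketName); err != nil && !s3client.IsNotFound(err) {
    		return fmt.Errorf("deleting S3 bucket: %w", err)
    	}
    	return nil
    }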

    Requeueing Strategy

    The controller-runtime logic for requeueing is nuanced:

  • return ctrl.Result{}, nil: Reconciliation was successful. Don't requeue unless an owned resource changes or the CR is modified.
  • return ctrl.Result{Requeue: true}, nil: Reconciliation was successful, but I want to run it again soon. This is often used when waiting for a resource to be fully provisioned, but it can lead to busy-loops. Use with caution.
  • return ctrl.Result{RequeueAfter: duration}, nil: Reconciliation was successful. Requeue after the specified duration. This is perfect for periodic checks that don't merit a full watch.
  • return ctrl.Result{}, err: Reconciliation failed. The controller will requeue the request with an exponential backoff. This is the correct response for transient errors (e.g., network issues, API server unavailability).

    In our finalizer logic, returning an error on cleanup failure is the correct strategy, as it leverages the built-in exponential backoff to avoid hammering a failing external service. The short fragment below shows the other return values in context.
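
    A minimal fragment from the tail of a Reconcile; deploymentReady is a hypothetical helper and the 30-second interval is arbitrary:

    go
    // Fragment, not standalone: choosing a requeue strategy at the end of Reconcile.
    if !deploymentReady(foundDepl) {
    	// Still converging: check again shortly rather than busy-looping with Requeue: true.
    	return ctrl.Result{RequeueAfter: 30 * time.Second}, nil
    }
    // Everything matches the desired state: return cleanly and wait for the next watch event.
    return ctrl.Result{}, nil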

    Controller-Runtime Client Cache

    By default, the client provided by controller-runtime (r.Client) is a caching client. Reads (Get, List) are served from a local in-memory cache that is kept in sync with the API server via watches. This is extremely efficient.

    However, this can introduce a small delay. When you Update an object, the cache is not updated instantaneously. If you need to read the object back immediately after a write and be 100% certain you have the post-write version (e.g., to get the updated resourceVersion), you may need a non-caching client. This is an advanced use case and is generally not required for the finalizer pattern, as a subsequent reconcile loop will always see the updated version.
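
    If you do hit that case, controller-runtime exposes an uncached reader through the manager. A minimal sketch follows; the APIReader field and its wiring are additions to the example reconciler, not part of the code above.

    go
    // In main.go, wire an uncached reader into the reconciler (sketch):
    //   r := &controllers.MyAppReconciler{
    //       Client:    mgr.GetClient(),    // caching client: normal reads, all writes
    //       APIReader: mgr.GetAPIReader(), // uncached reads straight from the API server
    //       Scheme:    mgr.GetScheme(),
    //   }

    // In the reconciler, bypass the cache only when you truly need the freshest copy:
    fresh := &mygroupv1.MyApp{}
    if err := r.APIReader.Get(ctx, req.NamespacedName, fresh); err != nil {
    	return ctrl.Result{}, client.IgnoreNotFound(err)
    }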

    For most operations, relying on the cache is the correct and performant choice. The Update and Create calls always go directly to the API server.

    Conclusion

    The Idempotency + Finalizer pattern is not just a feature; it is the fundamental building block of any robust Kubernetes operator that manages resources. By treating the Reconcile function as a state machine handler rather than a one-shot script, you can build controllers that are resilient to crashes, handle resource lifecycle events gracefully, and behave as first-class citizens in the Kubernetes ecosystem.

    This pattern ensures that when a user deletes your CR, you leave the cluster in a clean state, preventing the resource leaks and orphaned objects that plague naive operator implementations. Mastering this loop is a critical step in moving from a developer who uses Kubernetes to one who truly extends it.
