Idempotent Kubernetes Operator Reconcile Loops with Finalizers

Goh Ling Yong

The Fragility of a Naive Reconcile Loop

When scaffolding a new Kubernetes Operator with tools like Kubebuilder or Operator SDK, the initial Reconcile function is deceptively simple. It often presents a straightforward path: check if a resource exists, and if not, create it. While this serves as a functional starting point, it embodies a dangerous anti-pattern for any real-world, stateful application. A production-grade operator must be more than a simple resource creator; it must be a persistent, resilient state machine.

A naive reconcile loop typically looks something like this:

go
// WARNING: INCOMPLETE AND PROBLEMATIC EXAMPLE
func (r *MyResourceReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    log := log.FromContext(ctx)
    var myResource mygroupv1.MyResource

    if err := r.Get(ctx, req.NamespacedName, &myResource); err != nil {
        return ctrl.Result{}, client.IgnoreNotFound(err)
    }

    // Check if the deployment exists, if not create it
    found := &appsv1.Deployment{}
    err := r.Get(ctx, types.NamespacedName{Name: myResource.Name, Namespace: myResource.Namespace}, found)
    if err != nil && errors.IsNotFound(err) {
        // Define a new deployment
        dep := r.deploymentForMyResource(&myResource)
        log.Info("Creating a new Deployment", "Deployment.Namespace", dep.Namespace, "Deployment.Name", dep.Name)
        err = r.Create(ctx, dep)
        if err != nil {
            return ctrl.Result{}, err
        }
        // Deployment created successfully - return and requeue
        return ctrl.Result{Requeue: true}, nil
    }

    return ctrl.Result{}, nil
}

This implementation is riddled with critical flaws that manifest under real-world conditions:

  • Lack of Idempotency: If the Create call succeeds but the reconcile is retried before the client's cache observes the new Deployment (for example, because an error occurred after the creation), the next reconciliation will attempt to create the Deployment again and fail with an AlreadyExists error. The loop is not idempotent; its outcome depends on how many times it runs.
  • No Update Handling: If a user modifies the MyResource CR spec (e.g., changes the image tag), this loop does nothing. It does not observe the drift between the desired state (in the CR) and the actual state (in the Deployment) and take corrective action.
  • Orphaned Resources: When a user executes kubectl delete myresource my-sample, the controller does nothing. The MyResource object is deleted, but the Deployment it created is left running—a classic example of an orphaned resource.
  • State Blindness: The controller has no mechanism to report its status. Is it currently reconciling? Did it encounter an error? Is the application healthy? This lack of feedback makes debugging and monitoring nearly impossible.

To build a controller that is trusted with production workloads, we must move beyond this naive approach and embrace patterns that ensure correctness, resilience, and lifecycle awareness. This involves architecting an idempotent control loop and leveraging finalizers for managing resource deletion.

    Achieving True Idempotency: The Observe->Diff->Act Pattern

    Idempotency within a Kubernetes controller means that for a given Custom Resource state, the reconcile loop can be executed one or one hundred times and the resulting system state will be the same. The key to achieving this is the Observe -> Diff -> Act pattern.

  • Observe: Fetch the primary resource (the CR) and all secondary, managed resources (Deployments, Services, ConfigMaps, etc.). Construct a comprehensive view of the actual state of the system.
  • Diff: Compare the observed actual state with the desired state defined in the CR's spec. Identify any discrepancies.
  • Act: Execute the precise actions (Create, Update, or Delete) required to converge the actual state towards the desired state. If there is no difference, do nothing.

    Let's refactor our Reconcile function to implement this pattern. We'll manage a Deployment and a Service for our MyResource.

    Code Example 1: An Idempotent Reconcile Loop

    First, let's assume our MyResourceSpec looks like this:

    go
    // MyResourceSpec defines the desired state of MyResource
    type MyResourceSpec struct {
    	ReplicaCount *int32 `json:"replicaCount,omitempty"`
    	Image        string `json:"image,omitempty"`
    	Port         *int32 `json:"port,omitempty"`
    }

    Now, the improved Reconcile function:

    go
    package controllers
    
    import (
    	"context"
    	"reflect"
    
    	appsv1 "k8s.io/api/apps/v1"
    	corev1 "k8s.io/api/core/v1"
    	"k8s.io/apimachinery/pkg/api/errors"
    	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    	"k8s.io/apimachinery/pkg/types"
    	"k8s.io/apimachinery/pkg/util/intstr"
    	ctrl "sigs.k8s.io/controller-runtime"
    	"sigs.k8s.io/controller-runtime/pkg/client"
    	"sigs.k8s.io/controller-runtime/pkg/log"
    
    	mygroupv1 "my.domain/api/v1"
    )
    
    // ... (Reconciler struct definition)
    
    func (r *MyResourceReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    	log := log.FromContext(ctx)
    
    	// 1. OBSERVE: Fetch the MyResource instance
    	var myResource mygroupv1.MyResource
    	if err := r.Get(ctx, req.NamespacedName, &myResource); err != nil {
    		log.Error(err, "unable to fetch MyResource")
    		return ctrl.Result{}, client.IgnoreNotFound(err)
    	}
    
    	// 2. OBSERVE: Fetch the managed Deployment
    	foundDeployment := &appsv1.Deployment{}
    	err := r.Get(ctx, types.NamespacedName{Name: myResource.Name, Namespace: myResource.Namespace}, foundDeployment)
    	if err != nil && errors.IsNotFound(err) {
    		// A. ACT: Deployment does not exist, so create it.
    		log.Info("Creating new Deployment")
    		dep := r.deploymentForMyResource(&myResource)
    		if err := ctrl.SetControllerReference(&myResource, dep, r.Scheme); err != nil {
    			return ctrl.Result{}, err
    		}
    		if err := r.Create(ctx, dep); err != nil {
    			log.Error(err, "Failed to create new Deployment")
    			return ctrl.Result{}, err
    		}
    		// Requeue to check status after creation
    		return ctrl.Result{Requeue: true}, nil
    	} else if err != nil {
    		log.Error(err, "Failed to get Deployment")
    		return ctrl.Result{}, err
    	}
    
    	// 3. DIFF: Compare the desired state with the actual state of the Deployment
    	desiredDeployment := r.deploymentForMyResource(&myResource)
    
    	// A simple deep equal might not be enough. Kubernetes injects default values.
    	// We need to compare the fields we care about.
    	if *foundDeployment.Spec.Replicas != *desiredDeployment.Spec.Replicas ||
    		foundDeployment.Spec.Template.Spec.Containers[0].Image != desiredDeployment.Spec.Template.Spec.Containers[0].Image {
    		
    		log.Info("Deployment spec mismatch. Updating...")
    		foundDeployment.Spec.Replicas = desiredDeployment.Spec.Replicas
    		foundDeployment.Spec.Template.Spec.Containers = desiredDeployment.Spec.Template.Spec.Containers
    
    		// B. ACT: Update the found Deployment
    		if err := r.Update(ctx, foundDeployment); err != nil {
    			log.Error(err, "Failed to update Deployment")
    			return ctrl.Result{}, err
    		}
    	}
    
    	// Do the same Observe->Diff->Act for the Service
    	foundService := &corev1.Service{}
    	err = r.Get(ctx, types.NamespacedName{Name: myResource.Name, Namespace: myResource.Namespace}, foundService)
    	if err != nil && errors.IsNotFound(err) {
    		log.Info("Creating new Service")
    		svc := r.serviceForMyResource(&myResource)
    		if err := ctrl.SetControllerReference(&myResource, svc, r.Scheme); err != nil {
    			return ctrl.Result{}, err
    		}
    		if err := r.Create(ctx, svc); err != nil {
    			log.Error(err, "Failed to create new Service")
    			return ctrl.Result{}, err
    		}
    		return ctrl.Result{Requeue: true}, nil
    	} else if err != nil {
    		log.Error(err, "Failed to get Service")
    		return ctrl.Result{}, err
    	}
    
    	// For services, some fields are immutable or set by the cluster (e.g., ClusterIP).
    	// We only compare the fields we can and should control.
    	desiredService := r.serviceForMyResource(&myResource)
    	if !reflect.DeepEqual(foundService.Spec.Ports, desiredService.Spec.Ports) || !reflect.DeepEqual(foundService.Spec.Selector, desiredService.Spec.Selector) {
    		log.Info("Service spec mismatch. Updating...")
    		// Preserve the ClusterIP
    		clusterIP := foundService.Spec.ClusterIP
    		foundService.Spec = desiredService.Spec
    		foundService.Spec.ClusterIP = clusterIP
    		if err := r.Update(ctx, foundService); err != nil {
    			log.Error(err, "Failed to update Service")
    			return ctrl.Result{}, err
    		}
    	}
    
    	log.Info("All resources are in the desired state.")
    	return ctrl.Result{}, nil
    }
    
    // Helper functions to define desired state
    func (r *MyResourceReconciler) deploymentForMyResource(m *mygroupv1.MyResource) *appsv1.Deployment {
    	// ... implementation to build and return a Deployment struct
    }
    
    func (r *MyResourceReconciler) serviceForMyResource(m *mygroupv1.MyResource) *corev1.Service {
    	// ... implementation to build and return a Service struct
    }
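
    The helper bodies are elided above. As a rough sketch of what deploymentForMyResource could look like (the "app" label key and the container name below are illustrative assumptions, not prescribed by the scaffolding), using the same imports as the example:

    go
    // A minimal sketch of the elided helper; assumes ReplicaCount and Port are set in the spec.
    func (r *MyResourceReconciler) deploymentForMyResource(m *mygroupv1.MyResource) *appsv1.Deployment {
    	labels := map[string]string{"app": m.Name} // illustrative label convention
    	return &appsv1.Deployment{
    		ObjectMeta: metav1.ObjectMeta{
    			Name:      m.Name,
    			Namespace: m.Namespace,
    		},
    		Spec: appsv1.DeploymentSpec{
    			Replicas: m.Spec.ReplicaCount,
    			Selector: &metav1.LabelSelector{MatchLabels: labels},
    			Template: corev1.PodTemplateSpec{
    				ObjectMeta: metav1.ObjectMeta{Labels: labels},
    				Spec: corev1.PodSpec{
    					Containers: []corev1.Container{{
    						Name:  "myresource", // illustrative container name
    						Image: m.Spec.Image,
    						Ports: []corev1.ContainerPort{{ContainerPort: *m.Spec.Port}},
    					}},
    				},
    			},
    		},
    	}
    }

    serviceForMyResource would follow the same shape, returning a corev1.Service whose selector matches these labels and whose port comes from the spec.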

    A critical detail here is ctrl.SetControllerReference. This sets the ownerReferences field on the managed resource. This is essential for two reasons:

  • It links the child resource (Deployment) to the parent (MyResource), which is visible in kubectl and other tools.
  • It enables Kubernetes garbage collection. If our operator is uninstalled or fails catastrophically, deleting the MyResource CR will cause Kubernetes itself to garbage collect the owned Deployment and Service. This provides a baseline safety net, but it's insufficient for complex cleanup logic.

    This idempotent loop is a massive improvement, but it still fails to handle graceful deletion. What if deleting our application requires draining connections, backing up a database, or notifying an external system? For that, we need finalizers.

    The Critical Role of Finalizers for Graceful Deletion

    A finalizer is a key in the metadata.finalizers list of a Kubernetes object. When a finalizer is present, a kubectl delete command does not immediately remove the object. Instead, the API server sets a deletionTimestamp on the object and puts it into a Terminating state. The object will not be garbage collected until all finalizers are removed from its list.

    This mechanism provides a hook for our operator to perform pre-delete cleanup. The Reconcile function is triggered for the terminating object, and it's our controller's responsibility to perform its cleanup logic and then remove its own finalizer.

    Here is the complete lifecycle:

  • Creation: Our operator receives a new CR. In the first reconcile, it adds its own finalizer to the CR's finalizers list.
  • Deletion Request: A user runs kubectl delete myresource my-sample.
  • API Server Action: The API server sees the delete request, notes the presence of our finalizer, and sets the metadata.deletionTimestamp.
  • Reconciliation: The controller-runtime triggers a reconcile for the CR. Our code detects that deletionTimestamp is not nil.
  • Cleanup Logic: Our controller executes its specific cleanup tasks (e.g., scaling down replicas, calling an external API, deleting persistent volumes).
  • Finalizer Removal: Once cleanup is successful, the controller patches the CR to remove its finalizer from the list.
  • Garbage Collection: The API server now sees an object with a deletionTimestamp and an empty finalizers list, and it proceeds to delete the object from etcd.

    Code Example 2: Implementing the Finalizer Logic

    Let's integrate this into our Reconcile function.

    go
    // A unique name for our finalizer
    const myFinalizerName = "my.domain/finalizer"
    
    func (r *MyResourceReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    	log := log.FromContext(ctx)
    
    	var myResource mygroupv1.MyResource
    	if err := r.Get(ctx, req.NamespacedName, &myResource); err != nil {
    		return ctrl.Result{}, client.IgnoreNotFound(err)
    	}
    
    	// Examine if the object is under deletion
    	if myResource.ObjectMeta.DeletionTimestamp.IsZero() {
    		// The object is not being deleted, so we add our finalizer if it doesn't exist.
    		if !containsString(myResource.GetFinalizers(), myFinalizerName) {
    			myResource.SetFinalizers(append(myResource.GetFinalizers(), myFinalizerName))
    			if err := r.Update(ctx, &myResource); err != nil {
    				return ctrl.Result{}, err
    			}
    		}
    	} else {
    		// The object is being deleted
    		if containsString(myResource.GetFinalizers(), myFinalizerName) {
    			// Our finalizer is present, so let's handle external dependency cleanup
    			if err := r.cleanupExternalResources(ctx, &myResource); err != nil {
    				// if fail to delete the external dependency here, return with error
    				// so that it can be retried
    				return ctrl.Result{}, err
    			}
    
    			// Once cleanup is successful, remove the finalizer
    			myResource.SetFinalizers(removeString(myResource.GetFinalizers(), myFinalizerName))
    			if err := r.Update(ctx, &myResource); err != nil {
    				return ctrl.Result{}, err
    			}
    		}
    
    		// Stop reconciliation as the item is being deleted
    		return ctrl.Result{}, nil
    	}
    
    	// ... (The idempotent Observe->Diff->Act logic from Example 1 goes here)
    	// ...
    
    	return ctrl.Result{}, nil
    }
    
    func (r *MyResourceReconciler) cleanupExternalResources(ctx context.Context, m *mygroupv1.MyResource) error {
    	// This is where you would put your complex cleanup logic.
    	// For example, calling an external API to deregister a service, deleting a PVC, etc.
    	// IMPORTANT: This logic MUST be idempotent.
    	log := log.FromContext(ctx)
    	log.Info("performing cleanup for MyResource", "name", m.Name)
    	// For this example, we'll assume cleanup is always successful.
    	// In a real-world scenario, you would handle errors and potentially retry.
    	return nil
    }
    
    // Helper functions for finalizer string slice manipulation
    func containsString(slice []string, s string) bool {
    	for _, item := range slice {
    		if item == s {
    			return true
    		}
    	}
    	return false
    }
    
    func removeString(slice []string, s string) (result []string) {
    	for _, item := range slice {
    		if item == s {
    			continue
    		}
    		result = append(result, item)
    	}
    	return
    }

    This structure robustly handles the deletion lifecycle. Note that the cleanup logic itself must be idempotent. If the operator crashes mid-cleanup, the next reconcile will re-run it. The cleanup function should not fail if a resource it's trying to delete is already gone.
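
    As a concrete sketch of that property (the "-data" PVC naming convention here is an illustrative assumption, not part of the example API), treating NotFound as success makes re-running the cleanup harmless:

    go
    // A sketch of idempotent cleanup: delete a PVC that may already be gone.
    func (r *MyResourceReconciler) cleanupPVC(ctx context.Context, m *mygroupv1.MyResource) error {
    	pvc := &corev1.PersistentVolumeClaim{
    		ObjectMeta: metav1.ObjectMeta{
    			Name:      m.Name + "-data",
    			Namespace: m.Namespace,
    		},
    	}
    	// IgnoreNotFound turns "already deleted" into success, so a crashed and
    	// restarted cleanup can safely run this again.
    	return client.IgnoreNotFound(r.Delete(ctx, pvc))
    }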

    Production-Grade Patterns and Edge Case Handling

    With idempotency and finalizers in place, our operator is far more robust. But production environments introduce new challenges. Let's address them.

    Status Subresource Management

    Directly updating the myResource object to add the finalizer or report status can cause race conditions. If another actor (like a user or another controller) updates the object's spec or metadata simultaneously, one of the updates will be rejected by the API server due to an optimistic locking conflict (resourceVersion mismatch). A common anti-pattern is to get the object, modify it, and call r.Update(). This updates the entire object, including spec and status.

    The correct approach is to use the status subresource. It's a separate API endpoint for an object that only allows modifications to the .status field. This isolates status updates from spec updates, dramatically reducing conflicts.

    First, enable it in your CRD definition with //+kubebuilder:subresource:status.

    go
    // MyResourceStatus defines the observed state of MyResource
    type MyResourceStatus struct {
    	Conditions []metav1.Condition `json:"conditions,omitempty"`
    	ActivePods int32              `json:"activePods,omitempty"`
    }
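
    The marker itself belongs on the root type, not on the status struct. A typically scaffolded root type looks roughly like this (the exact set of kubebuilder markers depends on your scaffolding version):

    go
    //+kubebuilder:object:root=true
    //+kubebuilder:subresource:status

    // MyResource is the Schema for the myresources API
    type MyResource struct {
    	metav1.TypeMeta   `json:",inline"`
    	metav1.ObjectMeta `json:"metadata,omitempty"`

    	Spec   MyResourceSpec   `json:"spec,omitempty"`
    	Status MyResourceStatus `json:"status,omitempty"`
    }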

    Code Example 3: Updating Status Safely

    Instead of r.Update(ctx, &myResource), use r.Status().Update(ctx, &myResource).

    go
    func (r *MyResourceReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
        // ... (fetch myResource and set up log := log.FromContext(ctx) as in the earlier examples)
        // Note: meta below refers to "k8s.io/apimachinery/pkg/api/meta".
    
        // At the end of the reconcile loop, after acting on resources...
    
        // Create a fresh copy to avoid modifying the cache
        statusResource := myResource.DeepCopy()
    
        // Update the status based on the observed state
        foundDeployment := &appsv1.Deployment{}
        // ... get deployment
    
        statusResource.Status.ActivePods = foundDeployment.Status.ReadyReplicas
    
        // Use Kubernetes Conditions for standardized status reporting
        condition := metav1.Condition{
            Type:    "Ready",
            Status:  metav1.ConditionFalse,
            Reason:  "Reconciling",
            Message: "Deployment and Service are being reconciled",
        }
        if foundDeployment.Status.ReadyReplicas == *myResource.Spec.ReplicaCount {
            condition.Status = metav1.ConditionTrue
            condition.Reason = "Succeeded"
            condition.Message = "All resources are in the desired state"
        }
    
        // This helper from apimeta manages the conditions list
        meta.SetStatusCondition(&statusResource.Status.Conditions, condition)
    
        // Use a deep equal to avoid unnecessary status updates
        if !reflect.DeepEqual(myResource.Status, statusResource.Status) {
            log.Info("Updating status")
            if err := r.Status().Update(ctx, statusResource); err != nil {
                log.Error(err, "Failed to update MyResource status")
                return ctrl.Result{}, err
            }
        }
    
        return ctrl.Result{}, nil
    }

    Error Handling and Requeueing

    controller-runtime's error handling is nuanced. How you return from Reconcile dictates its behavior:

  • return ctrl.Result{}, nil: Success. The request is only reconciled again on a subsequent watch event (or the manager's periodic resync, if one is configured).
  • return ctrl.Result{Requeue: true}, nil: Success, but please requeue immediately. Use this when you've just created a resource and want to check its status right away.
  • return ctrl.Result{}, err: An error occurred. controller-runtime will requeue the request with an exponential backoff. This is for transient errors (e.g., a temporary network issue when calling the API server).
  • return ctrl.Result{RequeueAfter: time.Second * 30}, nil: Success, but requeue after a specific duration. Useful for periodic checks that don't depend on watch events.

    Distinguishing between transient and permanent errors is vital. If an error is permanent (e.g., an invalid spec configuration), requeueing forever will just spam the logs. Instead, report the error in the status and return ctrl.Result{}, nil to stop the loop for that CR until its spec changes, as sketched below.
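
    A minimal sketch of that split, assuming a hypothetical validateSpec helper and the status types from Example 3 (meta is "k8s.io/apimachinery/pkg/api/meta", and time would also need to be imported):

    go
    // Sketch only: validateSpec and the "InvalidSpec" reason are illustrative names.
    func (r *MyResourceReconciler) reconcileWorkload(ctx context.Context, myResource *mygroupv1.MyResource) (ctrl.Result, error) {
    	if err := validateSpec(&myResource.Spec); err != nil {
    		// Permanent problem: record it in status and stop retrying until the spec changes.
    		meta.SetStatusCondition(&myResource.Status.Conditions, metav1.Condition{
    			Type:    "Ready",
    			Status:  metav1.ConditionFalse,
    			Reason:  "InvalidSpec",
    			Message: err.Error(),
    		})
    		if statusErr := r.Status().Update(ctx, myResource); statusErr != nil {
    			return ctrl.Result{}, statusErr
    		}
    		return ctrl.Result{}, nil // no requeue; editing the CR triggers a new reconcile
    	}

    	// ... create or update managed resources here; transient API errors are simply
    	// returned so that controller-runtime requeues with exponential backoff.

    	// Re-verify periodically even if no watch event arrives.
    	return ctrl.Result{RequeueAfter: 30 * time.Second}, nil
    }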

    Edge Case: Operator Crash During Cleanup

    This is where the finalizer pattern truly shines. Imagine this sequence:

  • Reconcile is triggered for a deleting CR.
  • cleanupExternalResources begins. It successfully deletes an external database.
  • The operator pod crashes before it can remove the finalizer.

    Upon restart, the controller-runtime will list all MyResource objects and add them to the work queue. It will find our CR still in a Terminating state with the finalizer present. A new Reconcile will be triggered. The cleanupExternalResources function will run again. This is why its idempotency is non-negotiable. An idempotent database deletion call might be DELETE FROM registrations WHERE id = ?. A non-idempotent call would fail if the row is already gone. Your cleanup code must handle NotFound errors gracefully.

    Edge Case: Stuck Finalizer

    What if cleanupExternalResources fails permanently? Perhaps an external API is down for an extended period, or a bug prevents cleanup. The finalizer will remain, and the CR will be stuck in the Terminating state forever. This is a common operational problem.

    Your operator must provide visibility into this state via its status conditions. An admin should be able to see a Condition with Type: Ready, Status: False, Reason: DeletionFailed. In emergencies, an administrator can manually intervene by patching the CR to remove the finalizer:

    bash
    kubectl patch myresource my-sample --type json --patch='[ { "op": "remove", "path": "/metadata/finalizers" } ]'

    This is a last resort, as it may orphan the resources the finalizer was meant to clean up, but it's a necessary escape hatch.
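
    To provide the visibility mentioned above, the deletion branch of Example 2 can record the failure on the status before returning the error. A hedged sketch, reusing the condition helpers from Example 3 and the ctx-aware cleanup helper shown earlier:

    go
    // Sketch only: surface a failed cleanup while the finalizer keeps the CR Terminating.
    if err := r.cleanupExternalResources(ctx, &myResource); err != nil {
    	meta.SetStatusCondition(&myResource.Status.Conditions, metav1.Condition{
    		Type:    "Ready",
    		Status:  metav1.ConditionFalse,
    		Reason:  "DeletionFailed",
    		Message: err.Error(),
    	})
    	if statusErr := r.Status().Update(ctx, &myResource); statusErr != nil {
    		log.Error(statusErr, "Failed to record deletion failure in status")
    	}
    	// Returning the error keeps the finalizer in place and retries with backoff.
    	return ctrl.Result{}, err
    }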

    Performance and Optimization

    In a large cluster, a noisy controller can place significant load on the API server. We can optimize this using predicates.

    Controller Watches and Predicates

    By default, your controller is configured to watch MyResource and any types it owns (like Deployment). Any change to these objects triggers a reconciliation. However, many of these changes are irrelevant. For example, every time a Deployment's status is updated as pods become ready or unready, our operator reconciles again, which is often unnecessary.

    We can use predicates to filter these events at the source.

    Code Example 4: Filtering Unnecessary Reconciliations

    In your SetupWithManager function, you can specify predicates:

    go
    import (
        "sigs.k8s.io/controller-runtime/pkg/event"
        "sigs.k8s.io/controller-runtime/pkg/predicate"
    )
    
    func (r *MyResourceReconciler) SetupWithManager(mgr ctrl.Manager) error {
        return ctrl.NewControllerManagedBy(mgr).
            For(&mygroupv1.MyResource{}).
            Owns(&appsv1.Deployment{}).
            Owns(&corev1.Service{}).
            WithEventFilter(predicate.Funcs{
                UpdateFunc: func(e event.UpdateEvent) bool {
                    // Only reconcile when the spec changes: status and metadata updates do not bump metadata.generation.
                    return e.ObjectOld.GetGeneration() != e.ObjectNew.GetGeneration()
                },
                DeleteFunc: func(e event.DeleteEvent) bool {
                    // DeleteStateUnknown is false when the delete was actually observed, so only confirmed deletions pass.
                    return !e.DeleteStateUnknown
                },
            }).
            Complete(r)
    }

    This UpdateFunc predicate is particularly powerful. metadata.generation is an integer that is only incremented by the API server when the object's spec changes. Updates to metadata or status do not change the generation. By filtering on this, we completely ignore reconciles triggered by status updates on our primary resource, which is a massive source of noise.
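
    Note that controller-runtime also ships this filter as predicate.GenerationChangedPredicate. If you prefer to scope it to the primary resource only (rather than to every watched type, as WithEventFilter does), it can be attached per-watch via builder.WithPredicates; roughly:

    go
    // Equivalent generation-based filtering, scoped to MyResource only. Requires
    // "sigs.k8s.io/controller-runtime/pkg/builder" and "sigs.k8s.io/controller-runtime/pkg/predicate".
    func (r *MyResourceReconciler) SetupWithManager(mgr ctrl.Manager) error {
    	return ctrl.NewControllerManagedBy(mgr).
    		For(&mygroupv1.MyResource{}, builder.WithPredicates(predicate.GenerationChangedPredicate{})).
    		Owns(&appsv1.Deployment{}).
    		Owns(&corev1.Service{}).
    		Complete(r)
    }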

    Conclusion

    Building a Kubernetes Operator that is fit for production requires moving far beyond the initial scaffolding. By implementing an idempotent Observe->Diff->Act reconcile loop, we ensure that our controller reliably converges the system to the desired state. By adding finalizers, we gain full control over the resource lifecycle, enabling complex, graceful cleanup operations that are essential for stateful applications. Finally, by handling edge cases like API conflicts via the status subresource and optimizing our event handling with predicates, we create a controller that is not only correct and resilient but also efficient and well-behaved in a large-scale cluster.

    These advanced patterns are the bedrock of reliable automation on Kubernetes. They transform a simple script into a robust, autonomous system capable of managing the entire lifecycle of complex applications with precision and safety.
