Idempotent Reconcilers in K8s Operators with Finalizers & Conditions

Goh Ling Yong

The Idempotency Imperative in Reconciliation

In the world of Kubernetes operators, the Reconcile function is the heart of all automation. A common misconception among engineers new to the operator pattern is that this function is a simple one-shot script. The reality is that a Reconcile loop can be triggered dozens of times for a single logical change due to cluster events, cache updates, or unrelated modifications to the Custom Resource (CR). Without a rigorous commitment to idempotency, this repeated execution will lead to catastrophic failures in production.

Idempotency in a Kubernetes controller means that running the Reconcile function N times on the same object state must result in the same desired cluster state, without generating errors or unintended side effects on subsequent runs.

Let's analyze a dangerously naive, non-idempotent implementation for a WebApp operator that's supposed to create a Deployment and a Service.

go
// WARNING: DO NOT USE THIS NON-IDEMPOTENT CODE
func (r *WebAppReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	log := log.FromContext(ctx)
	webapp := &appv1.WebApp{}
	if err := r.Get(ctx, req.NamespacedName, webapp); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	// 1. Create Deployment
	deployment := &appsv1.Deployment{ /* ... deployment spec ... */ }
	ctrl.SetControllerReference(webapp, deployment, r.Scheme)
	if err := r.Create(ctx, deployment); err != nil {
		log.Error(err, "Failed to create Deployment")
		return ctrl.Result{}, err // Requeue on error
	}
	log.Info("Created Deployment")

	// --- CRASH OR REQUEUE HAPPENS HERE ---

	// 2. Create Service
	service := &corev1.Service{ /* ... service spec ... */ }
	ctrl.SetControllerReference(webapp, service, r.Scheme)
	if err := r.Create(ctx, service); err != nil {
		log.Error(err, "Failed to create Service")
		return ctrl.Result{}, err
	}
	log.Info("Created Service")

	return ctrl.Result{}, nil
}

The failure scenario is clear: if the operator process crashes or the reconciliation is preempted after the Deployment is created but before the Service is, the next Reconcile run will attempt to create the Deployment again. That Create fails with an AlreadyExists error, the function returns early, and the Service is never created. Every subsequent reconcile hits the same error, so the system is stuck in a permanently broken state.

The Correct, Idempotent Pattern

The robust solution is to adopt a "read-before-write" or "desired state vs. actual state" approach for every resource the operator manages. We check if the resource exists. If it does, we check if it matches our desired specification. If not, we update it. If it doesn't exist, we create it.

Here is the corrected, idempotent logic for creating the Deployment:

go
// Idempotent Deployment reconciliation

deployment := &appsv1.Deployment{
    ObjectMeta: metav1.ObjectMeta{
        Name:      webapp.Name,
        Namespace: webapp.Namespace,
    },
    // ... other metadata
}

// Check if the Deployment already exists
foundDeployment := &appsv1.Deployment{}
err := r.Get(ctx, types.NamespacedName{Name: deployment.Name, Namespace: deployment.Namespace}, foundDeployment)
if err != nil && errors.IsNotFound(err) {
    log.Info("Creating a new Deployment", "Deployment.Namespace", deployment.Namespace, "Deployment.Name", deployment.Name)
    // Define the desired state
    desiredDeployment := r.deploymentForWebApp(webapp) // Helper function to build the spec
    if err := ctrl.SetControllerReference(webapp, desiredDeployment, r.Scheme); err != nil {
        return ctrl.Result{}, err
    }
    if err = r.Create(ctx, desiredDeployment); err != nil {
        log.Error(err, "Failed to create new Deployment")
        return ctrl.Result{}, err
    }
    // Creation successful, requeue to check status later
    return ctrl.Result{Requeue: true}, nil
} else if err != nil {
    log.Error(err, "Failed to get Deployment")
    return ctrl.Result{}, err
}

// Deployment exists, ensure its state is what we desire.
// This is a simplified comparison: a full DeepEqual against a freshly built spec will almost
// always report a difference once the API server has defaulted fields, so in production compare
// only the fields you own (or use the controllerutil.CreateOrUpdate sketch below).
desiredSpec := r.deploymentForWebApp(webapp).Spec
if !reflect.DeepEqual(foundDeployment.Spec, desiredSpec) {
    foundDeployment.Spec = desiredSpec
    log.Info("Updating Deployment spec")
    if err = r.Update(ctx, foundDeployment); err != nil {
        log.Error(err, "Failed to update Deployment")
        return ctrl.Result{}, err
    }
}

This pattern is the bedrock of a stable operator. It can be executed any number of times and will always converge the cluster state towards the desired state defined by the CR's spec.
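
If you'd rather not hand-roll the get/compare/create/update dance for every child resource, controller-runtime also provides controllerutil.CreateOrUpdate, which fetches the object, applies a mutate function, and only issues a write when something actually changed. A minimal sketch, assuming the standard kubebuilder scaffold (the reconciler embeds client.Client) and reusing the deploymentForWebApp helper from above:

go
deploy := &appsv1.Deployment{
    ObjectMeta: metav1.ObjectMeta{Name: webapp.Name, Namespace: webapp.Namespace},
}
op, err := controllerutil.CreateOrUpdate(ctx, r.Client, deploy, func() error {
    desired := r.deploymentForWebApp(webapp)
    // Mutate only the fields this operator owns; defaults and fields managed by
    // other controllers are left untouched.
    deploy.Spec.Replicas = desired.Spec.Replicas
    deploy.Spec.Template = desired.Spec.Template
    if deploy.CreationTimestamp.IsZero() {
        // The selector is immutable after creation, so only set it on create.
        deploy.Spec.Selector = desired.Spec.Selector
    }
    return ctrl.SetControllerReference(webapp, deploy, r.Scheme)
})
if err != nil {
    return ctrl.Result{}, err
}
log.Info("Deployment reconciled", "operation", op)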

Deep Dive into Finalizers for Graceful Deletion

Idempotency covers resource creation and updates, but what about deletion? When a user runs kubectl delete webapp my-webapp, the WebApp CR is marked for deletion. The Kubernetes garbage collector, observing the ownerReference we set, will proceed to delete the Deployment and Service.

This works for resources within the Kubernetes cluster. But what if our operator provisioned an external resource, like an S3 bucket, a Cloudflare DNS record, or a user in an external database? Kubernetes has no knowledge of these resources, and they will be orphaned, leading to resource leaks and security vulnerabilities.

This is the problem that Finalizers solve. A finalizer is a key in the metadata.finalizers list of an object. When a finalizer is present, a kubectl delete command does not immediately delete the object. Instead, it sets the metadata.deletionTimestamp field to the current time and puts the object into a Terminating state. The object is only physically removed from the API server after its finalizers list is empty.

This gives our operator a hook to perform pre-delete cleanup logic.

Production-Grade Finalizer Implementation

Let's augment our WebAppReconciler to manage an external resource represented by a finalizer. Our CRD spec might now include an S3 bucket name.

Step 1: Define the Finalizer Name

It's a best practice to use a domain-qualified name to avoid collisions with other controllers.

go
const webAppFinalizer = "app.example.com/finalizer"

Step 2: The Reconcile Logic with Finalizer Handling

The Reconcile function must now become a state machine that first checks for the deletion timestamp.

go
func (r *WebAppReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	log := log.FromContext(ctx)
	webapp := &appv1.WebApp{}
	if err := r.Get(ctx, req.NamespacedName, webapp); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	// Check if the instance is being deleted
	isWebAppMarkedToBeDeleted := webapp.GetDeletionTimestamp() != nil
	if isWebAppMarkedToBeDeleted {
		if controllerutil.ContainsFinalizer(webapp, webAppFinalizer) {
			// Run our finalizer logic. If it fails, we return the error which will trigger a requeue.
			if err := r.finalizeWebApp(ctx, webapp); err != nil {
				log.Error(err, "Failed to run finalizer logic")
				// Don't remove the finalizer so we can retry
				return ctrl.Result{}, err
			}

			// Finalizer logic succeeded. Remove the finalizer from the list and update the object.
			// This is a critical step that allows the deletion to proceed.
			log.Info("Successfully finalized WebApp, removing finalizer")
			controllerutil.RemoveFinalizer(webapp, webAppFinalizer)
			if err := r.Update(ctx, webapp); err != nil {
				return ctrl.Result{}, err
			}
		}
		// Stop reconciliation as the item is being deleted
		return ctrl.Result{}, nil
	}

	// The object is not being deleted, so we add our finalizer if it doesn't exist.
	if !controllerutil.ContainsFinalizer(webapp, webAppFinalizer) {
		log.Info("Adding finalizer for WebApp")
		controllerutil.AddFinalizer(webapp, webAppFinalizer)
		if err := r.Update(ctx, webapp); err != nil {
			return ctrl.Result{}, err
		}
	}

	// ... your normal reconciliation logic for creating/updating Deployments, Services etc. goes here ...

	return ctrl.Result{}, nil
}

func (r *WebAppReconciler) finalizeWebApp(ctx context.Context, webapp *appv1.WebApp) error {
	// This is where you would put your cleanup logic for external resources.
	// For example, deleting an S3 bucket, a database user, etc.
	// This function MUST be idempotent.
	log := log.FromContext(ctx)
	log.Info("Performing cleanup for external resources", "bucketName", webapp.Spec.BucketName)

	// Fictional external client
	externalClient, err := external.NewClient()
	if err != nil {
		return err
	}

	// Idempotent delete: If the bucket doesn't exist, this should not return an error.
	if err := externalClient.DeleteBucket(webapp.Spec.BucketName); err != nil {
		// If the error indicates the bucket is already gone, we can ignore it.
		if !external.IsNotFound(err) {
			return fmt.Errorf("failed to delete external bucket: %w", err)
		}
	}

	log.Info("External resource cleanup successful")
	return nil
}

Edge Case: The Stuck `Terminating` Object

A common production issue is an object stuck in the Terminating state. This happens when the finalizeWebApp logic consistently fails. The finalizer is never removed, and the object cannot be deleted. Reasons for this include:

  • External API Outage: The cleanup logic cannot reach the external service to delete the resource.
  • Permissions Error: The operator's credentials for the external service have expired or are insufficient.
  • Bugs in Cleanup Logic: The finalizeWebApp function has a bug that causes it to panic or return an error incorrectly.

Debugging this requires inspecting the operator's logs for the specific CR; they should state clearly why the finalizer is failing. In a critical situation, a cluster administrator might have to intervene manually by editing the CR (kubectl edit webapp my-webapp) and removing the finalizer string from the metadata.finalizers list, but this should be a last resort, as it will orphan the external resource.
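
To make such failures easier to spot without trawling controller logs, the finalizer path can also emit a Kubernetes Event on the CR so the reason shows up directly in kubectl describe. A small sketch, assuming the reconciler has a Recorder field of type record.EventRecorder wired up via mgr.GetEventRecorderFor (not shown in the examples above):

go
// Inside the deletion branch of Reconcile, wrapping the finalizer call.
if err := r.finalizeWebApp(ctx, webapp); err != nil {
    // Surface the failure on the object itself; it appears under Events in
    // `kubectl describe webapp my-webapp`.
    r.Recorder.Event(webapp, corev1.EventTypeWarning, "FinalizeFailed",
        fmt.Sprintf("external cleanup failed: %v", err))
    return ctrl.Result{}, err
}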

Advanced State Management with Conditions

A simple status field like phase: Ready is inadequate for any non-trivial operator. What does Ready mean? What if the Deployment is ready but the Service is not? What if an external resource failed to provision? A user running kubectl describe on the CR gets no useful diagnostic information.

This is why the Kubernetes API conventions strongly recommend using a status.conditions array for reporting object state. A Condition is a structured object that provides detailed, machine-readable status updates.

metav1.Condition has the following fields:

  • Type: The type of the condition (e.g., Available, Ready, Degraded).
  • Status: "True", "False", or "Unknown".
  • ObservedGeneration: The metadata.generation of the CR that was observed when this condition was last updated (see the sketch after this list).
  • LastTransitionTime: The timestamp of the last status change.
  • Reason: A short, machine-readable CamelCase reason for the condition's state (e.g., DeploymentReady, ServiceMissing).
  • Message: A human-readable message with more details.
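
As referenced in the ObservedGeneration item above, that field is what lets a consumer (or the operator itself) tell whether a condition still reflects the latest spec. A small sketch using the helpers from k8s.io/apimachinery/pkg/api/meta:

go
// Returns true only if the Ready condition is True AND was computed against
// the current generation of the spec.
func isReadyAndCurrent(webapp *appv1.WebApp) bool {
    cond := meta.FindStatusCondition(webapp.Status.Conditions, "Ready")
    if cond == nil {
        return false
    }
    // A condition recorded for an older generation is stale and should not be
    // trusted, even if its Status is "True".
    return cond.Status == metav1.ConditionTrue && cond.ObservedGeneration == webapp.Generation
}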

Implementing a Condition-Based Status Subresource

First, define the status field in your CRD's Go type.

go
// in api/v1/webapp_types.go

// WebAppStatus defines the observed state of WebApp
type WebAppStatus struct {
	// Conditions store the status conditions of the WebApp
	// +operator-sdk:csv:customresourcedefinitions:type=status
	Conditions []metav1.Condition `json:"conditions,omitempty" patchStrategy:"merge" patchMergeKey:"type" protobuf:"bytes,1,rep,name=conditions"`
}

// The +kubebuilder:subresource:status marker enables the /status subresource
// that r.Status().Update() writes to below.
// +kubebuilder:object:root=true
// +kubebuilder:subresource:status

// WebApp is the Schema for the webapps API
type WebApp struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   WebAppSpec   `json:"spec,omitempty"`
	Status WebAppStatus `json:"status,omitempty"`
}
Next, you need a robust helper function to manage the conditions list. Simply appending to the list is wrong; you need to find and update existing conditions in place.

go
// In your controller package

// setStatusCondition is a helper function to update a condition in the WebAppStatus.
func (r *WebAppReconciler) setStatusCondition(ctx context.Context, webapp *appv1.WebApp, conditionType string, status metav1.ConditionStatus, reason, message string) error {
    newCondition := metav1.Condition{
        Type:               conditionType,
        Status:             status,
        ObservedGeneration: webapp.Generation,
        LastTransitionTime: metav1.Now(),
        Reason:             reason,
        Message:            message,
    }

    // meta.SetStatusCondition is a helper from k8s.io/apimachinery/pkg/api/meta.
    // It finds and updates an existing condition of the same Type (or appends a
    // new one) and preserves LastTransitionTime unless the status actually changes.
    meta.SetStatusCondition(&webapp.Status.Conditions, newCondition)

    // Use the status subresource writer to avoid race conditions.
    return r.Status().Update(ctx, webapp)
}

Crucial Point: Always use r.Status().Update() to write the status subresource. With the subresource enabled, a plain r.Update() on the main object ignores status changes entirely, and writing the whole object risks overwriting spec changes a user made while your controller was reconciling the status.
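
Even with the status writer, an Update can fail with a conflict if the object changed between your Get and the write. A common mitigation, sketched here with retry.RetryOnConflict from k8s.io/client-go/util/retry (newCondition is the condition built in the helper above), is to re-fetch and retry:

go
err := retry.RetryOnConflict(retry.DefaultRetry, func() error {
    // Re-fetch the latest version on every attempt so the write is based on
    // the current resourceVersion.
    latest := &appv1.WebApp{}
    if err := r.Get(ctx, types.NamespacedName{Name: webapp.Name, Namespace: webapp.Namespace}, latest); err != nil {
        return err
    }
    meta.SetStatusCondition(&latest.Status.Conditions, newCondition)
    return r.Status().Update(ctx, latest)
})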

Tying It All Together: A Production-Grade Reconciler

Let's combine these patterns into a more complete Reconcile function. This function is a state machine that handles finalizers, idempotently reconciles child resources, and reports detailed status via conditions.

We'll define two condition types: DeploymentReady and an aggregate Ready condition.

go
const (
    ConditionTypeReady           = "Ready"
    ConditionTypeDeploymentReady = "DeploymentReady"
)
    
func (r *WebAppReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	log := log.FromContext(ctx)
	webapp := &appv1.WebApp{}
	if err := r.Get(ctx, req.NamespacedName, webapp); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	// Always defer a status update so that any condition changes made below are
	// persisted at the end of the loop. Note that this single write can fail with
	// a conflict if the object changed mid-reconcile (or with NotFound once the
	// finalizer has been removed); production code often retries, as sketched earlier.
	defer func() {
		if err := r.Status().Update(ctx, webapp); err != nil {
			log.Error(err, "Failed to update WebApp status")
		}
	}()
    
	// Finalizer logic (as shown before)
	if webapp.GetDeletionTimestamp() != nil {
		// ... handle deletion and finalizer removal ...
		return ctrl.Result{}, nil
	}
	if !controllerutil.ContainsFinalizer(webapp, webAppFinalizer) {
		// ... add finalizer ...
		return ctrl.Result{Requeue: true}, nil
	}

	// Reconcile Deployment
	deployment, err := r.reconcileDeployment(ctx, webapp)
	if err != nil {
		meta.SetStatusCondition(&webapp.Status.Conditions, metav1.Condition{
			Type:    ConditionTypeDeploymentReady,
			Status:  metav1.ConditionFalse,
			Reason:  "ReconciliationFailed",
			Message: fmt.Sprintf("Failed to reconcile Deployment: %v", err),
		})
		// Also update the aggregate Ready condition
		meta.SetStatusCondition(&webapp.Status.Conditions, metav1.Condition{
			Type:    ConditionTypeReady,
			Status:  metav1.ConditionFalse,
			Reason:  "DeploymentNotReady",
			Message: "Deployment reconciliation failed",
		})
		return ctrl.Result{}, err
	}
    
	// Check if the Deployment is actually available. Guard against a nil
	// Replicas pointer instead of dereferencing it blindly.
	deploymentReady := deployment.Spec.Replicas != nil &&
		deployment.Status.AvailableReplicas == *deployment.Spec.Replicas
	if deploymentReady {
		meta.SetStatusCondition(&webapp.Status.Conditions, metav1.Condition{
			Type:    ConditionTypeDeploymentReady,
			Status:  metav1.ConditionTrue,
			Reason:  "DeploymentAvailable",
			Message: "Deployment has the desired number of available replicas.",
		})
	} else {
		meta.SetStatusCondition(&webapp.Status.Conditions, metav1.Condition{
			Type:    ConditionTypeDeploymentReady,
			Status:  metav1.ConditionFalse,
			Reason:  "DeploymentNotAvailable",
			Message: "Deployment does not have the desired number of available replicas.",
		})
	}

	// Reconcile other resources like Service, Ingress, etc. following the same pattern...

	// After all sub-reconcilers, determine the aggregate Ready condition.
	if meta.IsStatusConditionTrue(webapp.Status.Conditions, ConditionTypeDeploymentReady) /* && other conditions... */ {
		meta.SetStatusCondition(&webapp.Status.Conditions, metav1.Condition{
			Type:    ConditionTypeReady,
			Status:  metav1.ConditionTrue,
			Reason:  "AllComponentsReady",
			Message: "All components are reconciled and available.",
		})
	} else {
		meta.SetStatusCondition(&webapp.Status.Conditions, metav1.Condition{
			Type:    ConditionTypeReady,
			Status:  metav1.ConditionFalse,
			Reason:  "ComponentsNotReady",
			Message: "One or more components are not ready.",
		})
		// Requeue to check again later if not ready
		return ctrl.Result{RequeueAfter: 30 * time.Second}, nil
	}

	return ctrl.Result{}, nil
}
    
// reconcileDeployment contains the idempotent logic for the Deployment
func (r *WebAppReconciler) reconcileDeployment(ctx context.Context, webapp *appv1.WebApp) (*appsv1.Deployment, error) {
	// ... full idempotent logic (get, create if not found, update if the spec has drifted) ...
	// returns the reconciled Deployment object and any error
	return &appsv1.Deployment{}, nil // Placeholder
}

This structure provides a clear, robust, and debuggable reconciliation loop. When a user runs kubectl describe webapp my-webapp, they will now see a detailed list of conditions explaining exactly what the operator has done and the current state of each managed component.

Performance and Optimization Considerations

Writing a correct operator is one thing; writing a high-performance one that doesn't overload the Kubernetes API server is another.

1. Controller Watches and Predicates

By default, a controller will trigger a reconciliation for any change to the primary resource (WebApp) or any owned secondary resources (Deployment, Service). This is often too noisy. For example, when a Deployment scales up a pod, its status is updated, which triggers a WebApp reconcile. In most cases, this is unnecessary.

Use predicates to filter these events at the source.

go
// In main.go or your controller setup

err = ctrl.NewControllerManagedBy(mgr).
    For(&appv1.WebApp{}).
    Owns(&appsv1.Deployment{}).
    // Filter out update events where neither the generation nor the annotations
    // changed (i.e. status-only updates); create and delete events still pass through.
    WithEventFilter(predicate.Or(predicate.GenerationChangedPredicate{}, predicate.AnnotationChangedPredicate{})).
    Complete(r)

GenerationChangedPredicate is key here. The metadata.generation of an object is only incremented when its spec changes, so this predicate effectively filters out all status-only updates, drastically reducing reconciliation churn. Note that WithEventFilter applies to every watch registered on the controller, including the primary WebApp; when you need different filtering per resource, predicates can be attached per watch, as sketched below.
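
A sketch of per-watch predicates using controller-runtime's builder package (sigs.k8s.io/controller-runtime/pkg/builder), as an alternative to a controller-wide event filter:

go
err = ctrl.NewControllerManagedBy(mgr).
    // Reconcile the WebApp only when its spec (generation) changes.
    For(&appv1.WebApp{}, builder.WithPredicates(predicate.GenerationChangedPredicate{})).
    // For owned Deployments, likewise ignore status-only updates.
    Owns(&appsv1.Deployment{}, builder.WithPredicates(predicate.GenerationChangedPredicate{})).
    Complete(r)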

2. Intelligent Requeue Strategy

A naive return ctrl.Result{}, err on every failure is suboptimal. controller-runtime automatically implements exponential backoff for reconciliations that return an error, which is good for transient API server errors.

However, for conditions you can predict, use RequeueAfter. For example, if a Deployment has just been created, it won't be Available instantly. Instead of returning an error or requeuing immediately, it's better to requeue after a reasonable delay.

go
// Inside Reconcile, after creating a resource
if !isReady(resource.Status) {
    // Don't treat this as an error. We just need to wait.
    return ctrl.Result{RequeueAfter: 15 * time.Second}, nil
}

This differentiates between a true error state (e.g., failing to create a resource because of an invalid spec) and a transient state (e.g., waiting for a resource to become ready).
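
One way to draw that line in code is to classify API errors before deciding how to return. A hedged sketch using the apimachinery errors package (imported here as apierrors); the desired variable stands in for whatever resource is being created:

go
if err := r.Create(ctx, desired); err != nil {
    if apierrors.IsInvalid(err) || apierrors.IsForbidden(err) {
        // Likely permanent until a human fixes the spec or RBAC: record it in a
        // condition and return nil so the workqueue doesn't retry in a hot loop.
        meta.SetStatusCondition(&webapp.Status.Conditions, metav1.Condition{
            Type:    ConditionTypeReady,
            Status:  metav1.ConditionFalse,
            Reason:  "PermanentError",
            Message: err.Error(),
        })
        return ctrl.Result{}, nil
    }
    // Transient (timeouts, conflicts, throttling): return the error and let
    // controller-runtime's exponential backoff schedule the retry.
    return ctrl.Result{}, err
}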

3. Concurrency Control

The MaxConcurrentReconciles option, set per controller via controller.Options, dictates how many Reconcile loops can run in parallel for that controller. The default is 1.

go
// In main.go

ctrl.NewControllerManagedBy(mgr).
    // ...
    WithOptions(controller.Options{MaxConcurrentReconciles: 5}).
    Complete(r)

Increasing this can improve throughput if you have many CRs to manage. However, be cautious. High concurrency can lead to:

  • API Server Throttling: Too many concurrent requests can get you rate-limited by the API server.
  • External Resource Contention: If your operator interacts with an external system, high concurrency could overwhelm it.
  • Race Conditions: While the reconciliation of a single object (namespace/name) is always serialized, multiple concurrent reconciles for different objects could race for shared, non-namespaced resources.

Tune this value based on performance testing and the nature of the resources your operator manages. For operators managing heavy external resources, a lower concurrency might be safer.
