Idempotent Reconcilers with Finalizers in K8s Operators
The Inevitable Problem: Orphaned Resources
As a seasoned engineer building on Kubernetes, you understand the power of the operator pattern. It extends the Kubernetes API to manage complex, stateful applications and, more importantly, external resources like cloud databases, message queues, or DNS entries. The core of an operator is its reconciliation loop: a continuous process that drives the current state of the world toward the desired state defined in a Custom Resource (CR).
However, a common and costly failure mode arises during deletion. Consider an operator managing CloudDatabase custom resources, where each CR corresponds to an AWS RDS instance. A user creates a CloudDatabase object, the operator's reconciler sees it, and calls the AWS API to provision an RDS instance.
Now, what happens when the user runs kubectl delete clouddatabase my-prod-db?
- The API server marks the CloudDatabase object for deletion by setting metadata.deletionTimestamp.
- This update triggers the operator's reconciliation loop one last time.
- A naive operator might immediately call the AWS API to terminate the RDS instance.
This seems straightforward, but it's fraught with peril in a distributed system:
* Operator Crash: The operator pod could crash or be evicted after the CR is deleted from etcd but before the AWS API call completes successfully.
* Network Failure: The call to the AWS API might fail due to transient network issues.
* API Rate Limiting: The cloud provider might rate-limit the operator's requests, delaying or preventing deletion.
In all these cases, the CloudDatabase CR is gone from Kubernetes, but the expensive RDS instance is now an orphaned resource, silently accruing costs. The operator has lost its source of truth and has no way to know it needs to clean up this resource. This is where the Finalizer Pattern becomes not just a best practice, but a necessity for robust, production-grade operators.
The Finalizer Pattern: A Deletion Gatekeeper
A finalizer is not a piece of code or a controller; it's simply a string added to the metadata.finalizers list of a Kubernetes object. When this list is non-empty, the API server will not complete the object's deletion. An object that has been marked for deletion but still carries finalizers remains in the API server in a Terminating state indefinitely, until its finalizers list is cleared.
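In Go terms, this is nothing more than two metadata fields interacting. The helper below is purely illustrative (it is not part of any library) and works on any Kubernetes object through the metav1.Object interface:

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// isBlockedOnFinalizers reports whether deletion has been requested
// (DeletionTimestamp is set) but the object is still held by finalizers.
// Such an object shows up as Terminating until the finalizers slice is emptied.
func isBlockedOnFinalizers(obj metav1.Object) bool {
	return obj.GetDeletionTimestamp() != nil && len(obj.GetFinalizers()) > 0
}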
This behavior provides the hook we need. Our operator can add its own unique finalizer to any CloudDatabase CR it starts managing. Now, when a user deletes the CR:
- The deletionTimestamp is set, but the object is not removed from etcd because our finalizer is present.
- The operator's reconciler sees the deletionTimestamp and knows the object is being deleted.
- It can now perform its cleanup logic (e.g., delete the RDS instance) with confidence.
- Only after cleanup succeeds does the operator remove its finalizer from the metadata.finalizers list.
- With the finalizer list now empty, the Kubernetes garbage collector is unblocked and completes the deletion of the CR object.
This guarantees that the operator has the opportunity to perform and confirm cleanup before its source of truth disappears.
Architecting the Idempotent Reconciliation Loop
Let's build this robust reconciliation loop in Go using the controller-runtime library, the de facto standard for building operators. Our Reconcile function will effectively become a state machine driven by the presence of the deletionTimestamp and our finalizer.
First, let's define our finalizer's name. It should be unique and descriptive, typically using a domain-style name.
// api/v1alpha1/clouddatabase_types.go
package v1alpha1
// ... other imports
const (
CloudDatabaseFinalizer = "database.example.com/finalizer"
)
// ... CRD struct definitions
Our Reconcile function in the controller will be structured around this core logic:
// internal/controller/clouddatabase_controller.go
import (
	"context"
	"errors"
	"time"

	"github.com/go-logr/logr"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"

	databasev1alpha1 "github.com/your-org/your-operator/api/v1alpha1"
	"github.com/your-org/your-operator/internal/rds" // hypothetical wrapper around the cloud provider's SDK
)
func (r *CloudDatabaseReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
log := r.Log.WithValues("clouddatabase", req.NamespacedName)
// 1. Fetch the CloudDatabase instance
instance := &databasev1alpha1.CloudDatabase{}
if err := r.Get(ctx, req.NamespacedName, instance); err != nil {
if apierrors.IsNotFound(err) {
log.Info("CloudDatabase resource not found. Ignoring since object must be deleted.")
return ctrl.Result{}, nil
}
log.Error(err, "Failed to get CloudDatabase")
return ctrl.Result{}, err
}
// 2. The core state machine logic
if instance.GetDeletionTimestamp().IsZero() {
// The object is NOT being deleted, so we proceed with normal reconciliation.
return r.reconcileNormal(ctx, instance, log)
} else {
// The object IS being deleted.
return r.reconcileDelete(ctx, instance, log)
}
}
This structure cleanly separates the creation/update logic from the deletion logic.
State 1: Normal Reconciliation (Create/Update)
When deletionTimestamp is nil, our goal is to ensure the external resource exists and matches the spec.
func (r *CloudDatabaseReconciler) reconcileNormal(ctx context.Context, instance *databasev1alpha1.CloudDatabase, log logr.Logger) (ctrl.Result, error) {
// A. Ensure our finalizer is present on the object.
if !controllerutil.ContainsFinalizer(instance, databasev1alpha1.CloudDatabaseFinalizer) {
log.Info("Adding Finalizer for CloudDatabase")
controllerutil.AddFinalizer(instance, databasev1alpha1.CloudDatabaseFinalizer)
if err := r.Update(ctx, instance); err != nil {
log.Error(err, "Failed to update CloudDatabase to add finalizer")
return ctrl.Result{}, err
}
// Requeue immediately after adding the finalizer to avoid race conditions.
return ctrl.Result{Requeue: true}, nil
}
// B. Check if the external RDS instance exists.
// We use a unique identifier, like `instance.UID`, to name or tag the external resource.
rdsInstance, err := r.RDSClient.DescribeDBInstances(instance.UID)
if err != nil {
if errors.Is(err, rds.ErrDBInstanceNotFound) {
// C. It doesn't exist, so create it.
log.Info("Creating a new RDS instance")
_, createErr := r.RDSClient.CreateDBInstance(instance.Spec.Engine, instance.Spec.Size, instance.UID)
if createErr != nil {
log.Error(createErr, "Failed to create RDS instance")
// Update status to reflect failure
instance.Status.Phase = "Failed"
instance.Status.Message = createErr.Error()
if updateErr := r.Status().Update(ctx, instance); updateErr != nil {
log.Error(updateErr, "Failed to update CloudDatabase status")
}
return ctrl.Result{}, createErr
}
// Creation is asynchronous. We update our status and requeue.
instance.Status.Phase = "Creating"
instance.Status.Message = "RDS instance provisioning has started."
instance.Status.DBInstanceID = string(instance.UID)
if err := r.Status().Update(ctx, instance); err != nil {
log.Error(err, "Failed to update CloudDatabase status")
return ctrl.Result{}, err
}
return ctrl.Result{RequeueAfter: time.Minute * 1}, nil // Requeue to check status later.
}
// Some other AWS API error occurred.
log.Error(err, "Failed to describe RDS instance")
return ctrl.Result{}, err
}
// D. The instance exists. Check for drift and update if necessary.
if rdsInstance.Size != instance.Spec.Size {
log.Info("RDS instance size differs from spec. Updating.", "CurrentSize", rdsInstance.Size, "DesiredSize", instance.Spec.Size)
// ... logic to update RDS instance size ...
// Asynchronous operation, requeue to monitor progress.
return ctrl.Result{RequeueAfter: time.Minute * 2}, nil
}
// E. Update status with current state from AWS.
instance.Status.Phase = rdsInstance.Status
instance.Status.Endpoint = rdsInstance.Endpoint
if err := r.Status().Update(ctx, instance); err != nil {
log.Error(err, "Failed to update CloudDatabase status")
return ctrl.Result{}, err
}
log.Info("Reconciliation complete. External resource is in desired state.")
return ctrl.Result{}, nil
}
Key Production Patterns Here:
* Idempotent Creation: We check whether the external resource already exists before creating it. If the operator crashed after sending the Create request but before recording the result, a subsequent reconciliation will find the existing instance and not try to create a duplicate.
* Status Subresource: We write observed state only to the .status subresource of our CR. This is critical as it prevents race conditions where our status update might overwrite a change made by a user to the .spec. (A conflict-tolerant variant of this update is sketched after this list.)
* Asynchronous Operations: We initiate the long-running creation (CreateDBInstance), update our CR's status to reflect the Creating state, and then requeue with a delay (RequeueAfter). The next reconciliation will poll for the latest status.
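One wrinkle the status updates above gloss over: a .status write can fail with a conflict if something else bumped the object's resourceVersion since our Get. A common way to tolerate this is client-go's retry.RetryOnConflict; the sketch below (the helper name updateStatusWithRetry is my own, not a controller-runtime API) re-reads the object and reapplies the mutation on each attempt:

import (
	"context"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/client-go/util/retry"
	"sigs.k8s.io/controller-runtime/pkg/client"

	databasev1alpha1 "github.com/your-org/your-operator/api/v1alpha1"
)

// updateStatusWithRetry retries the status write on conflict, re-fetching the
// latest copy each time so we never send a stale resourceVersion.
func (r *CloudDatabaseReconciler) updateStatusWithRetry(ctx context.Context, instance *databasev1alpha1.CloudDatabase, mutate func(*databasev1alpha1.CloudDatabase)) error {
	key := client.ObjectKeyFromObject(instance)
	return retry.RetryOnConflict(retry.DefaultRetry, func() error {
		latest := &databasev1alpha1.CloudDatabase{}
		if err := r.Get(ctx, key, latest); err != nil {
			if apierrors.IsNotFound(err) {
				return nil // the CR is gone; nothing left to update
			}
			return err
		}
		mutate(latest)
		return r.Status().Update(ctx, latest)
	})
}

A call such as r.updateStatusWithRetry(ctx, instance, func(db *databasev1alpha1.CloudDatabase) { db.Status.Phase = "Creating" }) keeps the mutation replayable across attempts.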
State 2: Graceful Deletion Logic
This is where the finalizer proves its worth. When deletionTimestamp is set, we execute our cleanup logic.
func (r *CloudDatabaseReconciler) reconcileDelete(ctx context.Context, instance *databasev1alpha1.CloudDatabase, log logr.Logger) (ctrl.Result, error) {
// Check if our finalizer is the one that's blocking deletion.
if controllerutil.ContainsFinalizer(instance, databasev1alpha1.CloudDatabaseFinalizer) {
log.Info("Performing cleanup for CloudDatabase")
// A. Call the external dependency to delete the resource.
if err := r.RDSClient.DeleteDBInstance(instance.UID); err != nil {
// Idempotency check: if the resource is already gone, that's success for us.
if errors.Is(err, rds.ErrDBInstanceNotFound) {
log.Info("External RDS instance already deleted. Proceeding to remove finalizer.")
} else {
// Another error occurred (e.g., API permissions, rate limiting).
log.Error(err, "Failed to delete RDS instance. Requeuing.")
// We must requeue to retry the deletion. The finalizer remains.
return ctrl.Result{}, err
}
}
// B. (Optional but recommended) Poll to confirm deletion.
// Some APIs return success immediately but deletion is async.
// A robust operator confirms the resource is truly gone.
isGone, err := r.RDSClient.ConfirmDBInstanceDeleted(instance.UID)
if err != nil {
log.Error(err, "Error during deletion confirmation polling.")
return ctrl.Result{}, err
}
if !isGone {
log.Info("RDS instance is still terminating. Requeuing to check again.")
return ctrl.Result{RequeueAfter: time.Second * 30}, nil
}
// C. Once external resource is gone, remove the finalizer.
log.Info("External resource deleted successfully. Removing finalizer.")
controllerutil.RemoveFinalizer(instance, databasev1alpha1.CloudDatabaseFinalizer)
if err := r.Update(ctx, instance); err != nil {
log.Error(err, "Failed to remove finalizer from CloudDatabase")
return ctrl.Result{}, err
}
}
// Finalizer is gone, or was never there. The object will be garbage collected.
log.Info("Reconciliation finished for deleted resource.")
return ctrl.Result{}, nil
}
Key Production Patterns Here:
* Idempotent Deletion: The errors.Is(err, rds.ErrDBInstanceNotFound) check handles this perfectly: if the external resource is already gone, we treat it as a success and proceed.
* Confirm Before Removing the Finalizer: A successful call to the Delete API is often not enough. Many cloud services initiate a termination process that can take minutes. A robust operator should poll the external API to confirm the resource no longer exists before removing the finalizer. This prevents a race condition where the CR is deleted but the external resource termination fails later.
* Retry with Backoff: If deletion fails with any error other than NotFound, we return an error. controller-runtime will automatically requeue the request with exponential backoff, preventing us from hammering a failing API.
Advanced Edge Cases and Production Considerations
Building a simple finalizer loop is one thing; making it production-ready requires anticipating and handling complex failure modes.
Finalizer Stalls
What happens if the external API is permanently unavailable, or a bug in our code prevents the deletion logic from ever succeeding? The finalizer will remain, and the CR will be stuck in a Terminating state forever. This is a finalizer stall.
Mitigation Strategies:
* Monitoring and Alerting: Track resources that remain stuck in a Terminating state. A query such as sum(kube_resource_metadata_deletion_timestamp{resource="clouddatabases"}) can be a starting point for an alert.
* Manual Intervention: Document an escape hatch for administrators, who can force deletion by removing the finalizer by hand:
kubectl patch clouddatabase my-stuck-db --type json --patch='[{"op": "remove", "path": "/metadata/finalizers"}]'
Administrators must understand this will likely orphan the external resource, which they will then need to clean up manually.
* Deletion Timeout: If a resource has been stuck in the Terminating state for over 24 hours, the operator could log a critical error, emit a Kubernetes Event, and remove its own finalizer, consciously orphaning the resource to unblock the system. This is a design trade-off between guaranteed cleanup and system availability.
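A minimal sketch of that give-up logic, using the reconciler types from the complete example later in this article; the 24-hour cutoff and the helper name are arbitrary choices, and a real operator would also emit a Kubernetes Event before orphaning anything:

// deletionDeadline caps how long we keep retrying external cleanup before
// deliberately orphaning the external resource. The value is an example.
const deletionDeadline = 24 * time.Hour

// cleanupDeadlineExceeded reports whether the CR has been Terminating for
// longer than the deadline. DeletionTimestamp is set by the API server when
// deletion was first requested, so its age is exactly the stall duration.
func (r *CloudDatabaseReconciler) cleanupDeadlineExceeded(instance *databasev1alpha1.CloudDatabase) bool {
	ts := instance.GetDeletionTimestamp()
	return ts != nil && time.Since(ts.Time) > deletionDeadline
}

// Checked at the top of the deletion path, before calling the external API:
//
//	if r.cleanupDeadlineExceeded(instance) {
//		log.Info("cleanup deadline exceeded; removing finalizer and orphaning the external resource")
//		controllerutil.RemoveFinalizer(instance, cloudDatabaseFinalizer)
//		return ctrl.Result{}, r.Update(ctx, instance)
//	}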
Concurrency and Leader Election
In a production environment, you will run multiple replicas of your operator for high availability. controller-runtime handles leader election out of the box, ensuring only one pod is actively reconciling resources at any given time. This prevents two pods from simultaneously trying to create or delete the same RDS instance.
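Leader election is configured on the manager, not in the reconciler. A minimal sketch of the relevant part of main.go, where the election ID is an arbitrary example value:

import (
	"k8s.io/apimachinery/pkg/runtime"
	ctrl "sigs.k8s.io/controller-runtime"
)

func newManager(scheme *runtime.Scheme) (ctrl.Manager, error) {
	// With LeaderElection enabled, only the replica holding the lease runs
	// reconcilers; the others stand by and take over if the leader dies.
	return ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		Scheme:           scheme,
		LeaderElection:   true,
		LeaderElectionID: "clouddatabase-operator.database.example.com",
	})
}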
However, your reconciliation logic must still be robust against failovers. If the leader pod dies mid-reconciliation, the new leader will pick up the exact same request. This is why idempotency is not just a feature but a fundamental requirement of the entire Reconcile function. Every step must be repeatable without causing unintended side effects.
Performance and API Server Load
This pattern introduces at least one extra UPDATE call to the Kubernetes API server for every CR created (to add the finalizer). For operators managing thousands of high-frequency resources, this can add load.
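That write cannot be skipped if you want the finalizer, but it does not have to be a full Update. One option, sketched below against the reconcileNormal shown earlier, is to send a merge patch computed from the original object; the patch carries only the metadata.finalizers change, so it avoids resourceVersion conflicts (and the retry traffic they generate) when something else writes the CR concurrently:

// Add the finalizer with a merge patch rather than a full object Update.
if !controllerutil.ContainsFinalizer(instance, databasev1alpha1.CloudDatabaseFinalizer) {
	base := instance.DeepCopy() // snapshot before mutating, used as the patch base
	controllerutil.AddFinalizer(instance, databasev1alpha1.CloudDatabaseFinalizer)
	if err := r.Patch(ctx, instance, client.MergeFrom(base)); err != nil {
		return ctrl.Result{}, err
	}
}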
Optimization Techniques:
* Batching: While controller-runtime processes items individually, be mindful of the load your operator places on external APIs. Use client-side rate limiting (e.g., a token bucket) when calling cloud providers; see the sketch after this list.
* Requeue Tuning: Be judicious with RequeueAfter. Polling every 5 seconds for a resource that takes 10 minutes to provision is wasteful. Use intelligent, longer requeue times. For some operations, you might be able to use an external eventing system (e.g., AWS EventBridge) to trigger reconciliation instead of polling, though this adds significant complexity.
* Controller Caching: Understand how the controller-runtime cache works. By default, your reconciler reads from a local cache that is eventually consistent with etcd: the r.Get call in our example is served from this cache, while writes (r.Update, r.Status().Update) go directly to the API server and are eventually reflected back into the cache. In rare cases of cache lag, a reconciliation might run with slightly stale data, another reason why idempotency is paramount.
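For the client-side rate limiting mentioned in the first bullet, golang.org/x/time/rate provides a standard token bucket. A minimal sketch (the limiter values and the callWithRateLimit helper are illustrative, not part of controller-runtime):

import (
	"context"

	"golang.org/x/time/rate"
)

// A single shared limiter for outbound cloud API calls: 5 requests per second
// with a burst of 10. Tune these to the provider's documented quotas.
var cloudAPILimiter = rate.NewLimiter(rate.Limit(5), 10)

// callWithRateLimit blocks until a token is available (or ctx is cancelled)
// before invoking the external call.
func callWithRateLimit(ctx context.Context, call func() error) error {
	if err := cloudAPILimiter.Wait(ctx); err != nil {
		return err
	}
	return call()
}

In the reconciler this wraps each external call, for example callWithRateLimit(ctx, func() error { return r.RDSClient.Delete(externalID) }).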
Complete Code Example: Tying It All Together
Here is a more complete, runnable Reconcile function that demonstrates the full pattern.
package controller
import (
"context"
"time"
"github.com/go-logr/logr"
"k8s.io/apimachinery/pkg/api/errors"
"k8s.io/apimachinery/pkg/runtime"
ctrl "sigs.k8s.io/controller-runtime"
"sigs.k8s.io/controller-runtime/pkg/client"
"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
databasev1alpha1 "github.com/your-org/your-operator/api/v1alpha1"
"github.com/your-org/your-operator/internal/rds" // Assume this is a mock client
)
// CloudDatabaseReconciler reconciles a CloudDatabase object
type CloudDatabaseReconciler struct {
client.Client
Log logr.Logger
Scheme *runtime.Scheme
RDSClient rds.Client // Your interface for the external service
}
const cloudDatabaseFinalizer = "database.example.com/finalizer"
func (r *CloudDatabaseReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
log := r.Log.WithValues("clouddatabase", req.NamespacedName)
instance := &databasev1alpha1.CloudDatabase{}
if err := r.Get(ctx, req.NamespacedName, instance); err != nil {
if errors.IsNotFound(err) {
return ctrl.Result{}, nil
}
return ctrl.Result{}, err
}
// Examine DeletionTimestamp to determine if object is under deletion
if instance.ObjectMeta.DeletionTimestamp.IsZero() {
// The object is not being deleted, so if it does not have our finalizer,
// then let's add the finalizer and update the object.
if !controllerutil.ContainsFinalizer(instance, cloudDatabaseFinalizer) {
log.Info("Adding finalizer")
controllerutil.AddFinalizer(instance, cloudDatabaseFinalizer)
if err := r.Update(ctx, instance); err != nil {
return ctrl.Result{}, err
}
return ctrl.Result{Requeue: true}, nil
}
} else {
// The object is being deleted
if controllerutil.ContainsFinalizer(instance, cloudDatabaseFinalizer) {
// Our finalizer is present, so let's handle any external dependency
log.Info("Handling external resource deletion")
if err := r.deleteExternalResources(instance); err != nil {
// if fail to delete the external dependency here, return with error
// so that it can be retried
return ctrl.Result{}, err
}
// Once external dependencies are cleaned up, remove the finalizer.
log.Info("Removing finalizer")
controllerutil.RemoveFinalizer(instance, cloudDatabaseFinalizer)
if err := r.Update(ctx, instance); err != nil {
return ctrl.Result{}, err
}
}
// Stop reconciliation as the item is being deleted
return ctrl.Result{}, nil
}
// Your normal reconciliation logic to create/update the external resource
log.Info("Reconciling CloudDatabase")
externalID := string(instance.UID)
rdsInstance, err := r.RDSClient.Get(externalID)
if err != nil {
if err == rds.ErrNotFound {
log.Info("Creating external resource")
if err := r.RDSClient.Create(externalID, instance.Spec.Size); err != nil {
return ctrl.Result{}, err
}
return ctrl.Result{RequeueAfter: 30 * time.Second}, nil
} else {
return ctrl.Result{}, err
}
}
if rdsInstance.Size != instance.Spec.Size {
log.Info("Updating external resource")
if err := r.RDSClient.Update(externalID, instance.Spec.Size); err != nil {
return ctrl.Result{}, err
}
}
return ctrl.Result{}, nil
}
// deleteExternalResources handles the deletion of the AWS RDS instance.
func (r *CloudDatabaseReconciler) deleteExternalResources(instance *databasev1alpha1.CloudDatabase) error {
externalID := string(instance.UID)
r.Log.Info("Deleting RDS instance", "ID", externalID)
err := r.RDSClient.Delete(externalID)
if err != nil && err != rds.ErrNotFound {
return err
}
r.Log.Info("Successfully deleted or confirmed deletion of RDS instance", "ID", externalID)
return nil
}
// SetupWithManager sets up the controller with the Manager.
func (r *CloudDatabaseReconciler) SetupWithManager(mgr ctrl.Manager) error {
return ctrl.NewControllerManagedBy(mgr).
For(&databasev1alpha1.CloudDatabase{}).
Complete(r)
}
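The example compiles only against some rds.Client interface. Its exact shape is yours to define; the sketch below is one assumption that matches the calls used above (it is a thin hand-written wrapper, not an AWS SDK API, and the Size field simply mirrors whatever type CloudDatabaseSpec.Size has):

// internal/rds/client.go (hypothetical)
package rds

import "errors"

// ErrNotFound is returned when no instance exists for the given external ID.
var ErrNotFound = errors.New("rds: db instance not found")

// Instance is the subset of external state the reconciler cares about.
type Instance struct {
	Size     string
	Status   string
	Endpoint string
}

// Client is the minimal surface the controller needs. A real implementation
// would wrap the AWS SDK and translate its "not found" errors into ErrNotFound.
type Client interface {
	Get(externalID string) (*Instance, error)
	Create(externalID, size string) error
	Update(externalID, size string) error
	Delete(externalID string) error
}

With the interface in place, a fake implementation makes the reconciler easy to unit-test without touching AWS.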
By rigorously applying the finalizer pattern and building for idempotency, you can elevate your Kubernetes operator from a simple automation tool to a resilient, production-grade controller that reliably manages the full lifecycle of its resources, preventing costly leaks and ensuring system stability.