Idempotent Reconciliation with K8s Operator Finalizers
The Deletion Blind Spot: A Race Condition Inherent in Simple Operators
As a senior engineer building on Kubernetes, you've likely moved beyond deploying simple applications and have started extending the Kubernetes API itself with Custom Resource Definitions (CRDs) and controllers—the core of the Operator Pattern. Your initial operator might manage an external resource, like a database in a cloud provider. The reconciliation loop seems straightforward: observe the state of a Custom Resource (CR), and converge the state of the external world to match.
Consider a CloudDatabase CRD. The controller's Reconcile function ensures that for every CloudDatabase object, a corresponding PostgreSQL instance exists in AWS RDS or Google Cloud SQL. If the CR's spec changes (e.g., spec.instanceSize), the controller issues an API call to modify the cloud instance.
The logic is clean:
- Fetch the CloudDatabase object for the current request.
- Check if the external database instance exists.
- If not, create it based on the CR's spec.
- If it exists, check if it's in sync with the spec. If not, update it.
This works flawlessly for creation and updates. The critical failure occurs on deletion. When a user executes kubectl delete clouddatabase my-prod-db and the object carries no finalizers, the Kubernetes API server removes it from etcd immediately.
The problem is that this deletion is an asynchronous, out-of-band event for your operator. Your controller might get one last reconciliation request, but by the time it processes it, the object may already be gone. More often, the object is removed before the controller can act. The Reconcile function, which is keyed by the object's name and namespace, will simply receive a NotFound error. It has no object to inspect, and therefore no information to perform a cleanup. The CloudDatabase CR is gone, but the expensive PostgreSQL instance in your cloud account remains, now an untracked, orphaned resource.
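The blind spot can be reproduced with a small in-memory model. This is only a sketch: Cluster, NaiveReconcile, and ErrNotFound are hypothetical stand-ins for the API server, the controller, and the NotFound response, not controller-runtime APIs.

```go
package main

import "errors"

// ErrNotFound stands in for the API server's NotFound response (hypothetical name).
var ErrNotFound = errors.New("not found")

// CloudDatabase is a stripped-down stand-in for the custom resource.
type CloudDatabase struct {
	Name       string
	ExternalID string
}

// Cluster models etcd and the cloud account as two independent maps.
type Cluster struct {
	Objects  map[string]*CloudDatabase // custom resources, keyed by name
	External map[string]bool           // external databases, keyed by ID
}

// Delete models deleting an object that has no finalizers: it is removed at once.
func (c *Cluster) Delete(name string) { delete(c.Objects, name) }

// Get returns ErrNotFound once the object is gone.
func (c *Cluster) Get(name string) (*CloudDatabase, error) {
	obj, ok := c.Objects[name]
	if !ok {
		return nil, ErrNotFound
	}
	return obj, nil
}

// NaiveReconcile has no cleanup path: on NotFound there is no object left
// to read the external ID from, so it can only return.
func NaiveReconcile(c *Cluster, name string) error {
	obj, err := c.Get(name)
	if errors.Is(err, ErrNotFound) {
		return nil // nothing to inspect, nothing to clean up
	}
	if err != nil {
		return err
	}
	if obj.ExternalID == "" {
		obj.ExternalID = "ext-" + obj.Name
	}
	c.External[obj.ExternalID] = true // create or keep the external database
	return nil
}
```

Reconciling once, deleting the object, and reconciling again leaves Objects empty while External still contains the database: exactly the orphan described above.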
This is not a minor edge case; it's a fundamental flaw in naive operator design that leads to resource leaks, security vulnerabilities, and unnecessary costs. To solve this, we need a mechanism to hook into the deletion process, a way to tell Kubernetes: "Do not fully delete this object until my controller has finished its cleanup tasks." This mechanism is the Finalizer.
Finalizers: A Pre-Deletion Hook for Graceful Teardown
A finalizer is not code; it's a piece of metadata. Specifically, it's a string entry in an object's metadata.finalizers list. While this list is not empty, the Kubernetes API server will not physically delete the object from etcd, even if a deletion request has been received.
Instead of immediate deletion, the API server performs a two-stage process:
1. A user runs kubectl delete .... The API server receives the request.
2. The API server checks metadata.finalizers. If the list is not empty, it updates the object by setting metadata.deletionTimestamp to the current time. The object now exists in a pre-deletion (terminating) state. It is not removed from etcd, and it can still be updated, for example to remove finalizers.
3. This update (the setting of deletionTimestamp) triggers a new event, causing the controller to run its Reconcile loop for this object one more time.
This is the hook we need. Inside our Reconcile function, we can now detect this state:
if myresource.ObjectMeta.DeletionTimestamp.IsZero() {
	// Object is NOT being deleted. This is the normal reconciliation path.
} else {
	// Object IS being deleted. This is our cleanup path.
}
Our controller's responsibility is now twofold:
1. During normal operation (deletionTimestamp is not set): Ensure our finalizer string is present in the object's metadata.finalizers list.
2. During deletion (deletionTimestamp is set): Perform all necessary external cleanup. Only once the cleanup is verifiably complete, remove our finalizer string from the list and update the object.
With our finalizer removed, the metadata.finalizers list might now be empty. The API server, seeing an object with a deletionTimestamp and an empty finalizer list, will proceed with the physical deletion from etcd.
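The deletion semantics described in this section can be modeled in a few lines. The sketch below is a toy, not the API server's implementation: Object, Store, and their methods are hypothetical names chosen for illustration.

```go
package main

import "time"

// Object is a hypothetical, minimal stand-in for a Kubernetes object's metadata.
type Object struct {
	Name              string
	Finalizers        []string
	DeletionTimestamp *time.Time
}

// Store is a toy model of the API server's storage, keyed by object name.
type Store map[string]*Object

// Delete models a delete request: with finalizers present the object is only
// marked; without them it is removed right away.
func (s Store) Delete(name string) {
	obj, ok := s[name]
	if !ok {
		return
	}
	if len(obj.Finalizers) == 0 {
		delete(s, name)
		return
	}
	if obj.DeletionTimestamp == nil {
		now := time.Now()
		obj.DeletionTimestamp = &now
	}
}

// RemoveFinalizer models the controller's final update. Once a marked object
// has no finalizers left, the store drops it.
func (s Store) RemoveFinalizer(name, finalizer string) {
	obj, ok := s[name]
	if !ok {
		return
	}
	kept := make([]string, 0, len(obj.Finalizers))
	for _, f := range obj.Finalizers {
		if f != finalizer {
			kept = append(kept, f)
		}
	}
	obj.Finalizers = kept
	if obj.DeletionTimestamp != nil && len(obj.Finalizers) == 0 {
		delete(s, name)
	}
}
```

In this model, calling Delete on an object that holds a finalizer leaves it in the store with a timestamp set, and only the later RemoveFinalizer call makes it disappear.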
Implementing a Finalizer-Aware Reconciliation Loop
Let's translate this theory into a production-grade Go implementation using the popular controller-runtime library. We'll continue with our CloudDatabase operator example, which manages an external resource.
First, define a unique name for our finalizer. This is crucial to avoid conflicts if multiple controllers operate on the same object.
// controller.go
const cloudDatabaseFinalizer = "database.example.com/finalizer"
Our Reconcile function structure must now explicitly handle the two states: normal operation and deletion.
// controller.go
import (
	"context"

	"k8s.io/apimachinery/pkg/runtime"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
	"sigs.k8s.io/controller-runtime/pkg/log"

	databasev1alpha1 "my-operator/api/v1alpha1"
)

// CloudDatabaseReconciler reconciles a CloudDatabase object
type CloudDatabaseReconciler struct {
	client.Client
	Scheme *runtime.Scheme
	// A mock external client for demonstration
	ExternalDBClient ExternalClient
}

func (r *CloudDatabaseReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	logger := log.FromContext(ctx)

	// 1. Fetch the CloudDatabase instance
	instance := &databasev1alpha1.CloudDatabase{}
	if err := r.Get(ctx, req.NamespacedName, instance); err != nil {
		if client.IgnoreNotFound(err) != nil {
			logger.Error(err, "Failed to get CloudDatabase")
			return ctrl.Result{}, err
		}
		// Object not found, probably deleted. Nothing to do.
		return ctrl.Result{}, nil
	}

	// 2. Examine DeletionTimestamp to determine if object is under deletion
	if instance.ObjectMeta.DeletionTimestamp.IsZero() {
		// The object is not being deleted. Let's add our finalizer if it doesn't exist.
		if !controllerutil.ContainsFinalizer(instance, cloudDatabaseFinalizer) {
			logger.Info("Adding Finalizer for CloudDatabase")
			controllerutil.AddFinalizer(instance, cloudDatabaseFinalizer)
			if err := r.Update(ctx, instance); err != nil {
				logger.Error(err, "Failed to add finalizer")
				return ctrl.Result{}, err
			}
		}

		// This is the main reconciliation logic: create or update the external DB
		if err := r.reconcileExternalDatabase(ctx, instance); err != nil {
			// Handle reconciliation errors, maybe update status
			return ctrl.Result{}, err
		}
	} else {
		// The object is being deleted
		if controllerutil.ContainsFinalizer(instance, cloudDatabaseFinalizer) {
			logger.Info("Performing cleanup for CloudDatabase")
			// Our finalizer is present, so let's perform cleanup
			if err := r.cleanupExternalDatabase(ctx, instance); err != nil {
				// If cleanup fails, we don't remove the finalizer.
				// This ensures we retry cleanup on the next reconciliation.
				logger.Error(err, "External resource cleanup failed")
				return ctrl.Result{}, err
			}

			// Cleanup was successful. Remove our finalizer so Kubernetes can delete the object.
			logger.Info("External resource cleaned up successfully. Removing finalizer.")
			controllerutil.RemoveFinalizer(instance, cloudDatabaseFinalizer)
			if err := r.Update(ctx, instance); err != nil {
				logger.Error(err, "Failed to remove finalizer")
				return ctrl.Result{}, err
			}
		}
	}

	return ctrl.Result{}, nil
}
This structure correctly separates the creation/update logic from the deletion/cleanup logic. The key takeaways are:
* Add Finalizer Early: The finalizer is added as the first step in the normal reconciliation path. This ensures that from the moment your operator acknowledges the resource, it's protected from premature deletion.
* Conditional Cleanup: The cleanupExternalDatabase function is only called if both the deletionTimestamp is set AND our specific finalizer is present.
* Cleanup Before Finalizer Removal: If cleanup fails, we return an error. controller-runtime will automatically requeue the request. The finalizer is not removed, preventing the object's deletion and guaranteeing a retry. The finalizer is only removed after the cleanup logic returns successfully. Note that the two steps are not atomic: cleanup can succeed while the finalizer update fails, which is why the cleanup must be safe to repeat.
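The retry behavior in the last takeaway can be exercised without a cluster. The following is a simplified sketch: Resource, reconcileDeletion, and flakyCleanup are hypothetical names, and the requeue performed by controller-runtime is assumed to be equivalent to simply calling the function again.

```go
package main

import "errors"

const finalizer = "database.example.com/finalizer"

// Resource is a hypothetical stand-in for a CR already marked for deletion.
type Resource struct {
	Finalizers []string
}

// hasFinalizer reports whether our finalizer is still on the resource.
func hasFinalizer(r *Resource) bool {
	for _, f := range r.Finalizers {
		if f == finalizer {
			return true
		}
	}
	return false
}

// removeFinalizer drops our finalizer and keeps any others.
func removeFinalizer(r *Resource) {
	kept := make([]string, 0, len(r.Finalizers))
	for _, f := range r.Finalizers {
		if f != finalizer {
			kept = append(kept, f)
		}
	}
	r.Finalizers = kept
}

// reconcileDeletion mirrors the deletion branch of Reconcile: cleanup first,
// finalizer removal only when cleanup reports success.
func reconcileDeletion(r *Resource, cleanup func() error) error {
	if !hasFinalizer(r) {
		return nil
	}
	if err := cleanup(); err != nil {
		return err // finalizer stays, so the caller will retry
	}
	removeFinalizer(r)
	return nil
}

// flakyCleanup returns a cleanup function that fails its first n calls.
func flakyCleanup(n int) func() error {
	calls := 0
	return func() error {
		calls++
		if calls <= n {
			return errors.New("cloud API unavailable")
		}
		return nil
	}
}
```

Calling reconcileDeletion in a loop with flakyCleanup(2) keeps the finalizer in place through the two failed attempts and removes it on the third.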
Advanced Patterns: Idempotency and Edge Case Resilience
The structure above is sound, but production environments are chaotic. We need to make our cleanup logic robust against failures, retries, and manual intervention. This is where idempotency becomes paramount.
An idempotent operation is one that can be applied multiple times without changing the result beyond the initial application. Our cleanupExternalDatabase function must be idempotent.
Consider this scenario:
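1. cleanupExternalDatabase successfully calls the cloud provider's API to delete the database.
2. It returns nil.
3. The subsequent update that removes the finalizer fails (for example, on a write conflict or a transient API server error), so the Reconcile function returns an error.
4. The request is requeued, and the next reconciliation calls cleanupExternalDatabase again for a database that has already been deleted.
A naive implementation might fail on this second attempt: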
// NON-IDEMPOTENT, DANGEROUS cleanup logic
func (r *CloudDatabaseReconciler) cleanupExternalDatabase(ctx context.Context, db *databasev1alpha1.CloudDatabase) error {
	logger := log.FromContext(ctx)
	logger.Info("Deleting external database", "ID", db.Status.ExternalDBID)

	// This call will fail if the DB with this ID is already gone.
	err := r.ExternalDBClient.Delete(db.Status.ExternalDBID)
	if err != nil {
		// Every error is returned as-is, including "NotFound",
		// so a retry after a successful delete can never succeed.
		return err
	}
	return nil
}
If r.ExternalDBClient.Delete returns a NotFound error, and we bubble that up, our operator will get stuck in an infinite retry loop trying to delete a non-existent resource. The finalizer will never be removed, and the CR will be stuck in a Terminating state forever.
The correct, idempotent implementation gracefully handles "not found" errors.
// IDEMPOTENT, PRODUCTION-GRADE cleanup logic
func (r *CloudDatabaseReconciler) cleanupExternalDatabase(ctx context.Context, db *databasev1alpha1.CloudDatabase) error {
	logger := log.FromContext(ctx)

	if db.Status.ExternalDBID == "" {
		// If we never stored an external ID, there's nothing to clean up.
		logger.Info("No external DB ID found in status. Nothing to clean up.")
		return nil
	}

	logger.Info("Attempting to delete external database", "ID", db.Status.ExternalDBID)
	err := r.ExternalDBClient.Delete(db.Status.ExternalDBID)
	if err != nil {
		// We must check if the error is because the resource is already gone.
		if IsExternalResourceNotFound(err) {
			logger.Info("External database already deleted. Cleanup is considered successful.")
			return nil // This is the key to idempotency!
		}
		// For any other error (e.g., network issues, permissions), we should retry.
		logger.Error(err, "Failed to delete external database", "ID", db.Status.ExternalDBID)
		return err
	}

	logger.Info("Successfully initiated deletion of external database", "ID", db.Status.ExternalDBID)
	return nil
}

// IsExternalResourceNotFound is a helper function that inspects the error from the cloud provider's SDK.
// The implementation is specific to the SDK you are using (e.g., checking for a 404 HTTP status code).
func IsExternalResourceNotFound(err error) bool {
	// Example for a generic HTTP client:
	// type httpError interface { StatusCode() int }
	// if he, ok := err.(httpError); ok {
	//     return he.StatusCode() == 404
	// }
	// return false
	return strings.Contains(err.Error(), "not found") // Simplified for example
}
By treating a NotFound error as a success condition for cleanup, we make the process resilient. It doesn't matter if the resource was deleted by a previous, failed attempt or by a human operator through the cloud console. The result is the same: the external resource is gone, and the finalizer can be safely removed.
Handling Stuck Finalizers
Despite robust logic, you may encounter a situation where a finalizer is "stuck." This typically happens due to a bug in the operator's cleanup logic that prevents it from ever completing successfully and removing the finalizer. When this happens, kubectl delete clouddatabase my-stuck-db will hang indefinitely, and the object stays in the Terminating state.
As the operator developer, your first step is to debug the operator logs to understand why the cleanup function is failing. But for a cluster administrator, the immediate need is to unblock the deletion.
This can be done by manually patching the object. The following JSON patch removes the entire finalizers list:
kubectl patch clouddatabase my-stuck-db --type json -p='[{"op": "remove", "path": "/metadata/finalizers"}]'
A merge patch that sets the list to empty has the same effect:
kubectl patch clouddatabase my-stuck-db --type merge -p '{"metadata":{"finalizers":[]}}'
This is a powerful but dangerous command. It severs the link between the Kubernetes object and the operator's cleanup process. Performing this will almost certainly orphan the external resource. It should only be used as a last resort when the operator is confirmed to be non-functional.
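Both commands drop every finalizer on the object, including ones owned by other controllers. A narrower option is a JSON Patch (RFC 6902) that targets a single list entry by index. The helper below is a sketch under that assumption; SingleFinalizerPatch and patchOp are hypothetical names, and the caller is assumed to have read the current finalizers list first.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// patchOp is one RFC 6902 JSON Patch operation.
type patchOp struct {
	Op    string `json:"op"`
	Path  string `json:"path"`
	Value string `json:"value,omitempty"`
}

// SingleFinalizerPatch builds a JSON Patch that removes only the named
// finalizer. The leading "test" operation makes the whole patch fail if the
// entry at that index is no longer the expected string.
func SingleFinalizerPatch(finalizers []string, name string) ([]byte, error) {
	for i, f := range finalizers {
		if f == name {
			path := fmt.Sprintf("/metadata/finalizers/%d", i)
			return json.Marshal([]patchOp{
				{Op: "test", Path: path, Value: name},
				{Op: "remove", Path: path},
			})
		}
	}
	return nil, fmt.Errorf("finalizer %q not present", name)
}
```

The resulting document can be passed to kubectl patch with --type json in place of the patch shown earlier. The same warning applies: removing the finalizer by hand skips the operator's cleanup.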
Performance and API Server Load
The finalizer pattern introduces at least two additional UPDATE operations to the Kubernetes API server for every object's lifecycle:
* One UPDATE to add the finalizer.
* One UPDATE to remove the finalizer.
For operators managing a small number of long-lived resources, this overhead is negligible. However, for an operator that manages thousands of short-lived CRs, this can contribute significantly to the load on etcd and the API server.
In such high-throughput scenarios, consider:
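* Predicate Filtering: Use controller-runtime predicates to filter out events that don't require reconciliation, reducing churn. For example, if your reconciliation logic depends only on the spec, predicate.GenerationChangedPredicate drops updates that touch only metadata or status. Whatever filter you use, make sure the update that sets deletionTimestamp still reaches the controller, or the cleanup path will never run.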
* Batching: If your external API supports it, consider having the operator batch cleanup operations for multiple CRs marked for deletion, though this adds significant complexity to the controller logic.
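The filtering idea can be sketched as a plain function over the old and new object. This is an illustration, not controller-runtime's predicate interface: Meta and ShouldReconcile are hypothetical, and the sketch assumes metadata.generation changes only when the spec changes, which is the usual behavior for a CR with the status subresource enabled.

```go
package main

// Meta is a hypothetical subset of object metadata relevant to event filtering.
type Meta struct {
	Generation int64
	Deleting   bool // true once deletionTimestamp is set
}

// ShouldReconcile sketches an update filter for a finalizer-aware controller.
// It passes spec changes and the transition into deletion, and drops the rest
// (status writes, label edits, the controller's own finalizer update).
func ShouldReconcile(oldObj, newObj Meta) bool {
	if !oldObj.Deleting && newObj.Deleting {
		return true // deletion was just requested: the cleanup path must run
	}
	return oldObj.Generation != newObj.Generation
}
```

The explicit deletion check is the design point here: a filter that looks at the spec alone risks swallowing the one event the finalizer logic depends on.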
Complete Production-Grade Example
Let's put all these concepts together in a more complete, runnable example. We will define the CloudDatabase CRD and a full controller that manages a mock external service.
1. CRD Definition (api/v1alpha1/clouddatabase_types.go)
package v1alpha1
import (
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
// CloudDatabaseSpec defines the desired state of CloudDatabase
type CloudDatabaseSpec struct {
DBName string `json:"dbName"`
InstanceSize string `json:"instanceSize"`
}
// CloudDatabaseStatus defines the observed state of CloudDatabase
type CloudDatabaseStatus struct {
// The unique ID of the database instance in the external system.
ExternalDBID string `json:"externalDbId,omitempty"`
// The current state of the database (e.g., PROVISIONING, READY, DELETING).
State string `json:"state,omitempty"`
}
//+kubebuilder:object:root=true
//+kubebuilder:subresource:status
// CloudDatabase is the Schema for the clouddatabases API
type CloudDatabase struct {
metav1.TypeMeta `json:",inline"`
metav1.ObjectMeta `json:"metadata,omitempty"`
Spec CloudDatabaseSpec `json:"spec,omitempty"`
Status CloudDatabaseStatus `json:"status,omitempty"`
}
//+kubebuilder:object:root=true
// CloudDatabaseList contains a list of CloudDatabase
type CloudDatabaseList struct {
metav1.TypeMeta `json:",inline"`
metav1.ListMeta `json:"metadata,omitempty"`
Items []CloudDatabase `json:"items"`
}
func init() {
SchemeBuilder.Register(&CloudDatabase{}, &CloudDatabaseList{})
}
2. The Full Controller (controllers/clouddatabase_controller.go)
This controller includes status updates and idempotent cleanup logic.
package controllers
import (
"context"
"fmt"
"strings"
"time"
databasev1alpha1 "my-operator/api/v1alpha1"
"k8s.io/apimachinery/pkg/api/errors"
"k8s.io/apimachinery/pkg/runtime"
ctrl "sigs.k8s.io/controller-runtime"
"sigs.k8s.io/controller-runtime/pkg/client"
"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
"sigs.k8s.io/controller-runtime/pkg/log"
)
const cloudDatabaseFinalizer = "database.example.com/finalizer"
// A mock client to simulate interaction with an external cloud provider
type MockExternalClient struct {
databases map[string]bool // map of ID to existence
}
func (c *MockExternalClient) Create(name string) (string, error) {
// Simulate creation
id := "ext-" + name + "-" + fmt.Sprintf("%d", time.Now().UnixNano())
c.databases[id] = true
return id, nil
}
func (c *MockExternalClient) Delete(id string) error {
if _, ok := c.databases[id]; !ok {
return fmt.Errorf("database with id %s not found", id)
}
delete(c.databases, id)
return nil
}
func IsExternalResourceNotFound(err error) bool {
return strings.Contains(err.Error(), "not found")
}
// CloudDatabaseReconciler reconciles a CloudDatabase object
type CloudDatabaseReconciler struct {
client.Client
Scheme *runtime.Scheme
ExternalClient *MockExternalClient
}
//+kubebuilder:rbac:groups=database.example.com,resources=clouddatabases,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups=database.example.com,resources=clouddatabases/status,verbs=get;update;patch
//+kubebuilder:rbac:groups=database.example.com,resources=clouddatabases/finalizers,verbs=update
func (r *CloudDatabaseReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
logger := log.FromContext(ctx)
instance := &databasev1alpha1.CloudDatabase{}
if err := r.Get(ctx, req.NamespacedName, instance); err != nil {
if errors.IsNotFound(err) {
return ctrl.Result{}, nil
}
return ctrl.Result{}, err
}
if instance.ObjectMeta.DeletionTimestamp.IsZero() {
// NOT DELETING: Add finalizer and reconcile external resource
if !controllerutil.ContainsFinalizer(instance, cloudDatabaseFinalizer) {
logger.Info("Adding Finalizer")
controllerutil.AddFinalizer(instance, cloudDatabaseFinalizer)
if err := r.Update(ctx, instance); err != nil {
return ctrl.Result{}, err
}
}
// Reconcile external database creation/update
if instance.Status.ExternalDBID == "" {
logger.Info("Creating external database")
id, err := r.ExternalClient.Create(instance.Spec.DBName)
if err != nil {
logger.Error(err, "Failed to create external DB")
return ctrl.Result{}, err
}
instance.Status.ExternalDBID = id
instance.Status.State = "READY"
if err := r.Status().Update(ctx, instance); err != nil {
logger.Error(err, "Failed to update CloudDatabase status")
return ctrl.Result{}, err
}
logger.Info("External database created", "ID", id)
}
} else {
// DELETING: Perform cleanup
if controllerutil.ContainsFinalizer(instance, cloudDatabaseFinalizer) {
if err := r.cleanupExternalDatabase(ctx, instance); err != nil {
return ctrl.Result{}, err
}
logger.Info("Removing Finalizer")
controllerutil.RemoveFinalizer(instance, cloudDatabaseFinalizer)
if err := r.Update(ctx, instance); err != nil {
return ctrl.Result{}, err
}
}
}
return ctrl.Result{}, nil
}
func (r *CloudDatabaseReconciler) cleanupExternalDatabase(ctx context.Context, db *databasev1alpha1.CloudDatabase) error {
logger := log.FromContext(ctx, "ID", db.Status.ExternalDBID)
if db.Status.ExternalDBID == "" {
logger.Info("ExternalDBID is empty, nothing to clean up.")
return nil
}
logger.Info("Cleaning up external database")
if err := r.ExternalClient.Delete(db.Status.ExternalDBID); err != nil {
if IsExternalResourceNotFound(err) {
logger.Info("External resource already gone.")
return nil // Idempotent success
}
logger.Error(err, "Cleanup failed")
return err
}
return nil
}
func (r *CloudDatabaseReconciler) SetupWithManager(mgr ctrl.Manager) error {
r.ExternalClient = &MockExternalClient{databases: make(map[string]bool)}
return ctrl.NewControllerManagedBy(mgr).
For(&databasev1alpha1.CloudDatabase{}).
Complete(r)
}
Conclusion: Finalizers as a Core Operator Pattern
The Finalizer pattern is not an optional enhancement; it is a fundamental requirement for any Kubernetes operator that manages resources outside the Kubernetes cluster. Without it, your operator is a ticking time bomb of resource leaks.
By implementing an idempotent, finalizer-aware reconciliation loop, you elevate your controller from a simple automation script to a robust, production-grade system that can gracefully handle the entire lifecycle of its managed resources. You provide a declarative API to your users where kubectl delete works as expected, reliably triggering a safe and complete teardown of all associated infrastructure. Mastering this pattern is a critical step in becoming an effective and responsible Kubernetes platform engineer.