Idempotent K8s Operators with Finalizers for Stateful Resource Management
The Inherent Risk of Stateful Management in Kubernetes
As engineers, we've embraced the declarative power of Kubernetes. We define our desired state in a Custom Resource (CR), and an operator works tirelessly to make reality match that declaration. For stateless applications, this model is sublime. But when the resource being managed exists outside the cluster—a managed database on AWS RDS, a Cloudflare DNS record, a Grafana dashboard—the contract becomes fragile.
The core challenge is the asynchronous, decoupled nature of the reconciliation loop and the Kubernetes API server's lifecycle management. Consider a simple ManagedDatabase CR. When a developer executes kubectl delete manageddatabase my-prod-db, the Kubernetes API server immediately removes the object from etcd. If your operator is down, restarting, or experiencing a transient network failure at that exact moment, it will never receive the deletion event. The CR is gone, but the expensive production database it managed is now an orphaned resource, silently accruing costs and becoming a maintenance liability.
This is the fundamental problem that Kubernetes Finalizers solve. They are not a feature to be used lightly; they are a critical mechanism for operators that manage resources with a lifecycle independent of the Kubernetes object model. This article will demonstrate the production-ready pattern for implementing a finalizer-aware, idempotent reconciliation loop to manage stateful external resources reliably.
We will not cover the basics of what an operator is or how to set up kubebuilder. We assume you've built a basic operator before and are now facing the challenge of making it robust enough for production.
The Anatomy of a Deletion Failure
Let's visualize the failure mode in a naive operator without a finalizer.
The CRD Spec:
// api/v1/manageddatabase_types.go
type ManagedDatabaseSpec struct {
Engine string `json:"engine"`
Version string `json:"version"`
StorageGB int `json:"storageGB"`
}
A Naive Reconciler:
// controllers/manageddatabase_controller.go
func (r *ManagedDatabaseReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
db := &databasev1.ManagedDatabase{}
err := r.Get(ctx, req.NamespacedName, db)
// If the object is not found, it was already deleted.
if err != nil {
if errors.IsNotFound(err) {
log.Info("ManagedDatabase resource not found. Assuming it was deleted.")
// PROBLEM: How do we know which external DB to delete?
// We don't have the object's Spec or Status anymore.
// We could try to derive an ID from req.NamespacedName, but it's brittle.
// What if the operator was down when the delete happened? This code never runs.
derivedID := req.Name // guesswork: assumes the external name matches the CR name
r.ExternalAPI.DeleteDatabase(derivedID) // <-- Brittle and unreliable
return ctrl.Result{}, nil
}
// ... error handling ...
}
// ... normal reconciliation logic for create/update ...
return ctrl.Result{}, nil
}
This approach has two fatal flaws:
1. Loss of state: When kubectl delete is run, the CR is marked for deletion. The reconciler might get one last event, but if it's busy or down, it misses its window. By the time it's ready, the r.Get() call returns a NotFound error, and the CR's Spec and Status (which might contain the external database ID) are lost forever.
2. No delivery guarantee: There is no guarantee that the IsNotFound block will ever be executed for a given deletion. The operator pod could be evicted, the node could go down, or a network partition could occur. The deletion is a fire-and-forget operation from the operator's perspective.
Finalizers: The Deletion Gatekeeper
A finalizer is simply a string added to an object's metadata.finalizers list. It acts as a lock. As long as this list is not empty, the Kubernetes API server will not—and cannot—fully delete the object.
When a user requests deletion of an object with a finalizer:
- The API server sees the finalizer list is non-empty.
- Instead of deleting the object, it sets the metadata.deletionTimestamp field to the current time.
- The object is now in a "terminating" state. It still exists in etcd and is visible via the API.
- This update (setting the deletionTimestamp) triggers a reconciliation event for the operator.
It is now the operator's explicit responsibility to perform its cleanup logic and, only upon successful completion, remove its finalizer from the list. Once the metadata.finalizers list is empty, the API server completes the deletion.
This mechanism transforms a fire-and-forget deletion into a robust, stateful, and observable process.
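The gatekeeping rules above can be illustrated with a small, self-contained simulation. The `obj`, `requestDelete`, and `removeFinalizer` names below are simplified stand-ins for the API server's behavior, not real client-go or apimachinery APIs:

```go
package main

import (
	"fmt"
	"time"
)

// obj is a minimal stand-in for a Kubernetes object's metadata.
type obj struct {
	finalizers        []string
	deletionTimestamp *time.Time
	deleted           bool // true once the "API server" has removed it from etcd
}

// requestDelete models the API server's delete handling: if finalizers
// remain, it only sets deletionTimestamp; otherwise it deletes outright.
func requestDelete(o *obj) {
	if len(o.finalizers) > 0 {
		now := time.Now()
		o.deletionTimestamp = &now
		return
	}
	o.deleted = true
}

// removeFinalizer drops one finalizer and, if the object is terminating and
// no finalizers remain, completes the deletion.
func removeFinalizer(o *obj, name string) {
	kept := o.finalizers[:0]
	for _, f := range o.finalizers {
		if f != name {
			kept = append(kept, f)
		}
	}
	o.finalizers = kept
	if o.deletionTimestamp != nil && len(o.finalizers) == 0 {
		o.deleted = true
	}
}

func main() {
	o := &obj{finalizers: []string{"database.example.com/finalizer"}}
	requestDelete(o)
	fmt.Println("terminating:", o.deletionTimestamp != nil, "deleted:", o.deleted)
	removeFinalizer(o, "database.example.com/finalizer")
	fmt.Println("deleted:", o.deleted)
}
```

The key property to observe: the delete request alone never removes the object; only the final removeFinalizer call does.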
Implementing a Finalizer-Aware Reconciliation Loop
Let's refactor our controller to correctly use finalizers. We'll build it piece by piece, focusing on the core logic within the Reconcile function.
Step 0: Define the Finalizer and CRD Status
First, define a constant for our finalizer name to avoid magic strings. We also need a robust Status subresource to track the state of the external resource.
// controllers/manageddatabase_controller.go
const managedDatabaseFinalizer = "database.example.com/finalizer"
// api/v1/manageddatabase_types.go
type ManagedDatabaseStatus struct {
// The unique identifier for the external database instance.
DatabaseID string `json:"databaseID,omitempty"`
// The connection endpoint for the database.
Endpoint string `json:"endpoint,omitempty"`
// Represents the latest observed state of the external resource.
// +optional
Conditions []metav1.Condition `json:"conditions,omitempty"`
}
Using the Condition type from metav1 is a standard pattern for exposing detailed, machine-readable status, which is invaluable for debugging and UI integrations.
The Core Reconciliation Logic (Reconcile function)
Our Reconcile function will now have two main branches: one for when the object is being deleted, and one for normal operation (create/update).
// controllers/manageddatabase_controller.go
func (r *ManagedDatabaseReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
log := log.FromContext(ctx)
// 1. Fetch the ManagedDatabase instance
db := &databasev1.ManagedDatabase{}
if err := r.Get(ctx, req.NamespacedName, db); err != nil {
if errors.IsNotFound(err) {
log.Info("ManagedDatabase resource not found. Ignoring since object must be deleted")
return ctrl.Result{}, nil
}
log.Error(err, "Failed to get ManagedDatabase")
return ctrl.Result{}, err
}
// 2. Examine the deletion timestamp to determine if the object is being deleted.
isMarkedForDeletion := db.GetDeletionTimestamp() != nil
if isMarkedForDeletion {
if controllerutil.ContainsFinalizer(db, managedDatabaseFinalizer) {
// Run our finalizer logic. If it fails, we want to retry.
if err := r.reconcileDelete(ctx, db); err != nil {
// Don't remove the finalizer if cleanup fails.
// The reconciler will retry.
return ctrl.Result{}, err
}
// Cleanup was successful. Remove the finalizer.
log.Info("External resource cleaned up successfully, removing finalizer")
controllerutil.RemoveFinalizer(db, managedDatabaseFinalizer)
if err := r.Update(ctx, db); err != nil {
return ctrl.Result{}, err
}
}
// Stop reconciliation as the item is being deleted
return ctrl.Result{}, nil
}
// 3. Add the finalizer for new objects.
if !controllerutil.ContainsFinalizer(db, managedDatabaseFinalizer) {
log.Info("Adding finalizer for ManagedDatabase")
controllerutil.AddFinalizer(db, managedDatabaseFinalizer)
if err := r.Update(ctx, db); err != nil {
return ctrl.Result{}, err
}
// Requeue immediately after adding the finalizer to ensure the next reconcile
// has the updated resource version.
return ctrl.Result{Requeue: true}, nil
}
// 4. Run the main reconciliation logic for create/update.
return r.reconcileNormal(ctx, db)
}
This structure provides a clear separation of concerns:
* Deletion Path (isMarkedForDeletion == true): If the deletion timestamp is set, we only execute our cleanup logic (reconcileDelete). If cleanup succeeds, we remove the finalizer. If it fails, we return an error, causing controller-runtime to requeue the request and retry the cleanup later. The finalizer remains, protecting the object from being fully deleted.
* Finalizer Addition: For any object that isn't being deleted, we first ensure our finalizer is present. If not, we add it and immediately requeue. This is a critical step to avoid a race condition where a create and delete command happen in quick succession before the finalizer is added.
* Normal Path: If the object is not being deleted and the finalizer is present, we proceed with the normal create/update logic (reconcileNormal).
The Deletion Logic (reconcileDelete)
This function is responsible for interacting with the external API to tear down the resource.
// controllers/manageddatabase_controller.go
func (r *ManagedDatabaseReconciler) reconcileDelete(ctx context.Context, db *databasev1.ManagedDatabase) error {
log := log.FromContext(ctx)
// The DatabaseID in the Status is our link to the external resource.
externalID := db.Status.DatabaseID
if externalID == "" {
log.Info("External database ID not found in status, assuming resource was never created or already cleaned up.")
// Nothing to do. The external resource doesn't exist.
return nil
}
log.Info("Deleting external database", "DatabaseID", externalID)
err := r.ExternalAPI.DeleteDatabase(ctx, externalID)
if err != nil {
// A critical aspect: the external API's Delete must be idempotent.
// If the resource is already gone, it should return a 'NotFound' error or success.
if IsExternalResourceNotFound(err) {
log.Info("External database already deleted.")
return nil
}
log.Error(err, "Failed to delete external database", "DatabaseID", externalID)
return err
}
log.Info("Successfully initiated deletion of external database", "DatabaseID", externalID)
return nil
}
Key Production Consideration: The external DeleteDatabase API call must be idempotent. If our operator crashes after successfully calling DeleteDatabase but before removing the finalizer, it will re-run reconcileDelete on restart. Calling delete on an already-deleted resource should not return a fatal error; it should be treated as a success.
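A minimal sketch of that idempotent-delete contract, using a hypothetical in-memory `fakeAPI` and `errNotFound` sentinel in place of a real cloud SDK:

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound stands in for the external API's "resource does not exist" error.
var errNotFound = errors.New("not found")

// fakeAPI simulates the external database service. Its raw DeleteDatabase is
// NOT idempotent-friendly: deleting a missing ID returns errNotFound.
type fakeAPI struct {
	databases map[string]bool
}

func (a *fakeAPI) DeleteDatabase(id string) error {
	if !a.databases[id] {
		return errNotFound
	}
	delete(a.databases, id)
	return nil
}

// deleteIdempotent wraps the raw delete so that "already gone" counts as
// success -- exactly what reconcileDelete needs when it re-runs after a crash.
func deleteIdempotent(a *fakeAPI, id string) error {
	if err := a.DeleteDatabase(id); err != nil {
		if errors.Is(err, errNotFound) {
			return nil // already deleted: treat as success
		}
		return err
	}
	return nil
}

func main() {
	api := &fakeAPI{databases: map[string]bool{"db-123": true}}
	fmt.Println(deleteIdempotent(api, "db-123")) // first delete: <nil>
	fmt.Println(deleteIdempotent(api, "db-123")) // re-run after "crash": still <nil>
}
```

Because the second call also returns nil, a reconciler that crashed between the delete and the finalizer removal converges safely on retry.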
Achieving True Idempotency in the Reconciliation Loop
With deletion handled, we now focus on reconcileNormal. Idempotency here means that running this function multiple times for the same CR spec should result in the same external resource state without errors or side effects. The operator should be able to converge to the desired state from any starting point.
// controllers/manageddatabase_controller.go
func (r *ManagedDatabaseReconciler) reconcileNormal(ctx context.Context, db *databasev1.ManagedDatabase) (ctrl.Result, error) {
log := log.FromContext(ctx)
// 1. Check if the external resource exists. An empty Status.DatabaseID is
// expected to make GetDatabase return a NotFound error, routing us into the
// create path below.
externalDB, err := r.ExternalAPI.GetDatabase(ctx, db.Status.DatabaseID)
if err != nil {
if !IsExternalResourceNotFound(err) {
log.Error(err, "Failed to get external database", "DatabaseID", db.Status.DatabaseID)
// Update status to reflect the error
meta.SetStatusCondition(&db.Status.Conditions, metav1.Condition{
Type: "Ready",
Status: metav1.ConditionFalse,
Reason: "GetFailed",
Message: fmt.Sprintf("Failed to get external DB: %v", err),
})
return ctrl.Result{}, r.Status().Update(ctx, db)
}
// The resource doesn't exist. This is the CREATE path.
log.Info("External database not found, creating a new one.")
newDB, err := r.ExternalAPI.CreateDatabase(ctx, db.Spec.Engine, db.Spec.Version, db.Spec.StorageGB)
if err != nil {
log.Error(err, "Failed to create external database")
meta.SetStatusCondition(&db.Status.Conditions, metav1.Condition{
Type: "Ready",
Status: metav1.ConditionFalse,
Reason: "CreateFailed",
Message: fmt.Sprintf("Failed to create external DB: %v", err),
})
// We update status and return error to retry creation.
_ = r.Status().Update(ctx, db) // Use underscore, as the original error is more important.
return ctrl.Result{}, err
}
// CRITICAL: Immediately update the status with the new ID and endpoint.
db.Status.DatabaseID = newDB.ID
db.Status.Endpoint = newDB.Endpoint
meta.SetStatusCondition(&db.Status.Conditions, metav1.Condition{
Type: "Ready",
Status: metav1.ConditionTrue,
Reason: "Created",
Message: "External database created successfully.",
})
if err := r.Status().Update(ctx, db); err != nil {
log.Error(err, "Failed to update ManagedDatabase status after creation")
return ctrl.Result{}, err
}
log.Info("Successfully created external database", "DatabaseID", newDB.ID)
// Requeue to run the update/check logic in the next loop.
return ctrl.Result{Requeue: true}, nil
}
// 2. The resource exists. This is the UPDATE/NO-OP path.
log.Info("External database found", "DatabaseID", externalDB.ID)
// Compare spec with actual state and update if necessary.
if db.Spec.StorageGB != externalDB.StorageGB {
log.Info("Spec drift detected. Updating external database storage", "spec.storageGB", db.Spec.StorageGB, "actual.storageGB", externalDB.StorageGB)
err := r.ExternalAPI.UpdateDatabaseStorage(ctx, externalDB.ID, db.Spec.StorageGB)
if err != nil {
log.Error(err, "Failed to update external database storage")
meta.SetStatusCondition(&db.Status.Conditions, metav1.Condition{
Type: "Ready",
Status: metav1.ConditionFalse,
Reason: "UpdateFailed",
Message: fmt.Sprintf("Failed to update storage: %v", err),
})
_ = r.Status().Update(ctx, db)
return ctrl.Result{}, err
}
log.Info("Successfully updated external database storage")
// Requeue to ensure we re-evaluate state after the update.
return ctrl.Result{Requeue: true}, nil
}
// 3. No drift detected. This is the NO-OP path.
log.Info("No spec drift detected. Desired state equals actual state.")
// Always ensure the status is up-to-date with the latest observed state.
db.Status.Endpoint = externalDB.Endpoint
meta.SetStatusCondition(&db.Status.Conditions, metav1.Condition{
Type: "Ready",
Status: metav1.ConditionTrue,
Reason: "InSync",
Message: "External database is in sync with the desired state.",
})
if err := r.Status().Update(ctx, db); err != nil {
log.Error(err, "Failed to update ManagedDatabase status")
return ctrl.Result{}, err
}
// Desired state reached. We can reconcile again after a longer interval.
return ctrl.Result{RequeueAfter: time.Minute * 5}, nil
}
Key Idempotency Patterns:
* Status as the link to the external resource: Status.DatabaseID is the crucial link. The create path is only triggered if we fail to GetDatabase using that ID. After a successful creation, we immediately persist the new ID to the status. If this status update fails, the next reconcile will re-enter the create path. The CreateDatabase API must therefore also be idempotent (e.g., if called with the same parameters, it could return the existing DB's ID).
* Drift detection before mutation: We compare the CR's Spec and the actual state fetched from the external API. We only call the Update API if there is a detected difference. This prevents unnecessary, and potentially disruptive, API calls on every reconcile.
* Granular updates: Avoid blanket UpdateDatabase(all_fields) calls. Use more granular API functions like UpdateDatabaseStorage. This reduces the blast radius of an update and makes the operator's behavior more predictable.
* Backing off when stable: When no drift is detected, the operator sets the Ready condition to InSync and schedules a less frequent reconciliation (RequeueAfter). This reduces load on both the Kubernetes API server and the external API.
Advanced Edge Cases and Production Hardening
Building a truly resilient operator means thinking about what happens when things go wrong.
Case 1: Operator Crash During Cleanup
* Scenario: The operator calls ExternalAPI.DeleteDatabase(), which succeeds. Before it can call r.Update() to remove the finalizer, the operator pod is killed.
* Resilience: When the operator restarts, it will receive a reconcile event for the ManagedDatabase CR. The deletionTimestamp is still set, and the finalizer is still present. The reconcileDelete function is called again. It attempts to delete the external DB using the ID from the status. Because our external API is idempotent, this second delete call will return a NotFound error, which we correctly interpret as success. The operator then proceeds to remove the finalizer, and the cleanup completes correctly.
Case 2: External Resource Deleted Manually (Out-of-Band)
* Scenario: A user with cloud console access manually deletes the database that the operator is managing.
* Resilience: On the next reconcileNormal, the r.ExternalAPI.GetDatabase() call will return a NotFound error. Our logic correctly identifies this as the create path and will proceed to re-create the database, enforcing the desired state declared in the CR. This self-healing capability is a major strength of the operator model. For some resources, you might want to instead set an error condition and halt, but for most infrastructure, self-healing is the desired behavior.
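The self-healing get-or-create core can be sketched with a hypothetical in-memory `fakeAPI` (the `GetDatabase`/`CreateDatabase` signatures here are simplified stand-ins for a real provider SDK):

```go
package main

import (
	"errors"
	"fmt"
)

var errNotFound = errors.New("not found")

// fakeAPI stands in for the external service; IDs are issued sequentially.
type fakeAPI struct {
	next int
	dbs  map[string]bool
}

func (a *fakeAPI) GetDatabase(id string) error {
	if !a.dbs[id] {
		return errNotFound
	}
	return nil
}

func (a *fakeAPI) CreateDatabase() string {
	a.next++
	id := fmt.Sprintf("db-%d", a.next)
	a.dbs[id] = true
	return id
}

// ensureDatabase is the idempotent core of reconcileNormal: look up the ID
// recorded in status; if the external resource is gone, recreate it and
// return the new ID so the caller can persist it back to status.
func ensureDatabase(a *fakeAPI, statusID string) (string, error) {
	if statusID != "" {
		if err := a.GetDatabase(statusID); err == nil {
			return statusID, nil // exists: no-op
		} else if !errors.Is(err, errNotFound) {
			return statusID, err // transient failure: retry later
		}
	}
	return a.CreateDatabase(), nil
}

func main() {
	api := &fakeAPI{dbs: map[string]bool{}}
	id, _ := ensureDatabase(api, "")
	fmt.Println("created:", id)
	delete(api.dbs, id) // simulate out-of-band deletion in the cloud console
	id2, _ := ensureDatabase(api, id)
	fmt.Println("recreated:", id2)
}
```

Re-running ensureDatabase with a valid status ID is a no-op; running it after an out-of-band deletion yields a fresh ID, which the reconciler must persist to status.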
Case 3: Partial Failures during Update
* Scenario: An update requires two external API calls (e.g., resize disk, then apply a security group). The first call succeeds, but the second fails.
* Resilience: The operator should be designed to be resumable. The Status.Conditions field is perfect for this. After the first successful call, you could update the condition to Type: "Updating", Reason: "ResizingDiskComplete". If the second call fails, the operator returns an error. The next reconcile will see the Updating condition and can know to skip the first step and retry the second. This prevents re-running completed steps of a multi-stage operation.
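A sketch of that checkpointing pattern, with the condition reduced to a plain string and a hypothetical two-step `fakeAPI` (a real operator would persist the checkpoint in Status.Conditions between reconciles):

```go
package main

import (
	"errors"
	"fmt"
)

// Checkpoint values recorded between reconciles. In a real operator this
// would be a Condition reason in Status.Conditions.
const (
	stepNone        = ""
	stepDiskResized = "ResizingDiskComplete"
)

type fakeAPI struct {
	resizeCalls, sgCalls int
	failSecurityGroup    bool
}

func (a *fakeAPI) ResizeDisk() error { a.resizeCalls++; return nil }

func (a *fakeAPI) ApplySecurityGroup() error {
	a.sgCalls++
	if a.failSecurityGroup {
		return errors.New("transient failure")
	}
	return nil
}

// applyUpdate runs the two-step update, skipping any step already recorded
// in checkpoint. It returns the new checkpoint for the caller to persist.
func applyUpdate(a *fakeAPI, checkpoint string) (string, error) {
	if checkpoint != stepDiskResized {
		if err := a.ResizeDisk(); err != nil {
			return checkpoint, err
		}
		checkpoint = stepDiskResized // persist before attempting the next step
	}
	if err := a.ApplySecurityGroup(); err != nil {
		return checkpoint, err
	}
	return stepNone, nil // whole update complete; clear the checkpoint
}

func main() {
	api := &fakeAPI{failSecurityGroup: true}
	cp, err := applyUpdate(api, stepNone) // step 1 succeeds, step 2 fails
	fmt.Println(cp, err)
	api.failSecurityGroup = false
	cp, err = applyUpdate(api, cp) // retry: disk resize is skipped
	fmt.Println(cp, err, "resizeCalls:", api.resizeCalls)
}
```

After the retry, resizeCalls is still 1: the completed step was not re-run, which matters when steps are expensive or not safely repeatable.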
Case 4: Handling Unrecoverable External API Errors
* Scenario: The external API returns a permanent error, such as InvalidParameterValue or InsufficientPermissions.
* Resilience: Constantly retrying these errors will achieve nothing and can cause alert fatigue. The operator should inspect the error type. For transient errors (e.g., 503 Service Unavailable), the default exponential backoff of controller-runtime is appropriate. For permanent errors, the operator should set a terminal condition on the CR status (e.g., Type: "Ready", Status: "False", Reason: "TerminalError") and return ctrl.Result{}, nil (i.e., do not requeue). This stops the reconciliation loop for that object until its Spec is changed by a user, requiring manual intervention to fix the configuration.
Conclusion
The combination of finalizers and an idempotent reconciliation loop is not an optional enhancement for operators managing stateful resources; it is a fundamental requirement for production readiness. Finalizers provide the guarantee that cleanup logic will be executed, transforming deletion from a risky, fire-and-forget action into a reliable, transactional process. Idempotency ensures that the operator can safely and repeatedly converge on the desired state from any condition, be it initial creation, a configuration drift, or recovery from a partial failure.
By structuring the reconciliation logic to explicitly handle deletion, creation, and update paths, and by leveraging the CR's Status as the source of truth for the external resource's identity, we build operators that are resilient, predictable, and capable of safely automating the lifecycle of critical infrastructure.