Idempotent Kubernetes Operators with Finalizers & Controller-Runtime
The Inherent State Problem in a Stateless System
In the world of Kubernetes controllers, the reconciliation loop is king. It's a beautifully simple, stateless concept: observe the desired state (from a Custom Resource, or CR), observe the current state of the world, and make changes to converge the current state towards the desired state. This works flawlessly for managing native Kubernetes resources. However, the moment your operator needs to manage resources outside the Kubernetes API server—a cloud database, a DNS record, a SaaS subscription—this stateless model reveals a critical flaw.
Consider an operator managing a ManagedDatabase CR. When a developer applies the CR manifest, the reconciliation loop triggers. It might call the cloud provider's API to provision a new PostgreSQL instance and then create a Kubernetes Secret with the credentials. So far, so good.
Now, what happens when the developer runs kubectl delete manageddatabase my-prod-db? The Kubernetes API server dutifully removes the ManagedDatabase object. The reconciliation loop for that object will never run again. The Secret might be garbage collected if it has an ownerReference, but the actual PostgreSQL instance in the cloud? It's now an orphan—a costly, unmanaged, and potentially insecure resource left to rot.
This is the fundamental problem that Kubernetes Finalizers solve. They are the hook that allows your controller to interrupt the deletion process, perform necessary cleanup, and then gracefully permit the object to be removed. This article provides a production-focused implementation pattern for using finalizers within a Go-based operator built with controller-runtime, focusing on the critical principles of idempotency and robust error handling.
Prerequisite: The Idempotent Reconciliation Foundation
Before we can even talk about deletion, our core reconciliation logic must be idempotent. A reconciliation function may be called multiple times for the same CR version due to controller restarts, unrelated updates, or periodic re-syncs. If your Reconcile function isn't idempotent, you'll create duplicate resources, trigger unnecessary API calls, and introduce instability.
The core pattern for idempotency is Read -> Compare -> Act. Never assume a resource doesn't exist; always check first.
Let's establish our ManagedDatabase CRD and a non-idempotent vs. idempotent reconciliation snippet.
api/v1/manageddatabase_types.go
package v1
import (
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
// ManagedDatabaseSpec defines the desired state of ManagedDatabase
type ManagedDatabaseSpec struct {
// DBName is the name of the database to be created.
DBName string `json:"dbName"`
// Engine is the database engine, e.g., "postgres" or "mysql".
Engine string `json:"engine"`
// CredentialsSecretName is the name of the K8s Secret to store credentials.
CredentialsSecretName string `json:"credentialsSecretName"`
}
// ManagedDatabaseStatus defines the observed state of ManagedDatabase
type ManagedDatabaseStatus struct {
// Ready indicates if the database is provisioned and ready.
Ready bool `json:"ready"`
// ExternalID is the ID of the database in the external system.
ExternalID string `json:"externalId,omitempty"`
// Message provides human-readable status.
Message string `json:"message,omitempty"`
}
//+kubebuilder:object:root=true
//+kubebuilder:subresource:status
// ManagedDatabase is the Schema for the manageddatabases API
type ManagedDatabase struct {
metav1.TypeMeta `json:",inline"`
metav1.ObjectMeta `json:"metadata,omitempty"`
Spec ManagedDatabaseSpec `json:"spec,omitempty"`
Status ManagedDatabaseStatus `json:"status,omitempty"`
}
//+kubebuilder:object:root=true
// ManagedDatabaseList contains a list of ManagedDatabase
type ManagedDatabaseList struct {
metav1.TypeMeta `json:",inline"`
metav1.ListMeta `json:"metadata,omitempty"`
Items []ManagedDatabase `json:"items"`
}
func init() {
SchemeBuilder.Register(&ManagedDatabase{}, &ManagedDatabaseList{})
}
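For completeness, the SchemeBuilder referenced in init() comes from the standard kubebuilder groupversion_info.go scaffolding. A minimal sketch, assuming the placeholder group name mygroup.com used throughout this article:
// api/v1/groupversion_info.go
package v1

import (
	"k8s.io/apimachinery/pkg/runtime/schema"
	"sigs.k8s.io/controller-runtime/pkg/scheme"
)

var (
	// GroupVersion is the group/version used to register these objects.
	GroupVersion = schema.GroupVersion{Group: "mygroup.com", Version: "v1"}

	// SchemeBuilder is used to add Go types to the GroupVersionKind scheme.
	SchemeBuilder = &scheme.Builder{GroupVersion: GroupVersion}

	// AddToScheme adds the types in this group-version to the given scheme.
	AddToScheme = SchemeBuilder.AddToScheme
)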
Now, let's look at the controller logic.
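The snippets below call r.DBProvider, which is never defined in this article. Here is a minimal sketch of the reconciler struct and a hypothetical provider interface, inferred from the calls used later; your real provider client will look different:
// controllers/manageddatabase_controller.go
package controllers

import (
	"context"

	"k8s.io/apimachinery/pkg/runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// DBProvider is a hypothetical abstraction over the external database API.
type DBProvider interface {
	// CreateDatabase provisions a database and returns its external ID.
	CreateDatabase(ctx context.Context, name string) (string, error)
	// DeleteDatabase removes the database identified by externalID.
	DeleteDatabase(ctx context.Context, externalID string) error
}

// ManagedDatabaseReconciler reconciles ManagedDatabase objects.
type ManagedDatabaseReconciler struct {
	client.Client
	Scheme     *runtime.Scheme
	DBProvider DBProvider
}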
A Naive, Non-Idempotent Approach (DO NOT DO THIS):
// controllers/manageddatabase_controller.go
func (r *ManagedDatabaseReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
// ... fetch the ManagedDatabase object into mdb ...
// THIS IS BAD - it will try to create the DB on every reconcile loop!
_, err := r.DBProvider.CreateDatabase(ctx, mdb.Spec.DBName)
if err != nil {
// ... error handling ...
}
// It will also try to create the secret every time, failing if it exists.
err = r.createCredentialsSecret(ctx, &mdb, "super-secret-password")
if err != nil {
// ... error handling ...
}
return ctrl.Result{}, nil
}
The Correct, Idempotent Pattern:
// controllers/manageddatabase_controller.go
func (r *ManagedDatabaseReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
log := log.FromContext(ctx)
var mdb mygroupv1.ManagedDatabase
if err := r.Get(ctx, req.NamespacedName, &mdb); err != nil {
return ctrl.Result{}, client.IgnoreNotFound(err)
}
// Check if the external database already exists. We use the Status field as our source of truth.
if mdb.Status.ExternalID == "" {
log.Info("Provisioning external database")
externalID, err := r.DBProvider.CreateDatabase(ctx, mdb.Spec.DBName)
if err != nil {
log.Error(err, "Failed to provision external database")
// Update status and requeue with backoff
mdb.Status.Ready = false
mdb.Status.Message = "Failed to provision: " + err.Error()
_ = r.Status().Update(ctx, &mdb)
return ctrl.Result{RequeueAfter: 30 * time.Second}, nil // Don't return error, we'll retry
}
// IMPORTANT: Update the status immediately after a successful creation.
mdb.Status.ExternalID = externalID
mdb.Status.Message = "Database provisioned"
if err := r.Status().Update(ctx, &mdb); err != nil {
return ctrl.Result{}, err // A status update failure is a real problem
}
log.Info("Successfully provisioned external database", "ExternalID", externalID)
}
// Check if the credentials secret exists
secret := &corev1.Secret{}
err := r.Get(ctx, types.NamespacedName{Name: mdb.Spec.CredentialsSecretName, Namespace: mdb.Namespace}, secret)
if err != nil && errors.IsNotFound(err) {
log.Info("Creating credentials secret")
if err := r.createCredentialsSecret(ctx, &mdb, "generated-password"); err != nil {
log.Error(err, "Failed to create credentials secret")
return ctrl.Result{}, err
}
} else if err != nil {
log.Error(err, "Failed to get credentials secret")
return ctrl.Result{}, err
}
// All resources exist, update status to Ready if not already set
if !mdb.Status.Ready {
mdb.Status.Ready = true
mdb.Status.Message = "All resources are ready"
if err := r.Status().Update(ctx, &mdb); err != nil {
return ctrl.Result{}, err
}
}
return ctrl.Result{}, nil
}
This idempotent foundation is non-negotiable. Without it, your finalizer logic will be built on sand.
The Finalizer State Machine: A Production Implementation
Now we introduce the finalizer. A finalizer is simply a string added to the metadata.finalizers array of an object. When you attempt to delete an object with finalizers, the API server does two things:
1. It adds a deletionTimestamp to the object's metadata.
2. It blocks the actual deletion: the object is not removed while its finalizers array is non-empty.
The object now exists in a Terminating state. It is the responsibility of the controller that owns the finalizer to perform its cleanup and then remove its finalizer string from the array. Once the finalizers array is empty, the API server completes the deletion.
This turns our Reconcile function into a simple state machine with two primary branches:
1. Normal reconciliation (deletionTimestamp is nil)
2. Cleanup and finalization (deletionTimestamp is non-nil)
Let's implement this pattern. We'll define a constant for our finalizer name to avoid magic strings.
// controllers/manageddatabase_controller.go
const managedDatabaseFinalizer = "mygroup.com/finalizer"
Now, we'll rewrite our Reconcile function to incorporate the full finalizer logic.
// controllers/manageddatabase_controller.go
import (
// ... other imports ...
"github.com/go-logr/logr"
"k8s.io/apimachinery/pkg/runtime"
ctrl "sigs.k8s.io/controller-runtime"
"sigs.k8s.io/controller-runtime/pkg/client"
"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
"sigs.k8s.io/controller-runtime/pkg/log"
mygroupv1 "mygroup.com/api/v1"
)
func (r *ManagedDatabaseReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
log := log.FromContext(ctx)
// 1. Fetch the ManagedDatabase instance
var mdb mygroupv1.ManagedDatabase
if err := r.Get(ctx, req.NamespacedName, &mdb); err != nil {
if errors.IsNotFound(err) {
// Object was deleted, nothing to do. The finalizer logic handles cleanup.
log.Info("ManagedDatabase resource not found. Ignoring since object must be deleted.")
return ctrl.Result{}, nil
}
log.Error(err, "Failed to get ManagedDatabase")
return ctrl.Result{}, err
}
// 2. Check if the object is being deleted
isMarkedForDeletion := mdb.GetDeletionTimestamp() != nil
if isMarkedForDeletion {
if controllerutil.ContainsFinalizer(&mdb, managedDatabaseFinalizer) {
// Our finalizer is present, so let's handle external dependency cleanup.
log.Info("Performing finalizer cleanup for ManagedDatabase")
if err := r.finalizeManagedDatabase(ctx, &mdb, log); err != nil {
// If cleanup fails, we don't remove the finalizer.
// Kubernetes will try again later. This is the core of the resilient pattern.
log.Error(err, "Finalizer cleanup failed. Requeuing.")
return ctrl.Result{Requeue: true}, err
}
// Cleanup was successful. Remove our finalizer.
log.Info("Finalizer cleanup successful. Removing finalizer.")
controllerutil.RemoveFinalizer(&mdb, managedDatabaseFinalizer)
if err := r.Update(ctx, &mdb); err != nil {
return ctrl.Result{}, err
}
}
// Stop reconciliation as the item is being deleted
return ctrl.Result{}, nil
}
// 3. The object is NOT being deleted, so add the finalizer if it doesn't exist.
if !controllerutil.ContainsFinalizer(&mdb, managedDatabaseFinalizer) {
log.Info("Adding finalizer for ManagedDatabase")
controllerutil.AddFinalizer(&mdb, managedDatabaseFinalizer)
if err := r.Update(ctx, &mdb); err != nil {
return ctrl.Result{}, err
}
}
// 4. This is where your normal, idempotent reconciliation logic goes.
// (The code from the previous section)
// ...
// Check if external database exists, create if not...
// Check if secret exists, create if not...
// Update status...
// ...
return ctrl.Result{}, nil
}
// finalizeManagedDatabase contains the logic to clean up external resources.
func (r *ManagedDatabaseReconciler) finalizeManagedDatabase(ctx context.Context, mdb *mygroupv1.ManagedDatabase, log logr.Logger) error {
// IMPORTANT: This cleanup logic MUST be idempotent.
// If the database was already deleted, this should not return an error.
log.Info("Deleting external database", "ExternalID", mdb.Status.ExternalID)
if mdb.Status.ExternalID != "" {
if err := r.DBProvider.DeleteDatabase(ctx, mdb.Status.ExternalID); err != nil {
// Handle specific errors, e.g., if the resource is already gone, that's a success for us.
if IsExternalResourceNotFound(err) {
log.Info("External database already deleted.")
} else {
log.Error(err, "Failed to delete external database")
return err
}
}
}
// Note: We don't need to explicitly delete the Kubernetes Secret here if we set an OwnerReference.
// Kubernetes garbage collection will handle it automatically once the ManagedDatabase is deleted.
// If the secret managed external resources itself, it would need its own finalizer.
log.Info("Successfully finalized ManagedDatabase")
return nil
}
// A helper function to set OwnerReference on created resources
func (r *ManagedDatabaseReconciler) createCredentialsSecret(ctx context.Context, mdb *mygroupv1.ManagedDatabase, password string) error {
secret := &corev1.Secret{
ObjectMeta: metav1.ObjectMeta{
Name: mdb.Spec.CredentialsSecretName,
Namespace: mdb.Namespace,
},
StringData: map[string]string{
"password": password,
},
}
// Set the ManagedDatabase as the owner of the Secret.
// This is crucial for garbage collection.
if err := controllerutil.SetControllerReference(mdb, secret, r.Scheme); err != nil {
return err
}
return r.Create(ctx, secret)
}
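As an aside, controller-runtime's controllerutil.CreateOrUpdate can replace the explicit get-then-create dance for the Secret while keeping it idempotent. A sketch, reusing the same imports as the controller file above; the helper name ensureCredentialsSecret is ours, not part of any library:
// An idempotent alternative for managing the credentials Secret.
func (r *ManagedDatabaseReconciler) ensureCredentialsSecret(ctx context.Context, mdb *mygroupv1.ManagedDatabase, password string) error {
	secret := &corev1.Secret{
		ObjectMeta: metav1.ObjectMeta{
			Name:      mdb.Spec.CredentialsSecretName,
			Namespace: mdb.Namespace,
		},
	}
	// CreateOrUpdate fetches the Secret if it exists, applies the mutate
	// function, and then creates or updates it only when something changed.
	_, err := controllerutil.CreateOrUpdate(ctx, r.Client, secret, func() error {
		// Only write the password if it isn't set yet, so repeated
		// reconciles don't rewrite the Secret on every pass.
		if _, ok := secret.Data["password"]; !ok {
			secret.StringData = map[string]string{"password": password}
		}
		return controllerutil.SetControllerReference(mdb, secret, r.Scheme)
	})
	return err
}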
This structure is the bedrock of a production-grade operator. It ensures that:
- The finalizer is added before any external resources are provisioned, so an object can never be deleted without the cleanup branch getting a chance to run.
- Cleanup runs only when the object carries a deletionTimestamp, and it retries on failure, preventing the finalizer's removal until cleanup is verifiably complete.
Advanced Edge Cases and Production Hardening
Writing the happy path is one thing; building a controller that survives the chaos of a real production environment is another. Let's examine the edge cases.
Edge Case 1: Partial Cleanup Failure
What if your finalize function manages multiple external resources? Imagine it needs to delete a database instance and then a corresponding DNS record.
func (r *ManagedDatabaseReconciler) finalizeManagedDatabase(ctx context.Context, mdb *mygroupv1.ManagedDatabase, log logr.Logger) error {
// Deletes the database successfully
if err := r.DBProvider.DeleteDatabase(ctx, mdb.Status.ExternalID); err != nil {
return err
}
// But fails to delete the DNS record! The API is down.
if err := r.DNSProvider.DeleteRecord(ctx, mdb.Spec.DBName); err != nil {
return err // We return an error, the finalizer remains.
}
return nil
}
On the next reconciliation, the controller will re-run this function. It will try to delete the database again. This is why your cleanup functions must be idempotent. r.DBProvider.DeleteDatabase should return nil (or a recognizable NotFound error) if the database with that ID is already gone. Without this, your controller will get stuck in a permanent failure loop on the first step, never reaching the second.
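The IsExternalResourceNotFound helper used earlier is deliberately left abstract, because the check depends entirely on your provider's SDK. A minimal sketch, assuming a hypothetical typed error that carries an HTTP status code:
import (
	"errors"
	"net/http"
)

// APIError is a stand-in for whatever typed error your provider SDK returns.
type APIError struct {
	StatusCode int
	Message    string
}

func (e *APIError) Error() string { return e.Message }

// IsExternalResourceNotFound reports whether err means the external resource
// is already gone, which the finalizer treats as a successful deletion.
func IsExternalResourceNotFound(err error) bool {
	var apiErr *APIError
	if errors.As(err, &apiErr) {
		return apiErr.StatusCode == http.StatusNotFound
	}
	return false
}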
For complex, multi-step cleanups, consider updating the CR's status to track progress. This makes the state explicit and easier to debug.
// In ManagedDatabaseStatus, add a Cleanup field:
type CleanupStatus struct {
	DatabaseDeleted  bool `json:"databaseDeleted,omitempty"`
	DNSRecordDeleted bool `json:"dnsRecordDeleted,omitempty"`
}
// Cleanup CleanupStatus `json:"cleanup,omitempty"` goes into ManagedDatabaseStatus.

// In finalizeManagedDatabase (mdb is already a *ManagedDatabase here)
if !mdb.Status.Cleanup.DatabaseDeleted {
	// delete database...
	mdb.Status.Cleanup.DatabaseDeleted = true
	if err := r.Status().Update(ctx, mdb); err != nil { return err }
}
if !mdb.Status.Cleanup.DNSRecordDeleted {
	// delete dns...
	mdb.Status.Cleanup.DNSRecordDeleted = true
	if err := r.Status().Update(ctx, mdb); err != nil { return err }
}
This pattern turns the cleanup into a resumable, idempotent state machine.
Edge Case 2: The Stuck Finalizer
If your finalizer logic has a persistent bug or an external system is permanently unavailable, an object can get stuck in the Terminating state forever. An administrator will see kubectl get manageddatabase showing the object, but kubectl delete will hang.
This is a failure mode you must document for operators of your controller. The manual escape hatch is to patch the object and remove the finalizer directly:
kubectl patch manageddatabase my-stuck-db --type=json -p='[{"op": "remove", "path": "/metadata/finalizers"}]'
This is a dangerous operation. It tells Kubernetes, "I, the human, have manually performed the cleanup, and you can now delete this object." Because the patch removes the entire finalizers array, it also strips any finalizers owned by other controllers, and it will orphan the external resources if the cleanup was not actually done.
Edge Case 3: Controller Crash During Finalization
This scenario is where the design shines. Imagine the controller executes r.DBProvider.DeleteDatabase, it succeeds, and then the controller pod crashes before it can remove the finalizer from the ManagedDatabase object.
No problem. When the controller restarts, its informers will sync. It will see a ManagedDatabase object with a deletionTimestamp and its finalizer still present. It will re-enter the Reconcile function, hit the isMarkedForDeletion branch, and call finalizeManagedDatabase again. Because our cleanup function is idempotent, it will see the database is already gone, attempt to delete the DNS record (which might also be gone), and then proceed to remove the finalizer. The system self-heals.
Performance and Requeue Strategy
In our error paths, we used return ctrl.Result{Requeue: true}, err or just return ctrl.Result{}, err. When a non-nil error is returned, controller-runtime requeues the item with exponential backoff by default; the Result is effectively ignored in that case, so the explicit Requeue: true is redundant. This backoff behavior is generally what you want for transient failures.
However, consider a scenario where an external API is rate-limiting you. Immediately requeueing is hostile. It's better to explicitly tell the controller to wait.
// In the reconcile loop, on a rate-limit error
if IsRateLimitError(err) {
log.Info("Hit rate limit, requeueing after 1 minute")
// Return nil error so we don't trigger exponential backoff,
// and instead use our specific delay.
return ctrl.Result{RequeueAfter: 1 * time.Minute}, nil
}
This gives you fine-grained control over the reconciliation frequency, preventing your operator from overwhelming dependent systems during periods of instability.
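The IsRateLimitError helper in that snippet follows the same shape as the IsExternalResourceNotFound sketch earlier, just keyed on a different status code (again assuming the hypothetical APIError type):
// IsRateLimitError reports whether the provider is throttling us, in which
// case we requeue with an explicit delay instead of hammering the API.
func IsRateLimitError(err error) bool {
	var apiErr *APIError
	if errors.As(err, &apiErr) {
		return apiErr.StatusCode == http.StatusTooManyRequests
	}
	return false
}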
Furthermore, be mindful of the MaxConcurrentReconciles option when setting up your controller manager. If your finalizer logic involves slow, blocking API calls, a low concurrency might be necessary to avoid exhausting resources or hitting API limits, but it will also slow down the processing of all CRs.
// controllers/manageddatabase_controller.go
// MaxConcurrentReconciles is set via WithOptions when the controller is built.
func (r *ManagedDatabaseReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&mygroupv1.ManagedDatabase{}).
		Owns(&corev1.Secret{}). // reconcile when owned Secrets change
		WithOptions(controller.Options{MaxConcurrentReconciles: 5}). // "controller" is sigs.k8s.io/controller-runtime/pkg/controller
		Complete(r)
}

// main.go
func main() {
	// ...
	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{ /* ... */ })
	// ...
	if err = (&controllers.ManagedDatabaseReconciler{
		Client: mgr.GetClient(),
		Scheme: mgr.GetScheme(),
		// ...
	}).SetupWithManager(mgr); err != nil {
		// ...
	}
	// ...
}
Conclusion
The finalizer pattern is not merely a feature; it is the essential mechanism for building operators that can be trusted with the lifecycle of critical, non-Kubernetes resources. By combining a strictly idempotent reconciliation loop with a two-branch state machine driven by the deletionTimestamp, you create a resilient system that can handle transient errors, controller restarts, and partial failures gracefully.
Remember the key principles:
- Check the DeletionTimestamp first: This is the entry point to your entire create/update vs. delete state machine.
- Make cleanup idempotent: Your finalize function will be called multiple times. It must succeed even if some or all of the cleanup has already been done.
Mastering this pattern elevates an operator from a simple automation tool to a robust, production-grade cloud infrastructure manager.