Idempotent Kubernetes Controllers with Finalizers for Stateful CRDs
The Orphaned Resource Problem: A Controller's Blind Spot
As a senior engineer working with Kubernetes, you've likely moved beyond deploying stateless applications and have started extending the Kubernetes API via Custom Resource Definitions (CRDs) to model your domain-specific concepts. The controller pattern, or Operator pattern, is the engine that brings these CRDs to life, reconciling the desired state defined in a Custom Resource (CR) with the actual state of the world.
A common and powerful use case is managing external, stateful resources—a PostgreSQL instance in RDS, a BigQuery dataset, or even a SaaS subscription managed via an API. The controller's reconciliation loop is excellent at creation and updates. A user creates a Database CR, the controller sees it, calls the cloud provider's API to provision the database, and updates the CR's status field with the instance ID and endpoint.
The critical failure point arises during deletion. When a user runs kubectl delete database my-prod-db, the Kubernetes API server simply removes the Database object from etcd. For a stateless pod, this is fine; the Kubelet handles the cleanup. But for our CR, the controller receives a 'delete' event for an object that no longer exists. It has no information—no spec, no status—to know which external resource needs to be deprovisioned. The result is an orphaned, and potentially costly, cloud resource.
This is where the standard reconciliation loop falls short. The object is gone before our custom cleanup logic can run. The solution lies in intercepting the deletion process, which is precisely what finalizers enable.
Finalizers: The Deletion Gatekeeper Mechanism
A finalizer is not a piece of code; it's a piece of data. Specifically, it's a string key stored in the metadata.finalizers list of a Kubernetes object. While that list is non-empty, it acts as a gatekeeper for deletion.
Here's the exact lifecycle:
1. A client issues a DELETE request for an object (e.g., kubectl delete ...).
2. The API server inspects the object's metadata.finalizers field.
3. If the finalizers list is not empty, the API server does not delete the object from etcd. Instead, it sets the metadata.deletionTimestamp field to the current time and updates the object.
4. The object is still visible via GET requests, but it's marked for deletion.
5. Your controller's Reconcile function is called for the object, but this time it will observe that deletionTimestamp is non-nil.
    *   This is your cue. Your controller must now perform its cleanup logic (e.g., delete the external RDS instance).
6. Once cleanup succeeds, your controller removes its finalizer from the metadata.finalizers list and updates the object.
7. Once the finalizers list is empty, the API server proceeds with the actual deletion, removing the object from etcd.
This mechanism guarantees that your controller has a chance to execute cleanup logic before the Kubernetes object, which holds the state and identity of the external resource, is permanently removed.
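The gating behavior described above can be modeled in a few lines. The following is a minimal, stdlib-only sketch, not the real API server: the Object and Store types and their methods are hypothetical stand-ins invented for illustration, and only the finalizer string is taken from this article.

```go
package main

import (
	"fmt"
	"time"
)

// Object is a stripped-down stand-in for a Kubernetes object's metadata.
type Object struct {
	Name              string
	Finalizers        []string
	DeletionTimestamp *time.Time
}

// Store is a toy in-memory stand-in for the API server plus etcd.
type Store struct {
	objects map[string]*Object
}

func NewStore() *Store { return &Store{objects: map[string]*Object{}} }

// Create stores a new object under its name.
func (s *Store) Create(o *Object) { s.objects[o.Name] = o }

// Get returns the object and whether it still exists.
func (s *Store) Get(name string) (*Object, bool) {
	o, ok := s.objects[name]
	return o, ok
}

// Delete mimics a DELETE request: with finalizers present the object is only
// marked for deletion; without them it is removed immediately.
func (s *Store) Delete(name string) {
	o, ok := s.objects[name]
	if !ok {
		return
	}
	if len(o.Finalizers) > 0 {
		if o.DeletionTimestamp == nil {
			now := time.Now()
			o.DeletionTimestamp = &now
		}
		return
	}
	delete(s.objects, name)
}

// RemoveFinalizer mimics a controller update that drops one finalizer. If the
// object is marked for deletion and the list becomes empty, it is removed.
func (s *Store) RemoveFinalizer(name, finalizer string) {
	o, ok := s.objects[name]
	if !ok {
		return
	}
	kept := o.Finalizers[:0]
	for _, f := range o.Finalizers {
		if f != finalizer {
			kept = append(kept, f)
		}
	}
	o.Finalizers = kept
	if o.DeletionTimestamp != nil && len(o.Finalizers) == 0 {
		delete(s.objects, name)
	}
}

func main() {
	s := NewStore()
	s.Create(&Object{Name: "my-prod-db", Finalizers: []string{"db.example.com/finalizer"}})

	s.Delete("my-prod-db")
	o, exists := s.Get("my-prod-db")
	fmt.Println("after delete: exists =", exists, "terminating =", o.DeletionTimestamp != nil)

	// The controller would run its external cleanup here, then release the object.
	s.RemoveFinalizer("my-prod-db", "db.example.com/finalizer")
	_, exists = s.Get("my-prod-db")
	fmt.Println("after finalizer removal: exists =", exists)
}
```

The point of the model is that a delete never removes an object that still carries a finalizer; the removal happens as a side effect of the update that empties the list.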
Architecting the Idempotent Reconciliation State Machine
To build a robust controller, you must think of your Reconcile function as a state machine. For any given CR, it can be in one of two primary states: reconciling (creating/updating) or terminating (cleaning up). The key to resilience is ensuring every step within this state machine is idempotent.
Our controller will manage a CloudDatabase CRD. The logic will be as follows:
Primary State 1: Reconciling (Object deletionTimestamp is nil)
1. Does the CloudDatabase object have our controller's finalizer (e.g., db.example.com/finalizer) in its metadata.finalizers list?
    *   If NO: This is likely a newly created object. Add our finalizer to the list and update the object in the API server. Immediately return ctrl.Result{Requeue: true}. This is a critical step. We don't proceed with resource creation until we've successfully registered our interest in its deletion. This ensures we don't create an external resource we can't clean up.
    *   If YES: Proceed.
2. Does the external database instance already exist (i.e., is an instance ID recorded in the CR's status)?
    *   If NO: Call the cloud API to create the database instance using parameters from the CR's spec. Update the CR's status subresource with the new instance ID, endpoint, and a Phase of Provisioning.
    *   If YES: The instance exists. This could be a reconciliation after a controller restart or a spec change. Compare the instance's current configuration with the CR's spec. If they differ, issue an update call to the cloud API. Update the CR's status to reflect that it is Ready.
3. Return ctrl.Result{} with no error to signal the reconciliation is complete for now.
Primary State 2: Terminating (Object deletionTimestamp is not nil)
1. Does the CloudDatabase object still have our controller's finalizer?
    *   If NO: This means our cleanup logic has already completed successfully in a previous reconciliation. Another controller might have its own finalizer, but our work is done. Return ctrl.Result{} with no error.
    *   If YES: Proceed with cleanup.
2. Call the cloud API to delete the external database instance, treating "already deleted" as success.
3. Once the deletion succeeds, remove our finalizer from the metadata.finalizers list and update the object in the API server.
4. Return ctrl.Result{} with no error. The Kubernetes API server will now complete the object's deletion.
This two-state approach, with idempotent checks at every step, ensures correctness even if the controller restarts at any point in the process.
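One way to see why this design tolerates restarts is to write the decision as a pure function of observed state. The sketch below is an illustration only, using the Go standard library; the Action values, the Observed struct, and Decide are hypothetical names, not part of controller-runtime.

```go
package main

import "fmt"

// Action names the single step a reconcile pass should take next.
type Action string

const (
	AddFinalizer        Action = "add finalizer and requeue"
	CreateExternal      Action = "create external database"
	SyncExternal        Action = "compare spec and update external database"
	DeleteExternal      Action = "delete external database, then remove finalizer"
	RemoveFinalizerOnly Action = "remove finalizer"
	Done                Action = "nothing to do"
)

// Observed holds the only facts the decision depends on.
type Observed struct {
	Terminating  bool   // metadata.deletionTimestamp is set
	HasFinalizer bool   // our finalizer is in metadata.finalizers
	InstanceID   string // status.instanceId; empty if nothing was provisioned
}

// Decide maps observed state to the next action. It keeps no memory between
// calls, so the same input always yields the same answer.
func Decide(o Observed) Action {
	if o.Terminating {
		switch {
		case !o.HasFinalizer:
			return Done
		case o.InstanceID == "":
			return RemoveFinalizerOnly
		default:
			return DeleteExternal
		}
	}
	switch {
	case !o.HasFinalizer:
		return AddFinalizer
	case o.InstanceID == "":
		return CreateExternal
	default:
		return SyncExternal
	}
}

func main() {
	cases := []Observed{
		{},
		{HasFinalizer: true},
		{HasFinalizer: true, InstanceID: "db-123"},
		{Terminating: true, HasFinalizer: true, InstanceID: "db-123"},
		{Terminating: true, HasFinalizer: true},
		{Terminating: true},
	}
	for _, c := range cases {
		fmt.Printf("%+v -> %s\n", c, Decide(c))
	}
}
```

Because the function depends only on what is stored on the object, a controller that crashes between two steps picks up at the correct step on its next pass.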
Production-Grade Implementation with Go and Kubebuilder
Let's translate this architecture into a production-grade controller using Go and the Kubebuilder framework. We'll manage a CloudDatabase resource.
Step 1: Define the CRD API
First, we define the spec and status for our CRD in api/v1/clouddatabase_types.go.
// api/v1/clouddatabase_types.go
package v1
import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
// CloudDatabaseSpec defines the desired state of CloudDatabase
type CloudDatabaseSpec struct {
	// Engine specifies the database engine (e.g., "postgres", "mysql").
	// +kubebuilder:validation:Enum=postgres;mysql
	// +kubebuilder:validation:Required
	Engine string `json:"engine"`
	// Size specifies the database instance size (e.g., "small", "medium", "large").
	// +kubebuilder:validation:Enum=small;medium;large
	// +kubebuilder:validation:Required
	Size string `json:"size"`
}
// CloudDatabaseStatus defines the observed state of CloudDatabase
type CloudDatabaseStatus struct {
	// InstanceID is the unique identifier of the database in the cloud provider.
	InstanceID string `json:"instanceId,omitempty"`
	// Phase represents the current state of the database provisioning.
	// +kubebuilder:validation:Enum=Provisioning;Ready;Failed
	Phase string `json:"phase,omitempty"`
	// Conditions represent the latest available observations of an object's state.
	// +optional
	Conditions []metav1.Condition `json:"conditions,omitempty"`
}
//+kubebuilder:object:root=true
//+kubebuilder:subresource:status
//+kubebuilder:printcolumn:name="Engine",type="string",JSONPath=".spec.engine"
//+kubebuilder:printcolumn:name="Size",type="string",JSONPath=".spec.size"
//+kubebuilder:printcolumn:name="Status",type="string",JSONPath=".status.phase"
// CloudDatabase is the Schema for the clouddatabases API
type CloudDatabase struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`
	Spec   CloudDatabaseSpec   `json:"spec,omitempty"`
	Status CloudDatabaseStatus `json:"status,omitempty"`
}
//+kubebuilder:object:root=true
// CloudDatabaseList contains a list of CloudDatabase
type CloudDatabaseList struct {
	metav1.TypeMeta `json:",inline"`
	metav1.ListMeta `json:"metadata,omitempty"`
	Items           []CloudDatabase `json:"items"`
}
func init() {
	SchemeBuilder.Register(&CloudDatabase{}, &CloudDatabaseList{})
}
Step 2: Implement the Controller Logic
Now for the core logic in internal/controller/clouddatabase_controller.go. We'll use a mock external client for demonstration.
// internal/controller/clouddatabase_controller.go
package controller
import (
	"context"
	"fmt"
	"time"
	"github.com/go-logr/logr"
	"k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/runtime"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
	"sigs.k8s.io/controller-runtime/pkg/log"
	dbv1 "my-operator/api/v1"
)
const myFinalizerName = "db.example.com/finalizer"
// Mock external client
type MockDBProviderClient struct {}
func (m *MockDBProviderClient) CreateDB(name, engine, size string) (string, error) {
	// In a real implementation, this would call a cloud provider API.
	instanceID := fmt.Sprintf("db-%d", time.Now().UnixNano())
	fmt.Printf("MOCK_CLIENT: Creating DB %s with ID %s\n", name, instanceID)
	return instanceID, nil
}
func (m *MockDBProviderClient) GetDBStatus(instanceID string) (string, error) {
	fmt.Printf("MOCK_CLIENT: Getting status for DB %s\n", instanceID)
	// Simulate checking status
	return "Ready", nil
}
func (m *MockDBProviderClient) DeleteDB(instanceID string) error {
	fmt.Printf("MOCK_CLIENT: Deleting DB %s\n", instanceID)
	// This must be idempotent. If the DB is already gone, don't return an error.
	return nil
}
// CloudDatabaseReconciler reconciles a CloudDatabase object
type CloudDatabaseReconciler struct {
	client.Client
	Scheme *runtime.Scheme
	Log    logr.Logger
	DBClient *MockDBProviderClient // Our external client
}
//+kubebuilder:rbac:groups=db.example.com,resources=clouddatabases,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups=db.example.com,resources=clouddatabases/status,verbs=get;update;patch
//+kubebuilder:rbac:groups=db.example.com,resources=clouddatabases/finalizers,verbs=update
func (r *CloudDatabaseReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	log := log.FromContext(ctx)
	// Fetch the CloudDatabase instance
	instance := &dbv1.CloudDatabase{}
	err := r.Get(ctx, req.NamespacedName, instance)
	if err != nil {
		if errors.IsNotFound(err) {
			log.Info("CloudDatabase resource not found. Ignoring since object must be deleted.")
			return ctrl.Result{}, nil
		}
		log.Error(err, "Failed to get CloudDatabase")
		return ctrl.Result{}, err
	}
	// Check if the instance is marked to be deleted
	if instance.GetDeletionTimestamp() != nil {
		if controllerutil.ContainsFinalizer(instance, myFinalizerName) {
			// Run finalization logic. If it fails, we'll retry.
			if err := r.finalizeCloudDatabase(ctx, instance); err != nil {
				return ctrl.Result{}, err
			}
			// Remove finalizer. Once all finalizers are removed, the object will be deleted.
			controllerutil.RemoveFinalizer(instance, myFinalizerName)
			if err := r.Update(ctx, instance); err != nil {
				return ctrl.Result{}, err
			}
		}
		return ctrl.Result{}, nil
	}
	// Add finalizer for this CR if it doesn't exist
	if !controllerutil.ContainsFinalizer(instance, myFinalizerName) {
		log.Info("Adding Finalizer for the CloudDatabase")
		controllerutil.AddFinalizer(instance, myFinalizerName)
		if err := r.Update(ctx, instance); err != nil {
			log.Error(err, "Failed to update CloudDatabase with finalizer")
			return ctrl.Result{}, err
		}
		// Requeue explicitly: a metadata-only update may be filtered out by event predicates.
		return ctrl.Result{Requeue: true}, nil
	}
	// Main reconciliation logic
	if instance.Status.InstanceID == "" {
		log.Info("Provisioning new database")
		instanceID, err := r.DBClient.CreateDB(instance.Name, instance.Spec.Engine, instance.Spec.Size)
		if err != nil {
			log.Error(err, "Failed to create external database")
			// Record the failure in status. Returning the error requeues with backoff.
			instance.Status.Phase = "Failed"
			_ = r.Status().Update(ctx, instance)
			return ctrl.Result{}, err
		}
		instance.Status.InstanceID = instanceID
		instance.Status.Phase = "Provisioning"
		if err := r.Status().Update(ctx, instance); err != nil {
			log.Error(err, "Failed to update CloudDatabase status")
			return ctrl.Result{}, err
		}
		// Requeue to check status later
		return ctrl.Result{RequeueAfter: 15 * time.Second}, nil
	}
	// Check the status of the external resource
	status, err := r.DBClient.GetDBStatus(instance.Status.InstanceID)
	if err != nil {
		log.Error(err, "Failed to get DB status")
		return ctrl.Result{}, err
	}
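	if status != "Ready" {
		// Still provisioning: poll again rather than wait for an event that may never arrive.
		return ctrl.Result{RequeueAfter: 15 * time.Second}, nil
	}
	if instance.Status.Phase != "Ready" {
		log.Info("Database is Ready")
		instance.Status.Phase = "Ready"
		if err := r.Status().Update(ctx, instance); err != nil {
			log.Error(err, "Failed to update status to Ready")
			return ctrl.Result{}, err
		}
	}
	return ctrl.Result{}, nil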
}
func (r *CloudDatabaseReconciler) finalizeCloudDatabase(ctx context.Context, db *dbv1.CloudDatabase) error {
	log := log.FromContext(ctx)
	log.Info("Starting finalization for CloudDatabase", "instanceID", db.Status.InstanceID)
	if db.Status.InstanceID == "" {
		log.Info("No external database to finalize, instance ID is empty.")
		return nil
	}
	if err := r.DBClient.DeleteDB(db.Status.InstanceID); err != nil {
		// Here you would check for specific cloud provider errors.
		// If it's a 'NotFound' error, it's a success from our perspective.
		log.Error(err, "Failed to delete the external database")
		return err
	}
	log.Info("Successfully finalized CloudDatabase")
	return nil
}
// SetupWithManager sets up the controller with the Manager.
func (r *CloudDatabaseReconciler) SetupWithManager(mgr ctrl.Manager) error {
	r.DBClient = &MockDBProviderClient{} // Initialize our client
	r.Log = ctrl.Log.WithName("controllers").WithName("CloudDatabase")
	return ctrl.NewControllerManagedBy(mgr).
		For(&dbv1.CloudDatabase{}).
		Complete(r)
}
This implementation directly follows our state machine. Note the distinct paths for when GetDeletionTimestamp() is nil vs. non-nil, and the critical first step of adding the finalizer before any external resource modification occurs.
Advanced Edge Cases and Production Hardening
The code above provides a solid foundation, but production environments introduce complexity. Here are critical edge cases you must consider.
Edge Case 1: The Stuck `Terminating` Object
Scenario: Your controller successfully calls the cloud API to delete the database. Before it can remove the finalizer from the CR, the controller pod crashes and is restarted.
Problem: The CloudDatabase object is left in the Terminating state. The external resource is gone, but because the finalizer remains, Kubernetes will not delete the object. If the retried cleanup treats the missing database as an error, the object stays stuck indefinitely.
Solution: Idempotency in the finalizeCloudDatabase function is the key. When the controller restarts, it will reconcile the terminating object again and call finalizeCloudDatabase, which attempts to delete the external database via r.DBClient.DeleteDB(). This call must handle a "not found" scenario gracefully and treat it as a success: if it receives a 404 Not Found from the cloud provider, it should return nil. This allows the reconciliation to proceed to the next step, removing the finalizer, after which the API server deletes the object.
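A minimal sketch of that "not found is success" rule follows, assuming a provider client that reports a missing instance through a sentinel error. ErrNotFound, FakeProvider, and Finalize are hypothetical names for illustration; a real cloud SDK will expose its own error type or status code to check.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotFound is a hypothetical sentinel a provider client might return.
var ErrNotFound = errors.New("instance not found")

// FakeProvider is an in-memory stand-in for a cloud API.
type FakeProvider struct {
	instances map[string]bool
}

// DeleteDB fails with ErrNotFound when the instance does not exist, the way
// many real APIs answer with a 404.
func (p *FakeProvider) DeleteDB(id string) error {
	if !p.instances[id] {
		return fmt.Errorf("delete %s: %w", id, ErrNotFound)
	}
	delete(p.instances, id)
	return nil
}

// Finalize deletes the external instance and treats "already gone" as success,
// so it is safe to call again after a crash.
func Finalize(p *FakeProvider, instanceID string) error {
	if instanceID == "" {
		return nil // nothing was ever provisioned
	}
	err := p.DeleteDB(instanceID)
	if err != nil && !errors.Is(err, ErrNotFound) {
		return err // a real failure: the caller should retry
	}
	return nil
}

func main() {
	p := &FakeProvider{instances: map[string]bool{"db-123": true}}
	fmt.Println("first attempt:", Finalize(p, "db-123"))
	// Simulates the reconcile that runs after the controller restarts.
	fmt.Println("after restart:", Finalize(p, "db-123"))
}
```

Wrapping the sentinel with %w and checking it with errors.Is keeps the check working even when the client adds context to the error.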
Edge Case 2: Finalizer Added, but External Creation Fails
Scenario: The controller adds the finalizer successfully. On the next reconciliation, it attempts to create the external database, but the cloud API returns a persistent error (e.g., InvalidVPCId). The controller sets the status to Failed.
Problem: A user now tries to delete the CloudDatabase CR. Because the finalizer is present, the object is only marked for deletion and waits for the controller. The finalizeCloudDatabase logic runs, finds db.Status.InstanceID empty, and has nothing to delete, so the controller simply removes the finalizer. This works, but it is not obvious at first glance.
Solution: The current implementation handles this correctly. The finalizeCloudDatabase function checks if InstanceID is empty and returns nil, allowing the finalizer to be removed. This is the desired behavior; if no external resource was ever created, there's nothing to clean up.
Edge Case 3: API Race Conditions during Updates
Scenario: The controller and another writer (a user, another controller, or a second replica during a rolling update) try to update the same CloudDatabase object at nearly the same time. One tries to add a finalizer while the other modifies a label.
Problem: One of the updates will fail with a conflict error because the object's resourceVersion has changed.
Solution: Kubernetes uses optimistic concurrency, and controller-runtime's requeue behavior makes recovery straightforward. When you call r.Update() or r.Status().Update(), the request carries the resourceVersion of the object you fetched. If the object changed in the meantime, the API server rejects the write with a conflict error; the client does not retry it for you. Returning that error from Reconcile causes controller-runtime to requeue the request with backoff. On the next attempt, Reconcile fetches the newer version of the object and reapplies its logic. For this reason, your reconciliation logic must keep no state between runs and must derive its actions solely from the object it fetches at the start of each run.
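The conflict-and-retry sequence can be illustrated with a toy model of optimistic concurrency. This is a stdlib-only sketch under simplifying assumptions: Store, Object, and ErrConflict are hypothetical stand-ins, and resourceVersion is modeled as a plain integer rather than the opaque string Kubernetes uses.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrConflict stands in for the API server's 409 Conflict response.
var ErrConflict = errors.New("conflict: object has been modified")

// Object is a toy resource with the one field optimistic concurrency needs.
type Object struct {
	ResourceVersion int
	Finalizers      []string
	Labels          map[string]string
}

// Store is a toy API server holding a single object.
type Store struct{ current Object }

// Get returns a deep copy so callers cannot mutate stored state directly.
func (s *Store) Get() Object {
	out := Object{ResourceVersion: s.current.ResourceVersion, Labels: map[string]string{}}
	out.Finalizers = append(out.Finalizers, s.current.Finalizers...)
	for k, v := range s.current.Labels {
		out.Labels[k] = v
	}
	return out
}

// Update accepts the write only if it is based on the latest resourceVersion.
func (s *Store) Update(o Object) error {
	if o.ResourceVersion != s.current.ResourceVersion {
		return ErrConflict
	}
	o.ResourceVersion++
	s.current = o
	return nil
}

func main() {
	const finalizer = "db.example.com/finalizer"
	s := &Store{}

	// The controller and a user both read the same version.
	controllerCopy := s.Get()
	userCopy := s.Get()

	// The user's label change lands first and bumps the version.
	userCopy.Labels["team"] = "data"
	fmt.Println("user update:", s.Update(userCopy))

	// The controller's write is based on a stale version and is rejected.
	controllerCopy.Finalizers = append(controllerCopy.Finalizers, finalizer)
	fmt.Println("controller update:", s.Update(controllerCopy))

	// The requeued reconcile starts over from a fresh read and succeeds,
	// preserving the user's label.
	fresh := s.Get()
	fresh.Finalizers = append(fresh.Finalizers, finalizer)
	fmt.Println("retry after requeue:", s.Update(fresh))
	fmt.Printf("final object: %+v\n", s.Get())
}
```

In a real controller the "start over" step is simply the next Reconcile call. If you prefer to retry within a single pass, client-go provides retry.RetryOnConflict in k8s.io/client-go/util/retry, but the re-fetch inside the retried function is still required.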
Performance and Scalability Considerations
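*   Predicate Functions: To reduce unnecessary reconciliations, use predicate functions in SetupWithManager. For example, predicate.GenerationChangedPredicate ignores update events that don't change the object's metadata.generation. This prevents reconciling on status-only changes triggered by your own controller. Be aware that it also filters metadata-only updates such as label, annotation, or finalizer changes, and that predicates filter watch events only; requeues requested through ctrl.Result are unaffected.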
    import "sigs.k8s.io/controller-runtime/pkg/predicate"
    // ... in SetupWithManager
    return ctrl.NewControllerManagedBy(mgr).
        For(&dbv1.CloudDatabase{}).
        WithEventFilter(predicate.GenerationChangedPredicate{}).
        Complete(r)
*   Controller Sharding: If a single controller manages tens of thousands of CRs, the reconciliation queue can become a bottleneck. You can run multiple replicas of your controller and have each one responsible for a subset of CRs. This can be implemented using a label selector specified via command-line flags and passed to the manager's options, but this is an advanced pattern requiring careful planning.
*   External API Rate Limiting: Your controller can easily overwhelm an external API. Put a rate limiter (e.g., golang.org/x/time/rate) in front of your external client to avoid being throttled or blocked.
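To make the rate-limiting idea concrete, here is a small token-bucket sketch using only the standard library. It is an illustration of the technique, not a replacement for golang.org/x/time/rate: TokenBucket and FakeClock are hypothetical types, the limiter is not safe for concurrent use, and the clock is injected so the behavior is deterministic.

```go
package main

import (
	"fmt"
	"time"
)

// FakeClock is a manually advanced clock for deterministic demos and tests.
type FakeClock struct{ t time.Time }

func (c *FakeClock) Now() time.Time { return c.t }

// AdvanceMillis moves the fake clock forward by ms milliseconds.
func (c *FakeClock) AdvanceMillis(ms int) {
	c.t = c.t.Add(time.Duration(ms) * time.Millisecond)
}

// TokenBucket allows bursts up to capacity and refills at rate tokens/second.
// It is not goroutine-safe; guard it with a mutex if shared.
type TokenBucket struct {
	now      func() time.Time
	rate     float64
	capacity float64
	tokens   float64
	last     time.Time
}

// NewTokenBucket starts with a full bucket. Pass time.Now in production code.
func NewTokenBucket(ratePerSec float64, burst int, now func() time.Time) *TokenBucket {
	return &TokenBucket{
		now:      now,
		rate:     ratePerSec,
		capacity: float64(burst),
		tokens:   float64(burst),
		last:     now(),
	}
}

// Allow reports whether one external call may proceed, consuming a token if so.
func (b *TokenBucket) Allow() bool {
	t := b.now()
	b.tokens += t.Sub(b.last).Seconds() * b.rate
	if b.tokens > b.capacity {
		b.tokens = b.capacity
	}
	b.last = t
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}

func main() {
	clock := &FakeClock{}
	limiter := NewTokenBucket(2, 2, clock.Now) // 2 calls/second, burst of 2

	fmt.Println(limiter.Allow(), limiter.Allow(), limiter.Allow())

	clock.AdvanceMillis(500) // half a second refills one token
	fmt.Println(limiter.Allow())
}
```

In a reconciler, a denied call would typically translate into returning ctrl.Result{RequeueAfter: ...} rather than blocking the worker.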
Conclusion
Finalizers are not an optional feature for controllers managing external state; they are a mandatory component of a production-ready system. By treating your reconciliation loop as an idempotent state machine, you can build robust operators that gracefully handle the entire lifecycle of a resource, including the often-overlooked and critical cleanup phase. The pattern of check-and-add finalizer -> create external resource -> perform cleanup -> remove finalizer guards against orphaned resources and lets your controller recover from crashes and transient failures, bringing the reliability of Kubernetes' internal control loops to your custom application management.