Idempotent Kubernetes Controllers with Finalizers for Stateful CRDs

October 12, 2025

Goh Ling Yong

The Orphaned Resource Problem: A Controller's Blind Spot

As a senior engineer working with Kubernetes, you've likely moved beyond deploying stateless applications and have started extending the Kubernetes API via Custom Resource Definitions (CRDs) to model your domain-specific concepts. The controller pattern, or Operator pattern, is the engine that brings these CRDs to life, reconciling the desired state defined in a Custom Resource (CR) with the actual state of the world.

A common and powerful use case is managing external, stateful resources—a PostgreSQL instance in RDS, a BigQuery dataset, or even a SaaS subscription managed via an API. The controller's reconciliation loop is excellent at creation and updates. A user creates a Database CR, the controller sees it, calls the cloud provider's API to provision the database, and updates the CR's status field with the instance ID and endpoint.

The critical failure point arises during deletion. When a user runs kubectl delete database my-prod-db and the object carries no finalizer, the Kubernetes API server removes the Database object from etcd right away. For built-in workloads this is fine, because Kubernetes itself cleans up what it created. For our CR, however, the controller reacts to a delete event for an object that no longer exists. It has no spec and no status to read, so it cannot tell which external resource needs to be deprovisioned. The result is an orphaned, and potentially costly, cloud resource.

This is where the standard reconciliation loop falls short. The object is gone before our custom cleanup logic can run. The solution lies in intercepting the deletion process, which is precisely what finalizers enable.

Finalizers: The Deletion Gatekeeper Mechanism

A finalizer is not a piece of code; it's a piece of data. Specifically, it's a string in the metadata.finalizers list that every Kubernetes object carries. While this list is non-empty, it acts as a gatekeeper for deletion.

Here's the exact lifecycle:

A user or client issues a DELETE request for an object (e.g., kubectl delete ...).

The API server receives the request. It checks the metadata.finalizers field.

If the finalizers list is not empty, the API server does not delete the object from etcd. Instead, it sets the metadata.deletionTimestamp field to the current time and updates the object.

The object is now in a terminating state. It's still a full-fledged API object, visible via GET requests, but it's marked for deletion.

This update triggers a reconciliation event in any watching controllers. Your controller's Reconcile function is called for the object, but this time it will observe that deletionTimestamp is non-nil.

This is your cue. Your controller must now perform its cleanup logic (e.g., delete the external RDS instance).

After the external resource is successfully deleted, the controller's final responsibility is to remove its finalizer string from the metadata.finalizers list and update the object.

The API server sees another update for the object. This time, the controller has removed its finalizer. If the finalizers list is now empty, the API server proceeds with the actual deletion, removing the object from etcd.

This mechanism guarantees that your controller has a chance to execute cleanup logic before the Kubernetes object, which holds the state and identity of the external resource, is permanently removed.
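The lifecycle above can be sketched as a toy, in-memory model. This is not how the API server is implemented; Store, Object, and their methods are hypothetical names invented for illustration, and the sketch only mirrors the gatekeeping rule: a delete with finalizers present marks the object, and removing the last finalizer completes the deletion.

```go
package main

import (
	"fmt"
	"time"
)

// Object is a hypothetical, stripped-down stand-in for an object's metadata.
type Object struct {
	Name              string
	Finalizers        []string
	DeletionTimestamp *time.Time
}

// Store is a toy in-memory replacement for the API server plus etcd.
type Store struct {
	objects map[string]*Object
}

func NewStore() *Store { return &Store{objects: map[string]*Object{}} }

func (s *Store) Create(o *Object) { s.objects[o.Name] = o }

func (s *Store) Get(name string) (*Object, bool) {
	o, ok := s.objects[name]
	return o, ok
}

// Delete mimics a DELETE request: with finalizers present it only marks the
// object as terminating; without them it removes the object immediately.
func (s *Store) Delete(name string) {
	o, ok := s.objects[name]
	if !ok {
		return
	}
	if len(o.Finalizers) > 0 {
		if o.DeletionTimestamp == nil {
			now := time.Now()
			o.DeletionTimestamp = &now
		}
		return
	}
	delete(s.objects, name)
}

// RemoveFinalizer mimics the controller's final update. Once a terminating
// object has no finalizers left, the store drops it.
func (s *Store) RemoveFinalizer(name, finalizer string) {
	o, ok := s.objects[name]
	if !ok {
		return
	}
	kept := o.Finalizers[:0]
	for _, f := range o.Finalizers {
		if f != finalizer {
			kept = append(kept, f)
		}
	}
	o.Finalizers = kept
	if o.DeletionTimestamp != nil && len(o.Finalizers) == 0 {
		delete(s.objects, name)
	}
}

func main() {
	s := NewStore()
	s.Create(&Object{Name: "my-prod-db", Finalizers: []string{"db.example.com/finalizer"}})

	s.Delete("my-prod-db")
	o, ok := s.Get("my-prod-db")
	fmt.Println("still present:", ok, "terminating:", ok && o.DeletionTimestamp != nil)

	s.RemoveFinalizer("my-prod-db", "db.example.com/finalizer")
	_, ok = s.Get("my-prod-db")
	fmt.Println("still present:", ok)
}
```

The window between Delete and RemoveFinalizer is exactly where a controller's cleanup logic gets to run.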

Architecting the Idempotent Reconciliation State Machine

To build a robust controller, you must think of your Reconcile function as a state machine. For any given CR, it can be in one of two primary states: reconciling (creating/updating) or terminating (cleaning up). The key to resilience is ensuring every step within this state machine is idempotent.

Our controller will manage a CloudDatabase CRD. The logic will be as follows:

Primary State 1: Reconciling (Object deletionTimestamp is nil)

Check for Finalizer: Does the CloudDatabase object have our controller's finalizer (e.g., db.example.com/finalizer) in its metadata.finalizers list?

* If NO: This is likely a newly created object. Add our finalizer to the list and update the object in the API server. Immediately return ctrl.Result{Requeue: true}. This is a critical step. We don't proceed with resource creation until we've successfully registered our interest in its deletion. This ensures we don't create an external resource we can't clean up.

* If YES: Proceed.

Check External Resource: Use the cloud provider's API to check if the database instance (identified, for example, by a tag or name derived from the CR's UID) already exists.

* If NO: Call the cloud API to create the database instance using parameters from the CR's spec. Update the CR's status subresource with the new instance ID, endpoint, and a Phase of Provisioning.

* If YES: The instance exists. This could be a reconciliation after a controller restart or a spec change. Compare the instance's current configuration with the CR's spec. If they differ, issue an update call to the cloud API. Update the CR's status to reflect that it is Ready.

Return: If everything is successful, return ctrl.Result{} with no error to signal the reconciliation is complete for now.

Primary State 2: Terminating (Object deletionTimestamp is not nil)

Check for Finalizer: Does the CloudDatabase object still have our controller's finalizer?

* If NO: This means our cleanup logic has already completed successfully in a previous reconciliation. Another controller might have its own finalizer, but our work is done. Return ctrl.Result{} with no error.

* If YES: Proceed with cleanup.

Delete External Resource: Call the cloud provider's API to delete the database instance. This call must be idempotent. If the API returns a "not found" error, treat it as a success—the desired state (non-existence) has been achieved.

Handle Deletion Errors: If the cloud API returns a transient error (e.g., rate limiting), return an error to trigger a requeue with exponential backoff.

Remove Finalizer: Once the external resource is confirmed to be deleted, remove our finalizer string from the metadata.finalizers list and update the object in the API server.

Return: Return ctrl.Result{} with no error. The Kubernetes API server will now complete the object's deletion.

This two-state approach, with idempotent checks at every step, ensures correctness even if the controller restarts at any point in the process.
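One way to keep the state machine honest is to express the decision as a pure function of what the controller observes, separate from the side effects. The sketch below is a simplified illustration under that assumption; Observed, Action, and NextAction are hypothetical names, and the real controller would gather the three inputs from the CR and the cloud API.

```go
package main

import "fmt"

// Action names the single step a reconcile pass should take next.
type Action string

const (
	AddFinalizer    Action = "add-finalizer"
	CreateExternal  Action = "create-external"
	SyncExternal    Action = "sync-external"
	DeleteExternal  Action = "delete-external"
	RemoveFinalizer Action = "remove-finalizer"
	Done            Action = "done"
)

// Observed is a hypothetical snapshot of everything the decision depends on.
type Observed struct {
	Terminating    bool // metadata.deletionTimestamp is set
	HasFinalizer   bool // our finalizer is in metadata.finalizers
	ExternalExists bool // the cloud API reports that the instance exists
}

// NextAction holds no state of its own: the same observation always yields
// the same step, which is what lets a restarted controller resume safely.
func NextAction(o Observed) Action {
	if o.Terminating {
		switch {
		case !o.HasFinalizer:
			return Done
		case o.ExternalExists:
			return DeleteExternal
		default:
			return RemoveFinalizer
		}
	}
	switch {
	case !o.HasFinalizer:
		return AddFinalizer
	case !o.ExternalExists:
		return CreateExternal
	default:
		return SyncExternal
	}
}

func main() {
	// A crash between deleting the database and removing the finalizer
	// leaves this observation; the next pass simply resumes.
	fmt.Println(NextAction(Observed{Terminating: true, HasFinalizer: true}))

	// A brand-new object has no finalizer yet.
	fmt.Println(NextAction(Observed{}))
}
```

Because the function never remembers what a previous pass did, every possible crash point maps to an observation it already knows how to handle.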

Production-Grade Implementation with Go and Kubebuilder

Let's translate this architecture into a production-grade controller using Go and the Kubebuilder framework. We'll manage a CloudDatabase resource.

Step 1: Define the CRD API

First, we define the spec and status for our CRD in api/v1/clouddatabase_types.go.

// api/v1/clouddatabase_types.go
package v1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// CloudDatabaseSpec defines the desired state of CloudDatabase
type CloudDatabaseSpec struct {
	// Engine specifies the database engine (e.g., "postgres", "mysql").
	// +kubebuilder:validation:Enum=postgres;mysql
	// +kubebuilder:validation:Required
	Engine string `json:"engine"`

	// Size specifies the database instance size (e.g., "small", "medium", "large").
	// +kubebuilder:validation:Enum=small;medium;large
	// +kubebuilder:validation:Required
	Size string `json:"size"`
}

// CloudDatabaseStatus defines the observed state of CloudDatabase
type CloudDatabaseStatus struct {
	// InstanceID is the unique identifier of the database in the cloud provider.
	InstanceID string `json:"instanceId,omitempty"`

	// Phase represents the current state of the database provisioning.
	// +kubebuilder:validation:Enum=Provisioning;Ready;Failed
	Phase string `json:"phase,omitempty"`

	// Conditions represent the latest available observations of an object's state.
	// +optional
	Conditions []metav1.Condition `json:"conditions,omitempty"`
}

//+kubebuilder:object:root=true
//+kubebuilder:subresource:status
//+kubebuilder:printcolumn:name="Engine",type="string",JSONPath=".spec.engine"
//+kubebuilder:printcolumn:name="Size",type="string",JSONPath=".spec.size"
//+kubebuilder:printcolumn:name="Status",type="string",JSONPath=".status.phase"

// CloudDatabase is the Schema for the clouddatabases API
type CloudDatabase struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   CloudDatabaseSpec   `json:"spec,omitempty"`
	Status CloudDatabaseStatus `json:"status,omitempty"`
}

//+kubebuilder:object:root=true

// CloudDatabaseList contains a list of CloudDatabase
type CloudDatabaseList struct {
	metav1.TypeMeta `json:",inline"`
	metav1.ListMeta `json:"metadata,omitempty"`
	Items           []CloudDatabase `json:"items"`
}

func init() {
	SchemeBuilder.Register(&CloudDatabase{}, &CloudDatabaseList{})
}

Step 2: Implement the Controller Logic

Now for the core logic in internal/controller/clouddatabase_controller.go. We'll use a mock external client for demonstration.

// internal/controller/clouddatabase_controller.go
package controller

import (
	"context"
	"fmt"
	"time"

	"github.com/go-logr/logr"
	"k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/runtime"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
	"sigs.k8s.io/controller-runtime/pkg/log"

	dbv1 "my-operator/api/v1"
)

const myFinalizerName = "db.example.com/finalizer"

// Mock external client
type MockDBProviderClient struct{}

func (m *MockDBProviderClient) CreateDB(name, engine, size string) (string, error) {
	// In a real implementation, this would call a cloud provider API.
	instanceID := fmt.Sprintf("db-%d", time.Now().UnixNano())
	fmt.Printf("MOCK_CLIENT: Creating DB %s with ID %s\n", name, instanceID)
	return instanceID, nil
}

func (m *MockDBProviderClient) GetDBStatus(instanceID string) (string, error) {
	fmt.Printf("MOCK_CLIENT: Getting status for DB %s\n", instanceID)
	// Simulate checking status
	return "Ready", nil
}

func (m *MockDBProviderClient) DeleteDB(instanceID string) error {
	fmt.Printf("MOCK_CLIENT: Deleting DB %s\n", instanceID)
	// This must be idempotent. If the DB is already gone, don't return an error.
	return nil
}

// CloudDatabaseReconciler reconciles a CloudDatabase object
type CloudDatabaseReconciler struct {
	client.Client
	Scheme   *runtime.Scheme
	Log      logr.Logger
	DBClient *MockDBProviderClient // Our external client
}

//+kubebuilder:rbac:groups=db.example.com,resources=clouddatabases,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups=db.example.com,resources=clouddatabases/status,verbs=get;update;patch
//+kubebuilder:rbac:groups=db.example.com,resources=clouddatabases/finalizers,verbs=update

func (r *CloudDatabaseReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	log := log.FromContext(ctx)

	// Fetch the CloudDatabase instance
	instance := &dbv1.CloudDatabase{}
	err := r.Get(ctx, req.NamespacedName, instance)
	if err != nil {
		if errors.IsNotFound(err) {
			log.Info("CloudDatabase resource not found. Ignoring since object must be deleted.")
			return ctrl.Result{}, nil
		}
		log.Error(err, "Failed to get CloudDatabase")
		return ctrl.Result{}, err
	}

	// Check if the instance is marked to be deleted
	if instance.GetDeletionTimestamp() != nil {
		if controllerutil.ContainsFinalizer(instance, myFinalizerName) {
			// Run finalization logic. If it fails, we'll retry.
			if err := r.finalizeCloudDatabase(ctx, instance); err != nil {
				return ctrl.Result{}, err
			}

			// Remove finalizer. Once all finalizers are removed, the object will be deleted.
			controllerutil.RemoveFinalizer(instance, myFinalizerName)
			if err := r.Update(ctx, instance); err != nil {
				return ctrl.Result{}, err
			}
		}
		return ctrl.Result{}, nil
	}

	// Add finalizer for this CR if it doesn't exist
	if !controllerutil.ContainsFinalizer(instance, myFinalizerName) {
		log.Info("Adding Finalizer for the CloudDatabase")
		controllerutil.AddFinalizer(instance, myFinalizerName)
		if err := r.Update(ctx, instance); err != nil {
			log.Error(err, "Failed to update CloudDatabase with finalizer")
			return ctrl.Result{}, err
		}
		// Requeue explicitly so provisioning continues from a fresh read of the
		// object, even if the update event for this metadata-only change is filtered out.
		return ctrl.Result{Requeue: true}, nil
	}

	// Main reconciliation logic
	if instance.Status.InstanceID == "" {
		log.Info("Provisioning new database")
		instanceID, err := r.DBClient.CreateDB(instance.Name, instance.Spec.Engine, instance.Spec.Size)
		if err != nil {
			log.Error(err, "Failed to create external database")
			// Record the failure in status (best effort). Returning the error
			// below requeues the request with exponential backoff.
			instance.Status.Phase = "Failed"
			_ = r.Status().Update(ctx, instance)
			return ctrl.Result{}, err
		}

		instance.Status.InstanceID = instanceID
		instance.Status.Phase = "Provisioning"
		if err := r.Status().Update(ctx, instance); err != nil {
			log.Error(err, "Failed to update CloudDatabase status")
			return ctrl.Result{}, err
		}
		// Requeue to check status later
		return ctrl.Result{RequeueAfter: 15 * time.Second}, nil
	}

	// Check the status of the external resource
	status, err := r.DBClient.GetDBStatus(instance.Status.InstanceID)
	if err != nil {
		log.Error(err, "Failed to get DB status")
		return ctrl.Result{}, err
	}

	if status == "Ready" && instance.Status.Phase != "Ready" {
		log.Info("Database is Ready")
		instance.Status.Phase = "Ready"
		if err := r.Status().Update(ctx, instance); err != nil {
			log.Error(err, "Failed to update status to Ready")
			return ctrl.Result{}, err
		}
	}

	return ctrl.Result{}, nil
}

func (r *CloudDatabaseReconciler) finalizeCloudDatabase(ctx context.Context, db *dbv1.CloudDatabase) error {
	log := log.FromContext(ctx)
	log.Info("Starting finalization for CloudDatabase", "instanceID", db.Status.InstanceID)

	if db.Status.InstanceID == "" {
		log.Info("No external database to finalize, instance ID is empty.")
		return nil
	}

	if err := r.DBClient.DeleteDB(db.Status.InstanceID); err != nil {
		// Here you would check for specific cloud provider errors.
		// If it's a 'NotFound' error, it's a success from our perspective.
		log.Error(err, "Failed to delete the external database")
		return err
	}

	log.Info("Successfully finalized CloudDatabase")
	return nil
}

// SetupWithManager sets up the controller with the Manager.
func (r *CloudDatabaseReconciler) SetupWithManager(mgr ctrl.Manager) error {
	r.DBClient = &MockDBProviderClient{} // Initialize our client
	r.Log = ctrl.Log.WithName("controllers").WithName("CloudDatabase")

	return ctrl.NewControllerManagedBy(mgr).
		For(&dbv1.CloudDatabase{}).
		Complete(r)
}

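This implementation follows our state machine, with one simplification: it treats an empty status.instanceId as proof that no external database exists, instead of querying the cloud API as the state machine prescribes. Note the distinct paths for when GetDeletionTimestamp() is nil vs. non-nil, and the critical first step of adding the finalizer before any external resource modification occurs.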
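Keying creation on a status field leaves a window: if CreateDB succeeds and the status update then fails, the retried pass sees an empty InstanceID and provisions a second instance. A minimal sketch of closing that window, assuming the provider lets the caller choose the instance name, is to derive the name from the CR's UID and check for it before creating. fakeProvider, externalName, and EnsureDB are hypothetical names used only for this illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// fakeProvider is a hypothetical in-memory stand-in for a cloud API that
// lets callers choose the instance name.
type fakeProvider struct {
	mu        sync.Mutex
	instances map[string]string // instance name -> engine
	creates   int               // number of real create calls made
}

func newFakeProvider() *fakeProvider {
	return &fakeProvider{instances: map[string]string{}}
}

// externalName derives a stable instance name from the CR's UID, so every
// reconcile of the same object targets the same external resource.
func externalName(uid string) string {
	return "clouddb-" + uid
}

// EnsureDB creates the instance only if it does not exist yet and reports
// whether a create call was actually made.
func (p *fakeProvider) EnsureDB(uid, engine string) (name string, created bool) {
	p.mu.Lock()
	defer p.mu.Unlock()

	name = externalName(uid)
	if _, ok := p.instances[name]; ok {
		return name, false
	}
	p.instances[name] = engine
	p.creates++
	return name, true
}

func main() {
	p := newFakeProvider()
	uid := "3f1c2d9e-0000-4000-8000-000000000001" // made-up UID

	// First pass creates the instance; imagine the status update then fails.
	name, created := p.EnsureDB(uid, "postgres")
	fmt.Println(name, created)

	// The retried pass finds the same instance instead of creating another.
	name, created = p.EnsureDB(uid, "postgres")
	fmt.Println(name, created, "total creates:", p.creates)
}
```

With a real provider the existence check would be a lookup by name or tag, and the status field becomes a cache of that lookup rather than the source of truth.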

Advanced Edge Cases and Production Hardening

The code above provides a solid foundation, but production environments introduce complexity. Here are critical edge cases you must consider.

Edge Case 1: The Stuck `Terminating` Object

Scenario: Your controller successfully calls the cloud API to delete the database. Before it can remove the finalizer from the CR, the controller pod crashes and is restarted.

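Problem: The CloudDatabase object remains in the Terminating state. The external resource is gone, but because the finalizer is still present, Kubernetes will not delete the object. If the retried delete call then fails because the database no longer exists, the object stays stuck indefinitely.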

Solution: Idempotency in the finalizeCloudDatabase function is the key. When the controller restarts, it will reconcile the terminating object again. The finalizeCloudDatabase function will be called. It attempts to delete the external database via r.DBClient.DeleteDB(). This call must handle a "not found" scenario gracefully and treat it as a success. If it receives a 404 Not Found from the cloud provider, it should return nil. This allows the reconciliation to proceed to the next step: removing the finalizer. The object is then correctly garbage collected.
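A minimal sketch of that behavior follows. It assumes a provider whose delete call returns a distinguishable not-found error; ErrNotFound, provider, and finalize are hypothetical names, and a real SDK will expose its own error types or codes to check instead.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotFound is a hypothetical sentinel; real SDKs expose their own
// not-found error types or codes.
var ErrNotFound = errors.New("instance not found")

// provider is a toy cloud API whose Delete fails on unknown IDs.
type provider struct {
	instances map[string]bool
}

func (p *provider) Delete(id string) error {
	if !p.instances[id] {
		return fmt.Errorf("delete %s: %w", id, ErrNotFound)
	}
	delete(p.instances, id)
	return nil
}

// finalize returns nil whenever the instance is gone, whether this call
// deleted it or an earlier attempt already did.
func finalize(p *provider, instanceID string) error {
	if instanceID == "" {
		return nil // nothing was ever provisioned
	}
	err := p.Delete(instanceID)
	if err == nil || errors.Is(err, ErrNotFound) {
		return nil
	}
	return err // transient or unexpected error: let the caller requeue
}

func main() {
	p := &provider{instances: map[string]bool{"db-1": true}}
	fmt.Println(finalize(p, "db-1")) // first attempt deletes the instance
	fmt.Println(finalize(p, "db-1")) // retry after a crash: already gone
}
```

Any other error is returned unchanged, so the finalizer stays in place until the external resource is confirmed gone.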

Edge Case 2: Finalizer Added, but External Creation Fails

Scenario: The controller adds the finalizer successfully. On the next reconciliation, it attempts to create the external database, but the cloud API returns a persistent error (e.g., InvalidVPCId). The controller sets the status to Failed.

Problem: A user now tries to delete the CloudDatabase CR. Because the finalizer is present, the object enters the Terminating state instead of disappearing. The finalizeCloudDatabase logic runs, finds db.Status.InstanceID empty, and returns without calling the cloud API, after which the finalizer is removed. The deletion completes, but the reason is not obvious at first glance.

Solution: The current implementation handles this correctly. The finalizeCloudDatabase function checks if InstanceID is empty and returns nil, allowing the finalizer to be removed. This is the desired behavior; if no external resource was ever created, there's nothing to clean up.

Edge Case 3: API Race Conditions during Updates

Scenario: Two controllers (e.g., during a rolling update) or a controller and a user try to update the same CloudDatabase object simultaneously. One tries to add a finalizer while the other modifies a label.

Problem: One of the updates will fail with a conflict error because the object's resourceVersion has changed.

Solution: Kubernetes uses optimistic concurrency, and controller-runtime's requeue behavior turns a conflict into a retry. When you call r.Update() or r.Status().Update(), the request carries the resourceVersion of the object you fetched. If the object has changed since then, the API server rejects the write with a conflict error; the client does not retry on its own. The Reconcile function returns that error, and controller-runtime requeues the request with backoff. On the next attempt, Reconcile will Get the newer version of the object and rerun its logic. For this reason, your reconciliation logic must be stateless and derive its actions solely from the freshly fetched object and the observed external state.
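The conflict-then-retry cycle can be illustrated with a toy store that enforces the same rule on a single object. Store, Object, ErrConflict, and addFinalizer are hypothetical names for this sketch; in a real controller the rejection comes from the API server, and k8s.io/client-go/util/retry provides RetryOnConflict if you prefer to retry inline rather than requeue.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrConflict stands in for the API server's conflict response.
var ErrConflict = errors.New("conflict: resourceVersion is stale")

// Object is a hypothetical, stripped-down API object.
type Object struct {
	ResourceVersion int
	Labels          map[string]string
	Finalizers      []string
}

// Store holds one object and enforces optimistic concurrency on Update.
type Store struct{ current Object }

// Get returns a deep copy, like reading the object from the API server.
func (s *Store) Get() Object {
	c := s.current
	c.Labels = map[string]string{}
	for k, v := range s.current.Labels {
		c.Labels[k] = v
	}
	c.Finalizers = append([]string(nil), s.current.Finalizers...)
	return c
}

// Update rejects writes that are based on a stale ResourceVersion.
func (s *Store) Update(o Object) error {
	if o.ResourceVersion != s.current.ResourceVersion {
		return ErrConflict
	}
	o.ResourceVersion++
	s.current = o
	return nil
}

// addFinalizer is one reconcile attempt: read the latest state, decide, write.
func addFinalizer(s *Store, name string) error {
	obj := s.Get()
	for _, f := range obj.Finalizers {
		if f == name {
			return nil // already present: nothing to do
		}
	}
	obj.Finalizers = append(obj.Finalizers, name)
	return s.Update(obj)
}

func main() {
	s := &Store{}
	stale := s.Get() // the controller reads the object

	// A user edits a label before the controller writes.
	user := s.Get()
	user.Labels["team"] = "data"
	fmt.Println("user update:", s.Update(user))

	// The controller's write is based on the old version and is rejected.
	stale.Finalizers = append(stale.Finalizers, "db.example.com/finalizer")
	fmt.Println("stale update:", s.Update(stale))

	// The requeued attempt starts from a fresh read and succeeds,
	// preserving the user's label.
	fmt.Println("retry:", addFinalizer(s, "db.example.com/finalizer"))
	final := s.Get()
	fmt.Println(final.Labels, final.Finalizers)
}
```

The retry works only because addFinalizer re-reads the object; replaying the stale copy would fail again or overwrite the user's change.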

Performance and Scalability Considerations

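* Predicate Functions: To reduce unnecessary reconciliations, use predicate functions in SetupWithManager. For example, you can ignore updates that don't change the object's metadata.generation. This prevents reconciling on status-only changes triggered by your own controller. Be aware that the same filter drops update events for metadata-only changes such as labels, annotations, and finalizers, so the controller must not rely on the finalizer update to trigger the next pass; an explicit requeue, as in the listing above, still works because requeues are not events.

    import "sigs.k8s.io/controller-runtime/pkg/predicate"

    // ... in SetupWithManager
    return ctrl.NewControllerManagedBy(mgr).
        For(&dbv1.CloudDatabase{}).
        WithEventFilter(predicate.GenerationChangedPredicate{}).
        Complete(r)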

* Controller Sharding: If a single controller manages tens of thousands of CRs, the reconciliation queue can become a bottleneck. You can run multiple replicas of your controller and have each one responsible for a subset of CRs. This can be implemented using a label selector specified via command-line flags and passed to the manager's options, but this is an advanced pattern requiring careful planning.

* External API Rate Limiting: Your controller can easily overwhelm an external API, especially after a restart, when every CR is reconciled at once. Put a rate limiter (e.g., golang.org/x/time/rate) in front of your external client to avoid being throttled or blocked.

Conclusion

Finalizers are not an optional feature for controllers managing external state; they are a mandatory component of a production-ready system. By treating your reconciliation loop as an idempotent state machine, you can build robust operators that gracefully handle the entire lifecycle of a resource, including the often-overlooked cleanup phase. The pattern of check-and-add finalizer -> create external resource -> perform cleanup -> remove finalizer guards against orphaned resources and lets your controller resume after a crash or transient failure, provided every step is idempotent. That brings the reliability of Kubernetes' internal control loops to your custom application management.