Kubernetes Operators: Finalizers for Stateful Resource Deletion


The Inadequacy of Standard Deletion for Stateful Workloads

In the Kubernetes ecosystem, the declarative API is king. We define our desired state in a Custom Resource (CR), and the operator's controller works tirelessly to make reality match that state. This works beautifully for creation and updates. However, the deletion lifecycle presents a significant challenge for any operator managing resources with stateful dependencies outside the Kubernetes cluster.

Consider an operator that manages a DatabaseCluster CR. When a developer creates a DatabaseCluster object, the operator might provision a StatefulSet for the database pods, a Service for networking, and critically, it might also call an external cloud provider's API to provision a persistent block storage volume and register the new database in a separate monitoring service.

What happens when a developer runs kubectl delete databasecluster my-prod-db? By default, the Kubernetes API server initiates a cascading deletion. The DatabaseCluster object is marked for deletion, and its owned resources within Kubernetes (like the StatefulSet and Service) are garbage collected. The problem is that Kubernetes has no knowledge of the external block storage volume or the monitoring service entry. The operator's controller, which holds the logic for managing these external resources, loses its trigger—the DatabaseCluster object—the moment it's deleted from etcd. The result is an orphaned, and potentially costly, cloud resource and stale data in your monitoring system.

This is where finalizers become an indispensable tool in the operator developer's arsenal. A finalizer is a namespaced key in an object's metadata that tells the Kubernetes API server to block the physical deletion of a resource until that specific key is removed. It's a pre-deletion hook that allows our controller to execute complex, stateful cleanup logic before allowing the resource to be removed from the API.

This article will walk through the production-grade implementation of a finalizer within a custom Go operator built with controller-runtime. We will build an operator for a MonitoredStatefulSet that not only manages a StatefulSet but also an external, simulated monitoring service entry, ensuring it's gracefully deregistered upon deletion.

Architecting the `MonitoredStatefulSet` Operator

Our goal is to create a controller that ensures an external resource is always in sync with its corresponding Kubernetes CR, especially during deletion. Let's start by defining our MonitoredStatefulSet CRD.

1. The Custom Resource Definition (CRD)

The CRD defines the API for our custom resource. The spec describes the desired state, and the status reflects the observed state.

go
// api/v1alpha1/monitoredstatefulset_types.go

package v1alpha1

import (
	appsv1 "k8s.io/api/apps/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// MonitoredStatefulSetSpec defines the desired state of MonitoredStatefulSet
type MonitoredStatefulSetSpec struct {
	// StatefulSetSpec is the spec for the StatefulSet that this resource will manage.
	// +kubebuilder:validation:Required
	StatefulSetSpec appsv1.StatefulSetSpec `json:"statefulSetSpec"`

	// MonitorEndpoint is the URL of the external monitoring service.
	// +kubebuilder:validation:Required
	MonitorEndpoint string `json:"monitorEndpoint"`
}

// MonitoredStatefulSetStatus defines the observed state of MonitoredStatefulSet
type MonitoredStatefulSetStatus struct {
	// Conditions represent the latest available observations of an object's state.
	Conditions []metav1.Condition `json:"conditions,omitempty"`

	// ExternalMonitorID is the ID assigned by the external monitoring service.
	ExternalMonitorID string `json:"externalMonitorID,omitempty"`
}

//+kubebuilder:object:root=true
//+kubebuilder:subresource:status

// MonitoredStatefulSet is the Schema for the monitoredstatefulsets API
type MonitoredStatefulSet struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   MonitoredStatefulSetSpec   `json:"spec,omitempty"`
	Status MonitoredStatefulSetStatus `json:"status,omitempty"`
}

//+kubebuilder:object:root=true

// MonitoredStatefulSetList contains a list of MonitoredStatefulSet
type MonitoredStatefulSetList struct {
	metav1.TypeMeta `json:",inline"`
	metav1.ListMeta `json:"metadata,omitempty"`
	Items           []MonitoredStatefulSet `json:"items"`
}

func init() {
	SchemeBuilder.Register(&MonitoredStatefulSet{}, &MonitoredStatefulSetList{})
}

Our spec embeds a standard StatefulSetSpec and adds a MonitorEndpoint. The status will hold conditions and the ExternalMonitorID we get back from the external service.

2. The Reconciliation Loop Structure

The core of the operator is the Reconcile method. Its fundamental structure must now account for two distinct paths: the normal reconciliation path (creation/updates) and the deletion path (when a finalizer is present and deletion is requested).

go
// controllers/monitoredstatefulset_controller.go

const finalizerName = "statefulsets.example.com/finalizer"

func (r *MonitoredStatefulSetReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	log := log.FromContext(ctx)

	// 1. Fetch the MonitoredStatefulSet instance
	mss := &appv1alpha1.MonitoredStatefulSet{}
	if err := r.Get(ctx, req.NamespacedName, mss); err != nil {
		if errors.IsNotFound(err) {
			log.Info("MonitoredStatefulSet resource not found. Ignoring since object must be deleted")
			return ctrl.Result{}, nil
		}
		log.Error(err, "Failed to get MonitoredStatefulSet")
		return ctrl.Result{}, err
	}

	// 2. Check if the object is being deleted
	isMarkedForDeletion := mss.GetDeletionTimestamp() != nil
	if isMarkedForDeletion {
		if controllerutil.ContainsFinalizer(mss, finalizerName) {
			// Run our finalizer logic
			if err := r.handleFinalizer(ctx, mss); err != nil {
				// Don't remove the finalizer if cleanup fails, so we can retry.
				return ctrl.Result{}, err
			}

			// Cleanup was successful, remove the finalizer
			controllerutil.RemoveFinalizer(mss, finalizerName)
			if err := r.Update(ctx, mss); err != nil {
				return ctrl.Result{}, err
			}
		}
		// Stop reconciliation as the item is being deleted
		return ctrl.Result{}, nil
	}

	// 3. Add the finalizer for this CR if it doesn't have one
	if !controllerutil.ContainsFinalizer(mss, finalizerName) {
		log.Info("Adding finalizer for MonitoredStatefulSet")
		controllerutil.AddFinalizer(mss, finalizerName)
		if err := r.Update(ctx, mss); err != nil {
			return ctrl.Result{}, err
		}
	}

	// 4. Run the main reconciliation logic for create/update
	return r.handleReconciliation(ctx, mss)
}

This structure is critical:

  • We fetch the instance.
  • We immediately check GetDeletionTimestamp(). If it's non-nil, the user has requested deletion. We then check if our finalizer is still present.
  • If the finalizer is present, we execute our cleanup logic (handleFinalizer). If cleanup succeeds, we remove the finalizer and update the object. This signals to Kubernetes that it can now proceed with the deletion.
  • If the object is not being deleted, we ensure our finalizer is present. This is a crucial step. The finalizer must be added before we create any external resources. Otherwise, a race condition could occur where the CR is created, our operator creates the external resource, and then the user deletes the CR before the operator can add the finalizer. (A patch-based variant of this step is sketched just after this list.)
  • Finally, we proceed with the normal reconciliation logic (handleReconciliation).
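
As an aside, the r.Update calls above can fail with an optimistic-concurrency conflict if something else modifies the object between our Get and the Update. Below is a minimal sketch of an alternative for the finalizer-adding step, using a merge patch via controller-runtime's client.MergeFrom. This is an optional variant, not something the code above requires; it assumes the usual "sigs.k8s.io/controller-runtime/pkg/client" import.

go
// Sketch: add the finalizer with a merge patch instead of a full Update.
// The patch only touches metadata.finalizers, so it cannot clobber
// concurrent changes to other fields.
if !controllerutil.ContainsFinalizer(mss, finalizerName) {
	patch := client.MergeFrom(mss.DeepCopy()) // snapshot before mutating
	controllerutil.AddFinalizer(mss, finalizerName)
	if err := r.Patch(ctx, mss, patch); err != nil {
		return ctrl.Result{}, err
	}
}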

Deep Dive: Implementing the Finalizer Logic

Let's implement the handleFinalizer and handleReconciliation methods. We'll need a mock client for our external monitoring service.

1. A Mock External Service Client

For a realistic example, let's define a simple client that simulates talking to an external monitoring API.

go
// internal/monitor/client.go
package monitor

import (
	"context"
	"fmt"
	"net/http"
	"time"
)

// Mock client for a monitoring service
type Client struct {
	Endpoint   string
	HTTPClient *http.Client
}

func NewClient(endpoint string) *Client {
	return &Client{
		Endpoint:   endpoint,
		HTTPClient: &http.Client{Timeout: 5 * time.Second},
	}
}

type RegisterPayload struct {
	ServiceName string `json:"serviceName"`
	Namespace   string `json:"namespace"`
}

type RegisterResponse struct {
	MonitorID string `json:"monitorID"`
}

func (c *Client) Register(ctx context.Context, name, namespace string) (string, error) {
	// In a real implementation, this would make an HTTP POST request.
	// For this example, we simulate success and return a generated ID.
	fmt.Printf("SIMULATING: Registering %s/%s with monitoring service at %s\n", namespace, name, c.Endpoint)
	// Simulate network latency
	time.Sleep(100 * time.Millisecond)
	return fmt.Sprintf("mon-%s-%s", namespace, name), nil
}

func (c *Client) Deregister(ctx context.Context, monitorID string) error {
	// In a real implementation, this would make an HTTP DELETE request.
	// For this example, we simulate success.
	fmt.Printf("SIMULATING: Deregistering monitor ID %s from monitoring service\n", monitorID)
	// Simulate network latency
	time.Sleep(150 * time.Millisecond)

	// To test idempotency, we could simulate a 'not found' error if called twice.
	// if monitorID is already deleted { return nil }

	return nil
}

2. The Main Reconciliation Logic (`handleReconciliation`)

This function handles the creation and updates of both the Kubernetes StatefulSet and the external monitor.

go
// controllers/monitoredstatefulset_controller.go

func (r *MonitoredStatefulSetReconciler) handleReconciliation(ctx context.Context, mss *appv1alpha1.MonitoredStatefulSet) (ctrl.Result, error) {
	log := log.FromContext(ctx)

	// === 1. Reconcile the StatefulSet ===
	sts := &appsv1.StatefulSet{}
	err := r.Get(ctx, types.NamespacedName{Name: mss.Name, Namespace: mss.Namespace}, sts)
	if err != nil && errors.IsNotFound(err) {
		log.Info("Creating a new StatefulSet")
		sts = r.statefulSetForMSS(mss)
		if err := ctrl.SetControllerReference(mss, sts, r.Scheme); err != nil {
			return ctrl.Result{}, err
		}
		if err := r.Create(ctx, sts); err != nil {
			log.Error(err, "Failed to create new StatefulSet")
			return ctrl.Result{}, err
		}
		// StatefulSet created successfully, requeue to check status
		return ctrl.Result{Requeue: true}, nil
	} else if err != nil {
		log.Error(err, "Failed to get StatefulSet")
		return ctrl.Result{}, err
	}

	// === 2. Reconcile the External Monitor ===
	// If ExternalMonitorID is not set in status, we need to register it.
	if mss.Status.ExternalMonitorID == "" {
		log.Info("Registering with external monitoring service")
		monitorClient := monitor.NewClient(mss.Spec.MonitorEndpoint)
		monitorID, err := monitorClient.Register(ctx, mss.Name, mss.Namespace)
		if err != nil {
			log.Error(err, "Failed to register with monitoring service")
			// Update status with a condition
			meta.SetStatusCondition(&mss.Status.Conditions, metav1.Condition{
				Type:    "MonitorRegistered",
				Status:  metav1.ConditionFalse,
				Reason:  "RegistrationFailed",
				Message: err.Error(),
			})
			if updateErr := r.Status().Update(ctx, mss); updateErr != nil {
				return ctrl.Result{}, updateErr
			}
			return ctrl.Result{}, err
		}

		// Registration successful, update status
		mss.Status.ExternalMonitorID = monitorID
		meta.SetStatusCondition(&mss.Status.Conditions, metav1.Condition{
			Type:   "MonitorRegistered",
			Status: metav1.ConditionTrue,
			Reason: "RegistrationSuccessful",
		})
		if err := r.Status().Update(ctx, mss); err != nil {
			return ctrl.Result{}, err
		}
		log.Info("Successfully registered with monitoring service", "MonitorID", monitorID)
	}

	return ctrl.Result{}, nil
}

// Helper function to construct the StatefulSet
func (r *MonitoredStatefulSetReconciler) statefulSetForMSS(mss *appv1alpha1.MonitoredStatefulSet) *appsv1.StatefulSet {
	// Logic to create the StatefulSet object from the spec
	sts := &appsv1.StatefulSet{
		ObjectMeta: metav1.ObjectMeta{
			Name:      mss.Name,
			Namespace: mss.Namespace,
		},
		Spec: mss.Spec.StatefulSetSpec,
	}
	return sts
}

This logic is idempotent. If the StatefulSet exists, it does nothing. If the ExternalMonitorID is already in the status, it skips the registration. This prevents creating duplicate resources if the reconciliation loop runs multiple times.
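
If you also want the StatefulSet to converge on spec updates, not just get created once, controller-runtime offers controllerutil.CreateOrUpdate. Here is a hedged sketch of how the StatefulSet step above could be rewritten with it; note that several StatefulSet fields (the selector, serviceName, volumeClaimTemplates) are immutable after creation, so a production mutate function should touch only mutable fields.

go
// Sketch: fetch-or-create the StatefulSet and mutate it toward the desired
// state in one call. CreateOrUpdate only issues a write when something changed.
sts := &appsv1.StatefulSet{
	ObjectMeta: metav1.ObjectMeta{Name: mss.Name, Namespace: mss.Namespace},
}
op, err := controllerutil.CreateOrUpdate(ctx, r.Client, sts, func() error {
	// Caution: overwriting the whole spec will fail on immutable fields once
	// the StatefulSet exists; shown this way only to keep the sketch short.
	sts.Spec = mss.Spec.StatefulSetSpec
	return ctrl.SetControllerReference(mss, sts, r.Scheme)
})
if err != nil {
	return ctrl.Result{}, err
}
log.Info("Reconciled StatefulSet", "operation", op)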

3. The Finalizer Cleanup Logic (`handleFinalizer`)

This is the most critical piece. This code only runs when the object is marked for deletion.

go
// controllers/monitoredstatefulset_controller.go

func (r *MonitoredStatefulSetReconciler) handleFinalizer(ctx context.Context, mss *appv1alpha1.MonitoredStatefulSet) error {
	log := log.FromContext(ctx)

	if mss.Status.ExternalMonitorID != "" {
		log.Info("Performing finalizer cleanup: deregistering from monitoring service", "MonitorID", mss.Status.ExternalMonitorID)
		monitorClient := monitor.NewClient(mss.Spec.MonitorEndpoint)
		if err := monitorClient.Deregister(ctx, mss.Status.ExternalMonitorID); err != nil {
			// Here, you might want to check for specific errors.
			// If the error is a 'NotFound' error, it means the resource is already gone,
			// so we can consider the cleanup successful.
			log.Error(err, "Failed to deregister from monitoring service during finalization")
			return err
		}
	}

	log.Info("External resources cleaned up successfully. Finalizer can be removed.")
	return nil
}

This function reads the ExternalMonitorID from the object's status and calls the Deregister method on our client. If this call fails, the function returns an error. As seen in our main Reconcile loop, returning an error here prevents the finalizer from being removed, and the reconciliation will be retried.

Advanced Patterns and Edge Case Handling

Writing robust operators requires thinking beyond the happy path. Finalizers introduce their own set of edge cases.

1. Idempotency in Deletion

What happens if the operator pod crashes right after Deregister succeeds but before the r.Update(ctx, mss) call removes the finalizer? When the operator restarts, it will receive the event for the deleting object again and re-run handleFinalizer.

Our external service client must be idempotent. A DELETE request to an already-deleted resource should not return an error. It should return a success (e.g., 204 No Content) or a 'not found' (e.g., 404 Not Found). Our Deregister function should handle the 404 case as a success, ensuring that a retry doesn't fail and block the deletion indefinitely.
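
Here is a sketch of what that could look like if Deregister made a real HTTP call instead of simulating one. The /monitors/{id} path is hypothetical; the point is the status-code handling.

go
// Sketch: an HTTP-backed Deregister that is safe to retry. A 404 means the
// monitor is already gone, so we treat it as success rather than blocking
// finalizer removal forever.
func (c *Client) Deregister(ctx context.Context, monitorID string) error {
	url := fmt.Sprintf("%s/monitors/%s", c.Endpoint, monitorID)
	req, err := http.NewRequestWithContext(ctx, http.MethodDelete, url, nil)
	if err != nil {
		return err
	}
	resp, err := c.HTTPClient.Do(req)
	if err != nil {
		return err // transient network error: surfaced so the reconcile retries
	}
	defer resp.Body.Close()

	switch resp.StatusCode {
	case http.StatusOK, http.StatusNoContent, http.StatusNotFound:
		return nil // deleted now, or already deleted: both count as success
	default:
		return fmt.Errorf("unexpected status %d deregistering monitor %s", resp.StatusCode, monitorID)
	}
}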

2. Handling a Stuck Finalizer

A common production issue is a finalizer that can't be removed because the cleanup logic is perpetually failing (e.g., the external API is down, or a bug in the operator prevents cleanup). The CR will be stuck in a Terminating state forever.

Mitigation Strategies:

* Robust Error Handling: Distinguish between transient errors (e.g., network timeout), which should be retried, and permanent errors (e.g., invalid credentials), which might require manual intervention. Update the CR's status conditions to reflect these errors, making it observable to humans.

* Metrics and Alerting: Your operator must expose Prometheus metrics. Key metrics for finalizers include (a registration sketch follows this list):

  * operator_finalizer_cleanup_duration_seconds: A histogram to track how long cleanup takes.

  * operator_finalizer_cleanup_errors_total: A counter for cleanup failures, labeled by error type.

  * operator_terminating_resources_total: A gauge to track the number of resources stuck in a terminating state. Alerts can be configured on this metric.

* Manual Override: In a catastrophic failure, an administrator may need to manually remove the finalizer by editing the resource: kubectl patch mss my-prod-db --type=json -p='[{"op": "remove", "path": "/metadata/finalizers"}]'. This is a last resort, as it will likely orphan the external resources, but it's a necessary escape hatch.
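
A minimal sketch of how those metrics could be registered with controller-runtime's global registry, which the manager serves on its metrics endpoint. The package layout and variable names are illustrative; the metric names follow the list above.

go
// internal/metrics/metrics.go (hypothetical file)
package metrics

import (
	"github.com/prometheus/client_golang/prometheus"

	ctrlmetrics "sigs.k8s.io/controller-runtime/pkg/metrics"
)

var (
	// Histogram of how long each finalizer cleanup run takes.
	FinalizerCleanupDuration = prometheus.NewHistogram(prometheus.HistogramOpts{
		Name:    "operator_finalizer_cleanup_duration_seconds",
		Help:    "Duration of finalizer cleanup logic.",
		Buckets: prometheus.DefBuckets,
	})

	// Counter of cleanup failures, labeled by a coarse error type.
	FinalizerCleanupErrors = prometheus.NewCounterVec(prometheus.CounterOpts{
		Name: "operator_finalizer_cleanup_errors_total",
		Help: "Total number of finalizer cleanup failures.",
	}, []string{"error_type"})

	// Gauge of resources currently stuck in a terminating state.
	TerminatingResources = prometheus.NewGauge(prometheus.GaugeOpts{
		Name: "operator_terminating_resources_total",
		Help: "Number of custom resources currently terminating.",
	})
)

func init() {
	// controller-runtime exposes this registry on the manager's /metrics endpoint.
	ctrlmetrics.Registry.MustRegister(FinalizerCleanupDuration, FinalizerCleanupErrors, TerminatingResources)
}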

3. Asynchronous Cleanup for Long-Running Tasks

If your cleanup task takes a long time (e.g., decommissioning a large database), performing it synchronously in the Reconcile loop is a bad practice. The operator's worker queue can get blocked, starving other resources of reconciliation cycles.

Asynchronous Pattern:

  • In handleFinalizer, instead of performing the cleanup directly, update the CR's status to a Decommissioning state.
  • Create a separate Kubernetes Job to perform the actual cleanup. The Job can take minutes or hours without blocking the controller.
  • The controller's handleFinalizer logic now changes: it checks the status of the Job. It only proceeds to remove the finalizer once the Job has completed successfully.
  • If the Job fails, the controller can inspect its logs, update the CR status with the error, and potentially retry the Job.

This pattern decouples the long-running task from the main reconciliation loop, making the operator more scalable and resilient.
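
A condensed sketch of that Job-based variant of handleFinalizer follows. It assumes imports of batchv1 ("k8s.io/api/batch/v1") and corev1 ("k8s.io/api/core/v1"); cleanupJobForMSS is a hypothetical helper that builds a Job carrying the actual decommissioning logic.

go
// Sketch: ensure a cleanup Job exists, and only report success (allowing
// finalizer removal) once that Job has completed.
func (r *MonitoredStatefulSetReconciler) handleFinalizer(ctx context.Context, mss *appv1alpha1.MonitoredStatefulSet) error {
	job := &batchv1.Job{}
	jobKey := types.NamespacedName{Name: mss.Name + "-cleanup", Namespace: mss.Namespace}

	if err := r.Get(ctx, jobKey, job); err != nil {
		if errors.IsNotFound(err) {
			// No cleanup Job yet: create one and check again on a later reconcile.
			return r.Create(ctx, r.cleanupJobForMSS(mss)) // hypothetical helper
		}
		return err
	}

	for _, cond := range job.Status.Conditions {
		if cond.Type == batchv1.JobComplete && cond.Status == corev1.ConditionTrue {
			return nil // cleanup finished; the finalizer can now be removed
		}
		if cond.Type == batchv1.JobFailed && cond.Status == corev1.ConditionTrue {
			return fmt.Errorf("cleanup job failed: %s", cond.Message)
		}
	}

	// Still running: returning an error keeps the finalizer and requeues with
	// backoff. A production version might return a typed signal so the caller
	// can use RequeueAfter instead of error-driven backoff.
	return fmt.Errorf("cleanup job %s still in progress", jobKey.Name)
}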

Production-Ready Testing

Testing finalizer logic is crucial. Using controller-runtime's envtest package, you can write integration tests that simulate the entire deletion lifecycle.

Here is a conceptual test case:

go
// controllers/monitoredstatefulset_controller_test.go

It("should run finalizer and clean up external resources on deletion", func() {
	ctx := context.Background()
	mss := &appv1alpha1.MonitoredStatefulSet{ /* ... definition ... */ }

	// 1. Create the resource
	Expect(k8sClient.Create(ctx, mss)).Should(Succeed())

	// 2. Verify finalizer is added and external resource is created
	// (Use a mock for the monitor client and check it was called)
	Eventually(func() bool {
		fetched := &appv1alpha1.MonitoredStatefulSet{}
		if err := k8sClient.Get(ctx, client.ObjectKeyFromObject(mss), fetched); err != nil {
			return false
		}
		return controllerutil.ContainsFinalizer(fetched, finalizerName) &&
			fetched.Status.ExternalMonitorID != ""
	}, "10s", "250ms").Should(BeTrue())

	// 3. Delete the resource
	Expect(k8sClient.Delete(ctx, mss)).Should(Succeed())

	// 4. Verify the external resource cleanup was called
	// (Check your mock client's Deregister method was called with the correct ID)

	// 5. Verify the resource is eventually deleted from the API server
	Eventually(func() bool {
		err := k8sClient.Get(ctx, client.ObjectKeyFromObject(mss), &appv1alpha1.MonitoredStatefulSet{})
		return errors.IsNotFound(err)
	}, "10s", "250ms").Should(BeTrue())
})
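
For steps 2 and 4, a hand-rolled fake is usually enough. A sketch follows, assuming the reconciler is refactored to hold the monitor client behind an interface (e.g., a hypothetical r.Monitor field) rather than calling monitor.NewClient inline as the controller above does.

go
// Sketch: a fake monitor client that records calls for test assertions.
type fakeMonitorClient struct {
	mu           sync.Mutex
	Registered   []string // IDs handed out by Register
	Deregistered []string // IDs passed to Deregister
}

func (f *fakeMonitorClient) Register(ctx context.Context, name, namespace string) (string, error) {
	f.mu.Lock()
	defer f.mu.Unlock()
	id := fmt.Sprintf("mon-%s-%s", namespace, name)
	f.Registered = append(f.Registered, id)
	return id, nil
}

func (f *fakeMonitorClient) Deregister(ctx context.Context, monitorID string) error {
	f.mu.Lock()
	defer f.mu.Unlock()
	f.Deregistered = append(f.Deregistered, monitorID)
	return nil
}

Step 4 then reduces to an Eventually assertion over the fake's Deregistered slice.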

Conclusion

Finalizers are not an optional feature but a core component for any production-grade Kubernetes operator that manages stateful or external resources. By intercepting the deletion process, they provide the necessary hook to execute cleanup logic, prevent resource orphaning, and maintain system consistency. The implementation requires careful structuring of the reconciliation loop to handle both deletion and creation/update paths, with a strong emphasis on idempotency and robust error handling. For senior engineers, mastering the finalizer pattern is a critical step in building operators that are truly cloud-native and capable of safely automating complex application lifecycles in production environments.
