Kubernetes Operators: Finalizers for Stateful Resource Deletion
The Inadequacy of Standard Deletion for Stateful Workloads
In the Kubernetes ecosystem, the declarative API is king. We define our desired state in a Custom Resource (CR), and the operator's controller works tirelessly to make reality match that state. This works beautifully for creation and updates. However, the deletion lifecycle presents a significant challenge for any operator managing resources with stateful dependencies outside the Kubernetes cluster.
Consider an operator that manages a DatabaseCluster CR. When a developer creates a DatabaseCluster object, the operator might provision a StatefulSet for the database pods, a Service for networking, and critically, it might also call an external cloud provider's API to provision a persistent block storage volume and register the new database in a separate monitoring service.
What happens when a developer runs kubectl delete databasecluster my-prod-db? By default, the Kubernetes API server initiates a cascading deletion. The DatabaseCluster object is marked for deletion, and its owned resources within Kubernetes (like the StatefulSet and Service) are garbage collected. The problem is that Kubernetes has no knowledge of the external block storage volume or the monitoring service entry. The operator's controller, which holds the logic for managing these external resources, loses its trigger—the DatabaseCluster object—the moment it's deleted from etcd. The result is an orphaned, and potentially costly, cloud resource and stale data in your monitoring system.
This is where finalizers become an indispensable tool in the operator developer's arsenal. A finalizer is a namespaced key in an object's metadata that tells the Kubernetes API server to block the physical deletion of a resource until that specific key is removed. It's a pre-deletion hook that allows our controller to execute complex, stateful cleanup logic before allowing the resource to be removed from the API.
This article will walk through the production-grade implementation of a finalizer within a custom Go operator built with controller-runtime. We will build an operator for a MonitoredStatefulSet that not only manages a StatefulSet but also an external, simulated monitoring service entry, ensuring it's gracefully deregistered upon deletion.
Architecting the `MonitoredStatefulSet` Operator
Our goal is to create a controller that ensures an external resource is always in sync with its corresponding Kubernetes CR, especially during deletion. Let's start by defining our MonitoredStatefulSet CRD.
1. The Custom Resource Definition (CRD)
The CRD defines the API for our custom resource. The spec describes the desired state, and the status reflects the observed state.
// api/v1alpha1/monitoredstatefulset_types.go
package v1alpha1
import (
appsv1 "k8s.io/api/apps/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
// MonitoredStatefulSetSpec defines the desired state of MonitoredStatefulSet
type MonitoredStatefulSetSpec struct {
// StatefulSetSpec is the spec for the StatefulSet that this resource will manage.
// +kubebuilder:validation:Required
StatefulSetSpec appsv1.StatefulSetSpec `json:"statefulSetSpec"`
// MonitorEndpoint is the URL of the external monitoring service.
// +kubebuilder:validation:Required
MonitorEndpoint string `json:"monitorEndpoint"`
}
// MonitoredStatefulSetStatus defines the observed state of MonitoredStatefulSet
type MonitoredStatefulSetStatus struct {
// Conditions represent the latest available observations of an object's state.
Conditions []metav1.Condition `json:"conditions,omitempty"`
// ExternalMonitorID is the ID assigned by the external monitoring service.
ExternalMonitorID string `json:"externalMonitorID,omitempty"`
}
//+kubebuilder:object:root=true
//+kubebuilder:subresource:status
// MonitoredStatefulSet is the Schema for the monitoredstatefulsets API
type MonitoredStatefulSet struct {
metav1.TypeMeta `json:",inline"`
metav1.ObjectMeta `json:"metadata,omitempty"`
Spec MonitoredStatefulSetSpec `json:"spec,omitempty"`
Status MonitoredStatefulSetStatus `json:"status,omitempty"`
}
//+kubebuilder:object:root=true
// MonitoredStatefulSetList contains a list of MonitoredStatefulSet
type MonitoredStatefulSetList struct {
metav1.TypeMeta `json:",inline"`
metav1.ListMeta `json:"metadata,omitempty"`
Items []MonitoredStatefulSet `json:"items"`
}
func init() {
SchemeBuilder.Register(&MonitoredStatefulSet{}, &MonitoredStatefulSetList{})
}
Our spec embeds a standard StatefulSetSpec and adds a MonitorEndpoint. The status will hold conditions and the ExternalMonitorID we get back from the external service.
2. The Reconciliation Loop Structure
The core of the operator is the Reconcile method. Its fundamental structure must now account for two distinct paths: the normal reconciliation path (creation/updates) and the deletion path (when a finalizer is present and deletion is requested).
// controllers/monitoredstatefulset_controller.go
const finalizerName = "statefulsets.example.com/finalizer"
func (r *MonitoredStatefulSetReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
log := log.FromContext(ctx)
// 1. Fetch the MonitoredStatefulSet instance
mss := &appv1alpha1.MonitoredStatefulSet{}
if err := r.Get(ctx, req.NamespacedName, mss); err != nil {
if errors.IsNotFound(err) {
log.Info("MonitoredStatefulSet resource not found. Ignoring since object must be deleted")
return ctrl.Result{}, nil
}
log.Error(err, "Failed to get MonitoredStatefulSet")
return ctrl.Result{}, err
}
// 2. Check if the object is being deleted
isMarkedForDeletion := mss.GetDeletionTimestamp() != nil
if isMarkedForDeletion {
if controllerutil.ContainsFinalizer(mss, finalizerName) {
// Run our finalizer logic
if err := r.handleFinalizer(ctx, mss); err != nil {
// Don't remove the finalizer if cleanup fails, so we can retry.
return ctrl.Result{}, err
}
// Cleanup was successful, remove the finalizer
controllerutil.RemoveFinalizer(mss, finalizerName)
if err := r.Update(ctx, mss); err != nil {
return ctrl.Result{}, err
}
}
// Stop reconciliation as the item is being deleted
return ctrl.Result{}, nil
}
// 3. Add the finalizer for this CR if it doesn't have one
if !controllerutil.ContainsFinalizer(mss, finalizerName) {
log.Info("Adding finalizer for MonitoredStatefulSet")
controllerutil.AddFinalizer(mss, finalizerName)
if err := r.Update(ctx, mss); err != nil {
return ctrl.Result{}, err
}
}
// 4. Run the main reconciliation logic for create/update
return r.handleReconciliation(ctx, mss)
}
This structure is critical:
- We fetch the instance.
- We check GetDeletionTimestamp(). If it is non-nil, the user has requested deletion, so we check whether our finalizer is still present on the object.
- If the finalizer is present, we run our cleanup logic (handleFinalizer). If cleanup succeeds, we remove the finalizer and update the object. This signals to Kubernetes that it can now proceed with the deletion.
- If the object is not being deleted, we ensure our finalizer is present and then run the normal create/update logic (handleReconciliation).
Deep Dive: Implementing the Finalizer Logic
Let's implement the handleFinalizer and handleReconciliation methods. We'll need a mock client for our external monitoring service.
1. A Mock External Service Client
For a realistic example, let's define a simple client that simulates talking to an external monitoring API.
// internal/monitor/client.go
package monitor
import (
"bytes"
"context"
"encoding/json"
"fmt"
"net/http"
"time"
)
// Mock client for a monitoring service
type Client struct {
Endpoint string
HTTPClient *http.Client
}
func NewClient(endpoint string) *Client {
return &Client{
Endpoint: endpoint,
HTTPClient: &http.Client{Timeout: 5 * time.Second},
}
}
type RegisterPayload struct {
ServiceName string `json:"serviceName"`
Namespace string `json:"namespace"`
}
type RegisterResponse struct {
MonitorID string `json:"monitorID"`
}
func (c *Client) Register(ctx context.Context, name, namespace string) (string, error) {
// In a real implementation, this would make an HTTP POST request
// For this example, we simulate success and return a generated ID.
fmt.Printf("SIMULATING: Registering %s/%s with monitoring service at %s\n", namespace, name, c.Endpoint)
// Simulate network latency
time.Sleep(100 * time.Millisecond)
return fmt.Sprintf("mon-%s-%s", namespace, name), nil
}
func (c *Client) Deregister(ctx context.Context, monitorID string) error {
// In a real implementation, this would make an HTTP DELETE request
// For this example, we simulate success.
fmt.Printf("SIMULATING: Deregistering monitor ID %s from monitoring service\n", monitorID)
// Simulate network latency
time.Sleep(150 * time.Millisecond)
// To test idempotency, we could simulate a 'not found' error if called twice.
// if monitorID is already deleted { return nil }
return nil
}
2. The Main Reconciliation Logic (`handleReconciliation`)
This function handles the creation and updates of both the Kubernetes StatefulSet and the external monitor.
// controllers/monitoredstatefulset_controller.go
func (r *MonitoredStatefulSetReconciler) handleReconciliation(ctx context.Context, mss *appv1alpha1.MonitoredStatefulSet) (ctrl.Result, error) {
log := log.FromContext(ctx)
// === 1. Reconcile the StatefulSet ===
sts := &appsv1.StatefulSet{}
err := r.Get(ctx, types.NamespacedName{Name: mss.Name, Namespace: mss.Namespace}, sts)
if err != nil && errors.IsNotFound(err) {
log.Info("Creating a new StatefulSet")
sts = r.statefulSetForMSS(mss)
if err := ctrl.SetControllerReference(mss, sts, r.Scheme); err != nil {
return ctrl.Result{}, err
}
if err := r.Create(ctx, sts); err != nil {
log.Error(err, "Failed to create new StatefulSet")
return ctrl.Result{}, err
}
// StatefulSet created successfully, requeue to check status
return ctrl.Result{Requeue: true}, nil
} else if err != nil {
log.Error(err, "Failed to get StatefulSet")
return ctrl.Result{}, err
}
// === 2. Reconcile the External Monitor ===
// If ExternalMonitorID is not set in status, we need to register it.
if mss.Status.ExternalMonitorID == "" {
log.Info("Registering with external monitoring service")
monitorClient := monitor.NewClient(mss.Spec.MonitorEndpoint)
monitorID, err := monitorClient.Register(ctx, mss.Name, mss.Namespace)
if err != nil {
log.Error(err, "Failed to register with monitoring service")
// Update status with a condition
meta.SetStatusCondition(&mss.Status.Conditions, metav1.Condition{
Type: "MonitorRegistered",
Status: metav1.ConditionFalse,
Reason: "RegistrationFailed",
Message: err.Error(),
})
if updateErr := r.Status().Update(ctx, mss); updateErr != nil {
return ctrl.Result{}, updateErr
}
return ctrl.Result{}, err
}
// Registration successful, update status
mss.Status.ExternalMonitorID = monitorID
meta.SetStatusCondition(&mss.Status.Conditions, metav1.Condition{
Type: "MonitorRegistered",
Status: metav1.ConditionTrue,
Reason: "RegistrationSuccessful",
})
if err := r.Status().Update(ctx, mss); err != nil {
return ctrl.Result{}, err
}
log.Info("Successfully registered with monitoring service", "MonitorID", monitorID)
}
return ctrl.Result{}, nil
}
// Helper function to construct the StatefulSet
func (r *MonitoredStatefulSetReconciler) statefulSetForMSS(mss *appv1alpha1.MonitoredStatefulSet) *appsv1.StatefulSet {
// Logic to create the StatefulSet object from the spec
sts := &appsv1.StatefulSet{
ObjectMeta: metav1.ObjectMeta{
Name: mss.Name,
Namespace: mss.Namespace,
},
Spec: mss.Spec.StatefulSetSpec,
}
return sts
}
This logic is idempotent. If the StatefulSet exists, it does nothing. If the ExternalMonitorID is already in the status, it skips the registration. This prevents creating duplicate resources if the reconciliation loop runs multiple times.
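For the StatefulSet half, the same idempotency can also be expressed with controller-runtime's controllerutil.CreateOrUpdate helper. The snippet below is an optional alternative sketch to the explicit Get/Create flow above; it assumes the reconciler embeds client.Client, as in the standard kubebuilder scaffold.
// controllers/monitoredstatefulset_controller.go
// Alternative sketch: inside handleReconciliation, replacing step 1.
sts := &appsv1.StatefulSet{
    ObjectMeta: metav1.ObjectMeta{Name: mss.Name, Namespace: mss.Namespace},
}
op, err := controllerutil.CreateOrUpdate(ctx, r.Client, sts, func() error {
    // Several StatefulSet fields (selector, serviceName, volumeClaimTemplates)
    // are immutable after creation, so only re-apply the mutable parts on updates.
    if sts.CreationTimestamp.IsZero() {
        sts.Spec = mss.Spec.StatefulSetSpec
    } else {
        sts.Spec.Replicas = mss.Spec.StatefulSetSpec.Replicas
        sts.Spec.Template = mss.Spec.StatefulSetSpec.Template
    }
    return ctrl.SetControllerReference(mss, sts, r.Scheme)
})
if err != nil {
    return ctrl.Result{}, err
}
log.Info("StatefulSet reconciled", "operation", op)
Because the mutate function runs on every reconcile, this variant also corrects drift if the StatefulSet is edited out of band.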
3. The Finalizer Cleanup Logic (`handleFinalizer`)
This is the most critical piece. This code only runs when the object is marked for deletion.
// controllers/monitoredstatefulset_controller.go
func (r *MonitoredStatefulSetReconciler) handleFinalizer(ctx context.Context, mss *appv1alpha1.MonitoredStatefulSet) error {
log := log.FromContext(ctx)
if mss.Status.ExternalMonitorID != "" {
log.Info("Performing finalizer cleanup: deregistering from monitoring service", "MonitorID", mss.Status.ExternalMonitorID)
monitorClient := monitor.NewClient(mss.Spec.MonitorEndpoint)
if err := monitorClient.Deregister(ctx, mss.Status.ExternalMonitorID); err != nil {
// Here, you might want to check for specific errors.
// If the error is a 'NotFound' error, it means the resource is already gone,
// so we can consider the cleanup successful.
log.Error(err, "Failed to deregister from monitoring service during finalization")
return err
}
}
log.Info("External resources cleaned up successfully. Finalizer can be removed.")
return nil
}
This function reads the ExternalMonitorID from the object's status and calls the Deregister method on our client. If this call fails, the function returns an error. As seen in our main Reconcile loop, returning an error here prevents the finalizer from being removed, and the reconciliation will be retried.
Advanced Patterns and Edge Case Handling
Writing robust operators requires thinking beyond the happy path. Finalizers introduce their own set of edge cases.
1. Idempotency in Deletion
What happens if the operator pod crashes right after Deregister succeeds but before the r.Update(ctx, mss) call removes the finalizer? When the operator restarts, it will receive the event for the deleting object again and re-run handleFinalizer.
Our external service client must be idempotent. A DELETE for an already-deleted resource should not be treated as a failure: the API will typically respond with a success (e.g., 204 No Content) or a 'not found' (e.g., 404 Not Found). Our Deregister function should treat the 404 case as success, ensuring that a retry doesn't fail and block the deletion indefinitely.
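Below is a minimal sketch of what an idempotent Deregister could look like, replacing the simulated version above. The /monitors/{id} URL layout is an assumption and would need to match the real monitoring API.
// internal/monitor/client.go (idempotent variant, sketch)
func (c *Client) Deregister(ctx context.Context, monitorID string) error {
    // The path below is a placeholder; adjust it to the real monitoring API.
    url := fmt.Sprintf("%s/monitors/%s", c.Endpoint, monitorID)
    req, err := http.NewRequestWithContext(ctx, http.MethodDelete, url, nil)
    if err != nil {
        return err
    }
    resp, err := c.HTTPClient.Do(req)
    if err != nil {
        return err // transient network error: surface it so the finalizer retries
    }
    defer resp.Body.Close()
    switch {
    case resp.StatusCode == http.StatusNotFound:
        // Already gone (e.g. a previous attempt succeeded before the operator
        // crashed): treat as success so the finalizer can still be removed.
        return nil
    case resp.StatusCode >= 200 && resp.StatusCode < 300:
        return nil
    default:
        return fmt.Errorf("deregister %s: unexpected status %d", monitorID, resp.StatusCode)
    }
}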
2. Handling a Stuck Finalizer
A common production issue is a finalizer that can't be removed because the cleanup logic is perpetually failing (e.g., the external API is down, or a bug in the operator prevents cleanup). The CR will be stuck in a Terminating state forever.
Mitigation Strategies:
* Robust Error Handling: Distinguish between transient errors (e.g., network timeout), which should be retried, and permanent errors (e.g., invalid credentials), which might require manual intervention. Update the CR's status conditions to reflect these errors, making it observable to humans.
* Metrics and Alerting: Your operator must expose Prometheus metrics (a registration sketch follows this list). Key metrics for finalizers include:
* operator_finalizer_cleanup_duration_seconds: A histogram to track how long cleanup takes.
* operator_finalizer_cleanup_errors_total: A counter for cleanup failures, labeled by error type.
* operator_terminating_resources_total: A gauge to track the number of resources stuck in a terminating state. Alerts can be configured on this metric.
* Manual Override: In a catastrophic failure, an administrator may need to manually remove the finalizer by editing the resource: kubectl patch mss my-prod-db --type=json -p='[{"op": "remove", "path": "/metadata/finalizers"}]'. This is a last resort, as it will likely orphan the external resources, but it's a necessary escape hatch.
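The following is a sketch of how metrics like the ones listed above could be defined and registered with controller-runtime's global metrics registry, which the manager serves on its /metrics endpoint; the package name and help strings are illustrative.
// internal/operatormetrics/metrics.go
package operatormetrics
import (
    "github.com/prometheus/client_golang/prometheus"
    ctrlmetrics "sigs.k8s.io/controller-runtime/pkg/metrics"
)
var (
    // Histogram of finalizer cleanup durations.
    FinalizerCleanupDuration = prometheus.NewHistogram(prometheus.HistogramOpts{
        Name: "operator_finalizer_cleanup_duration_seconds",
        Help: "Time taken by finalizer cleanup logic.",
    })
    // Counter of cleanup failures, labeled by a coarse error type.
    FinalizerCleanupErrors = prometheus.NewCounterVec(prometheus.CounterOpts{
        Name: "operator_finalizer_cleanup_errors_total",
        Help: "Total finalizer cleanup failures.",
    }, []string{"error_type"})
    // Gauge of resources currently waiting for finalization.
    TerminatingResources = prometheus.NewGauge(prometheus.GaugeOpts{
        Name: "operator_terminating_resources_total",
        Help: "Resources currently in a terminating state.",
    })
)
func init() {
    // Register with controller-runtime's registry so these metrics are exposed
    // on the manager's /metrics endpoint alongside the built-in ones.
    ctrlmetrics.Registry.MustRegister(FinalizerCleanupDuration, FinalizerCleanupErrors, TerminatingResources)
}
handleFinalizer can then wrap its cleanup call with a timer observation and increment the error counter on failure.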
3. Asynchronous Cleanup for Long-Running Tasks
If your cleanup task takes a long time (e.g., decommissioning a large database), performing it synchronously in the Reconcile loop is a bad practice. The operator's worker queue can get blocked, starving other resources of reconciliation cycles.
Asynchronous Pattern:
- In handleFinalizer, instead of performing the cleanup directly, update the CR's status to a Decommissioning state.
- Create a Kubernetes Job to perform the actual cleanup. The Job can take minutes or hours without blocking the controller.
- The handleFinalizer logic now changes: it checks the status of the Job and only proceeds to remove the finalizer once the Job has completed successfully.
- If the Job fails, the controller can inspect its logs, update the CR status with the error, and potentially retry the Job.
This pattern decouples the long-running task from the main reconciliation loop, making the operator more scalable and resilient.
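A condensed sketch of this flow is shown below as a handleFinalizerAsync variant. buildCleanupJob is a hypothetical helper that renders a batchv1.Job running the decommissioning task, and the snippet assumes batchv1 "k8s.io/api/batch/v1" and corev1 "k8s.io/api/core/v1" are imported. The deletion branch of Reconcile would remove the finalizer only when done is true and requeue otherwise.
// controllers/monitoredstatefulset_controller.go (asynchronous variant, sketch)
func (r *MonitoredStatefulSetReconciler) handleFinalizerAsync(ctx context.Context, mss *appv1alpha1.MonitoredStatefulSet) (done bool, err error) {
    job := &batchv1.Job{}
    key := types.NamespacedName{Name: mss.Name + "-cleanup", Namespace: mss.Namespace}
    if err := r.Get(ctx, key, job); err != nil {
        if !errors.IsNotFound(err) {
            return false, err
        }
        // No cleanup Job yet: create one and report "not done" so we get requeued.
        job = buildCleanupJob(mss) // hypothetical helper
        if err := ctrl.SetControllerReference(mss, job, r.Scheme); err != nil {
            return false, err
        }
        return false, r.Create(ctx, job)
    }
    for _, cond := range job.Status.Conditions {
        if cond.Type == batchv1.JobComplete && cond.Status == corev1.ConditionTrue {
            return true, nil // cleanup finished; the finalizer can now be removed
        }
        if cond.Type == batchv1.JobFailed && cond.Status == corev1.ConditionTrue {
            return false, fmt.Errorf("cleanup job failed: %s", cond.Message)
        }
    }
    return false, nil // Job still running; check again on the next reconcile
}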
Production-Ready Testing
Testing finalizer logic is crucial. Using controller-runtime's envtest package, you can write integration tests that simulate the entire deletion lifecycle.
Here is a conceptual test case:
// controllers/monitoredstatefulset_controller_test.go
It("should run finalizer and clean up external resources on deletion", func() {
ctx := context.Background()
mss := &appv1alpha1.MonitoredStatefulSet{ /* ... definition ... */ }
// 1. Create the resource
Expect(k8sClient.Create(ctx, mss)).Should(Succeed())
// 2. Verify finalizer is added and external resource is created
// (Use a mock for the monitor client and check it was called)
Eventually(func() bool {
fetched := &appv1alpha1.MonitoredStatefulSet{}
k8sClient.Get(ctx, /* namespacedName */, fetched)
return controllerutil.ContainsFinalizer(fetched, finalizerName) &&
fetched.Status.ExternalMonitorID != ""
}, "10s", "250ms").Should(BeTrue())
// 3. Delete the resource
Expect(k8sClient.Delete(ctx, mss)).Should(Succeed())
// 4. Verify the external resource cleanup was called
// (Check your mock client's Deregister method was called with the correct ID)
// 5. Verify the resource is eventually deleted from the API server
Eventually(func() bool {
err := k8sClient.Get(ctx, /* namespacedName */, &appv1alpha1.MonitoredStatefulSet{})
return errors.IsNotFound(err)
}, "10s", "250ms").Should(BeTrue())
})
Conclusion
Finalizers are not an optional feature but a core component for any production-grade Kubernetes operator that manages stateful or external resources. By intercepting the deletion process, they provide the necessary hook to execute cleanup logic, prevent resource orphaning, and maintain system consistency. The implementation requires careful structuring of the reconciliation loop to handle both deletion and creation/update paths, with a strong emphasis on idempotency and robust error handling. For senior engineers, mastering the finalizer pattern is a critical step in building operators that are truly cloud-native and capable of safely automating complex application lifecycles in production environments.