Kubernetes Operators: Finalizers for Stateful Resource Deletion
The Orphaned Resource Problem: Why Standard Deletion Fails
As a senior engineer building on Kubernetes, you understand the power of the Operator Pattern. It allows us to extend the Kubernetes API, teaching the cluster how to manage complex, often stateful, applications. We define a desired state in a Custom Resource (CR), and the operator's controller works tirelessly to make that state a reality. This is the core of the reconciliation loop.
However, a critical gap emerges when our operator manages resources that live outside the Kubernetes cluster—a managed PostgreSQL instance in RDS, a BigQuery dataset, or a DNS record in Cloudflare. The standard Kubernetes garbage collection mechanism is designed for in-cluster objects. When you run kubectl delete my-custom-resource, Kubernetes removes the object from etcd. If that object was the owner of a Deployment and a Service, those child objects are automatically garbage collected.
But what about the RDS instance created on its behalf? Kubernetes has no knowledge of it. Once the CR is gone there is nothing left to reconcile, so the controller simply moves on. The result is an orphaned resource: a running, and often costly, database instance with no corresponding Kubernetes object to manage it. This is not just a resource leak; it's a critical reliability and cost-management failure in a production system.
This is the problem that finalizers solve. A finalizer is a mechanism that tells the Kubernetes API server: "Do not fully delete this object yet. There is external cleanup work that must be completed first." It allows our controller to intercept the deletion process, perform the necessary off-cluster actions, and then, and only then, give Kubernetes the green light to remove the object from etcd.
This article will walk through a production-grade implementation of an operator that manages an ExternalDatabase CRD, focusing specifically on the robust implementation of finalizers for graceful and guaranteed cleanup.
The Anatomy of Our `ExternalDatabase` Operator
To ground our discussion, let's define the components. We're building an operator to manage a fictional database-as-a-service.
1. The `ExternalDatabase` Custom Resource Definition (CRD)
Our CRD defines the schema for our custom resource. The spec declares the user's desired state, and the status is where our controller will report the observed state of the world.
api/v1/externaldatabase_types.go
package v1
import (
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
// ExternalDatabaseSpec defines the desired state of ExternalDatabase
type ExternalDatabaseSpec struct {
// Name of the database to be created
// +kubebuilder:validation:Required
// +kubebuilder:validation:MinLength=3
Name string `json:"name"`
// Engine specifies the database engine (e.g., "postgres", "mysql")
// +kubebuilder:validation:Required
// +kubebuilder:validation:Enum=postgres;mysql
Engine string `json:"engine"`
// DeletionPolicy determines what happens to the external resource when the CR is deleted.
// "Delete" will delete the external resource. "Retain" will leave it.
// +kubebuilder:validation:Enum=Delete;Retain
// +kubebuilder:default:=Delete
DeletionPolicy string `json:"deletionPolicy,omitempty"`
}
// ExternalDatabaseStatus defines the observed state of ExternalDatabase
type ExternalDatabaseStatus struct {
// DBID is the unique identifier for the database in the external system.
DBID string `json:"dbid,omitempty"`
// Conditions represent the latest available observations of the resource's state.
// +optional
// +patchMergeKey=type
// +patchStrategy=merge
Conditions []metav1.Condition `json:"conditions,omitempty" patchStrategy:"merge" patchMergeKey:"type"`
}
//+kubebuilder:object:root=true
//+kubebuilder:subresource:status
//+kubebuilder:printcolumn:name="DBID",type="string",JSONPath=".status.dbid"
//+kubebuilder:printcolumn:name="Ready",type="string",JSONPath=".status.conditions[?(@.type==\"Ready\")].status"
// ExternalDatabase is the Schema for the externaldatabases API
type ExternalDatabase struct {
metav1.TypeMeta `json:",inline"`
metav1.ObjectMeta `json:"metadata,omitempty"`
Spec ExternalDatabaseSpec `json:"spec,omitempty"`
Status ExternalDatabaseStatus `json:"status,omitempty"`
}
//+kubebuilder:object:root=true
// ExternalDatabaseList contains a list of ExternalDatabase
type ExternalDatabaseList struct {
metav1.TypeMeta `json:",inline"`
metav1.ListMeta `json:"metadata,omitempty"`
Items []ExternalDatabase `json:"items"`
}
func init() {
SchemeBuilder.Register(&ExternalDatabase{}, &ExternalDatabaseList{})
}
Key elements here for senior engineers:
- status.Conditions: We are using the standard metav1.Condition type. This is a best practice that makes our operator's status immediately understandable to standard Kubernetes tooling (kubectl wait, etc.) and other controllers.
- The //+kubebuilder:subresource:status marker ensures that .status is written through the dedicated /status subresource, so the controller's status updates cannot accidentally overwrite user-defined .spec changes (and spec updates cannot clobber status).
- DeletionPolicy: This gives users control, a common pattern in production systems (e.g., PersistentVolume reclaim policies).
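Because the conditions use the standard type, any Go consumer can read them with the helpers in k8s.io/apimachinery/pkg/api/meta. A minimal sketch (assuming db is an ExternalDatabase already fetched from the API; this snippet is illustrative, not part of the controller we build below):

// Standard conditions make readiness checks a one-liner for any consumer.
if meta.IsStatusConditionTrue(db.Status.Conditions, "Ready") {
	// The external database has been provisioned and can be handed to workloads.
}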
2. The Controller and its Reconciliation Loop
Our controller's core logic lives in the Reconcile method. This method is invoked by the controller-runtime framework whenever there's a change to an ExternalDatabase resource (or a secondary resource it's watching).
Here is the skeleton of our controller. We'll flesh this out with the finalizer logic.
internal/controller/externaldatabase_controller.go
package controller
import (
	"context"

	// meta, metav1, and controllerutil are not needed by the skeleton yet, but
	// the finalizer and status logic added to this file below depends on them.
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
	"sigs.k8s.io/controller-runtime/pkg/log"

	databasev1 "finalizer-demo/api/v1"
)
// ExternalDatabaseReconciler reconciles an ExternalDatabase object
type ExternalDatabaseReconciler struct {
client.Client
Scheme *runtime.Scheme
// A mock client for our external DB service
// In a real implementation, this would be a proper client.
ExternalDBClient ExternalDatabaseAPI
}
//+kubebuilder:rbac:groups=database.example.com,resources=externaldatabases,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups=database.example.com,resources=externaldatabases/status,verbs=get;update;patch
//+kubebuilder:rbac:groups=database.example.com,resources=externaldatabases/finalizers,verbs=update
func (r *ExternalDatabaseReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
_ = log.FromContext(ctx)
// Business logic will go here
return ctrl.Result{}, nil
}
// SetupWithManager sets up the controller with the Manager.
func (r *ExternalDatabaseReconciler) SetupWithManager(mgr ctrl.Manager) error {
return ctrl.NewControllerManagedBy(mgr).
For(&databasev1.ExternalDatabase{}).
Complete(r)
}
The Core Pattern: A Finalizer-Aware Reconciliation Loop
The entire strategy hinges on structuring the Reconcile function to handle two distinct states: the object is being deleted, or the object is not being deleted. The presence of a deletionTimestamp on the object's metadata is the definitive signal.
Let's define our finalizer's name. Finalizer names should be domain-qualified (like database.example.com/finalizer below) so they cannot collide with finalizers owned by other controllers.
const externalDatabaseFinalizer = "database.example.com/finalizer"
Here is the high-level structure of our Reconcile function, which we will now implement piece by piece.
func (r *ExternalDatabaseReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
logger := log.FromContext(ctx)
// 1. Fetch the ExternalDatabase instance
db := &databasev1.ExternalDatabase{}
if err := r.Get(ctx, req.NamespacedName, db); err != nil {
// Handle not-found errors, which can occur after deletion
return ctrl.Result{}, client.IgnoreNotFound(err)
}
// 2. Check if the object is being deleted
if !db.ObjectMeta.DeletionTimestamp.IsZero() {
// The object is being deleted
return r.reconcileDelete(ctx, db)
}
// 3. Ensure our finalizer is present if the object is not being deleted
if !controllerutil.ContainsFinalizer(db, externalDatabaseFinalizer) {
logger.Info("Adding finalizer for ExternalDatabase")
controllerutil.AddFinalizer(db, externalDatabaseFinalizer)
if err := r.Update(ctx, db); err != nil {
return ctrl.Result{}, err
}
}
// 4. The object is not being deleted, so run the normal reconciliation logic
return r.reconcileNormal(ctx, db)
}
This structure is critical. It immediately branches the logic based on the deletion state.
Step 1: Adding the Finalizer
When a new CR is created, its deletionTimestamp is zero. Our first task is to add our finalizer to its metadata. This acts as a registration, telling Kubernetes we need to be involved in its deletion.
// Part of the main Reconcile function
// 3. Ensure our finalizer is present if the object is not being deleted
if !controllerutil.ContainsFinalizer(db, externalDatabaseFinalizer) {
logger.Info("Adding finalizer for ExternalDatabase")
controllerutil.AddFinalizer(db, externalDatabaseFinalizer)
if err := r.Update(ctx, db); err != nil {
logger.Error(err, "Failed to add finalizer")
return ctrl.Result{}, err
}
// After adding the finalizer, we return to trigger another reconcile.
// This is a good practice to ensure the state is consistent before proceeding.
return ctrl.Result{Requeue: true}, nil
}
We use the controllerutil helpers, which are part of controller-runtime, to safely add the finalizer. Notice the return ctrl.Result{Requeue: true}, nil. While not strictly necessary, it's a defensive pattern to ensure the next reconciliation cycle operates on an object that is guaranteed to have the finalizer.
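As a variant, some operators add the finalizer with a patch instead of a full Update, which keeps the write narrowly scoped to metadata and reduces the chance of conflicts with other writers. A minimal sketch of a drop-in replacement for the body of the ContainsFinalizer block above (same helpers, just r.Patch with client.MergeFrom):

// Patch only the metadata change instead of updating the whole object.
base := db.DeepCopy()
controllerutil.AddFinalizer(db, externalDatabaseFinalizer)
if err := r.Patch(ctx, db, client.MergeFrom(base)); err != nil {
	logger.Error(err, "Failed to add finalizer")
	return ctrl.Result{}, err
}
return ctrl.Result{Requeue: true}, nil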
Step 2: The Normal Reconciliation Logic (`reconcileNormal`)
This is the "happy path" logic that runs when the CR is being created or updated. Its job is to converge the state of the external world with the spec.
func (r *ExternalDatabaseReconciler) reconcileNormal(ctx context.Context, db *databasev1.ExternalDatabase) (ctrl.Result, error) {
logger := log.FromContext(ctx)
// If the external DB ID is not in our status, it means we need to create it.
if db.Status.DBID == "" {
logger.Info("Creating external database", "name", db.Spec.Name)
// This is our mock API call
dbID, err := r.ExternalDBClient.CreateDatabase(ctx, db.Spec.Name, db.Spec.Engine)
if err != nil {
logger.Error(err, "Failed to create external database")
// Update status to reflect the failure
meta.SetStatusCondition(&db.Status.Conditions, metav1.Condition{
Type: "Ready",
Status: metav1.ConditionFalse,
Reason: "ProvisionFailed",
Message: err.Error(),
})
if updateErr := r.Status().Update(ctx, db); updateErr != nil {
return ctrl.Result{}, updateErr
}
// Return error to trigger exponential backoff retry
return ctrl.Result{}, err
}
// Creation was successful. Update the status.
db.Status.DBID = dbID
meta.SetStatusCondition(&db.Status.Conditions, metav1.Condition{
Type: "Ready",
Status: metav1.ConditionTrue,
Reason: "Provisioned",
Message: "External database provisioned successfully",
})
logger.Info("Successfully created external database", "DBID", dbID)
if err := r.Status().Update(ctx, db); err != nil {
logger.Error(err, "Failed to update ExternalDatabase status")
return ctrl.Result{}, err
}
}
// Here you could add logic for handling updates to the spec, state drift, etc.
// For this example, we'll assume the spec is immutable.
logger.Info("Reconciliation successful")
return ctrl.Result{}, nil
}
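The comment near the end of reconcileNormal leaves drift handling open. A minimal sketch of what that could look like, reusing GetDatabaseStatus from our ExternalDatabaseAPI interface (the reconcileDrift name, the DriftDetected reason, and the 5-minute resync interval are illustrative; it also requires importing time):

// reconcileDrift sketches how reconcileNormal could be extended to verify the
// external system still matches our recorded status, and to re-check on a timer.
func (r *ExternalDatabaseReconciler) reconcileDrift(ctx context.Context, db *databasev1.ExternalDatabase) (ctrl.Result, error) {
	logger := log.FromContext(ctx)

	// Ask the external system whether the database still exists and is healthy.
	status, err := r.ExternalDBClient.GetDatabaseStatus(ctx, db.Status.DBID)
	if err != nil {
		meta.SetStatusCondition(&db.Status.Conditions, metav1.Condition{
			Type:    "Ready",
			Status:  metav1.ConditionFalse,
			Reason:  "DriftDetected",
			Message: err.Error(),
		})
		if updateErr := r.Status().Update(ctx, db); updateErr != nil {
			return ctrl.Result{}, updateErr
		}
		// Returning the error requeues the request with exponential backoff.
		return ctrl.Result{}, err
	}

	logger.Info("External database verified", "externalStatus", status)
	// Even if nothing changes in the cluster, re-check the external system periodically.
	return ctrl.Result{RequeueAfter: 5 * time.Minute}, nil
}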
Key Production Patterns:
- Idempotency: the guard if db.Status.DBID == "" means that if the reconciler runs again after a successful creation, it will see the DBID in the status and skip the creation step. This is essential for stability.
- Rich status: we use meta.SetStatusCondition to provide rich, machine-readable status updates. This is far superior to a simple phase: "Ready" string.
- Error handling: on a failed API call we return the error; controller-runtime will see the non-nil error and requeue the request with exponential backoff, preventing us from hammering a failing external API.

Step 3: The Deletion Logic (`reconcileDelete`)
This is the heart of the finalizer pattern. This function is only called when !db.ObjectMeta.DeletionTimestamp.IsZero() is true.
When a user runs kubectl delete, the API server does two things:
- It sets the deletionTimestamp to the current time.
- It triggers a reconcile event.
It does not remove the object from etcd, because our finalizer is present.
func (r *ExternalDatabaseReconciler) reconcileDelete(ctx context.Context, db *databasev1.ExternalDatabase) (ctrl.Result, error) {
logger := log.FromContext(ctx)
// Check if our finalizer is the one we should be handling
if controllerutil.ContainsFinalizer(db, externalDatabaseFinalizer) {
logger.Info("Handling deletion for ExternalDatabase")
// Respect the DeletionPolicy
if db.Spec.DeletionPolicy == "Retain" {
logger.Info("DeletionPolicy is Retain, skipping external resource deletion")
} else {
// Our core cleanup logic
if err := r.ExternalDBClient.DeleteDatabase(ctx, db.Status.DBID); err != nil {
// If the external deletion fails, we must return an error.
// This ensures the reconcile loop will be retried, and the finalizer won't be removed.
logger.Error(err, "Failed to delete external database; will retry")
// You could update status here to indicate DeletionFailed
meta.SetStatusCondition(&db.Status.Conditions, metav1.Condition{
Type: "Ready",
Status: metav1.ConditionFalse,
Reason: "DeletionFailed",
Message: err.Error(),
})
if updateErr := r.Status().Update(ctx, db); updateErr != nil {
return ctrl.Result{}, updateErr
}
return ctrl.Result{}, err
}
}
// If cleanup was successful (or skipped), we can remove the finalizer.
logger.Info("External resource cleanup successful, removing finalizer")
controllerutil.RemoveFinalizer(db, externalDatabaseFinalizer)
if err := r.Update(ctx, db); err != nil {
return ctrl.Result{}, err
}
}
// Stop reconciliation as the item is being deleted
return ctrl.Result{}, nil
}
This logic is the crux of the pattern:
- Idempotent cleanup: if the external resource is already gone, DeleteDatabase should return success, not an error.
- Guaranteed retries: if DeleteDatabase returns a transient error, we return the error to the framework. The CR's deletion is now blocked. The finalizer remains, and the controller will retry the deletion after a backoff period. This guarantees cleanup.
- Completion: once the API server sees a deletionTimestamp and an empty finalizer list, it completes the deletion, and the object is removed from etcd.

Advanced Edge Cases and Production Considerations
Simple examples stop here. Production systems require deeper thought.
Edge Case: Controller Pod Crashes During Deletion
Imagine this sequence:
1. A user runs kubectl delete.
2. The controller enters reconcileDelete.
3. The call to r.ExternalDBClient.DeleteDatabase succeeds.
4. The controller pod crashes before it can call controllerutil.RemoveFinalizer.

Is this a problem? No. This is why the pattern is so robust. When the controller restarts, it will receive a reconcile request for the ExternalDatabase CR (which still exists because the finalizer was never removed). The reconcileDelete logic will run again. The call to DeleteDatabase must be idempotent; it should see the database is already gone and return success. The controller will then proceed to remove the finalizer, and the deletion completes. Your external API client must handle a delete call for a non-existent resource gracefully.
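For a real provider, graceful handling usually means translating the SDK's "not found" error into success. A sketch of such a wrapper; the providerSDK interface, DeleteInstance call, and IsNotFound helper are placeholders for whatever your cloud SDK actually exposes:

// providerSDK stands in for a real cloud SDK; only the calls used here are declared.
type providerSDK interface {
	DeleteInstance(ctx context.Context, id string) error
	IsNotFound(err error) bool
}

type realDBClient struct {
	sdk providerSDK
}

func (c *realDBClient) DeleteDatabase(ctx context.Context, dbID string) error {
	err := c.sdk.DeleteInstance(ctx, dbID)
	if err != nil && c.sdk.IsNotFound(err) {
		// The instance is already gone: report success so the finalizer can be removed.
		return nil
	}
	return err
}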
Edge Case: Finalizer gets "Stuck"
What if the external API is permanently unavailable, or a bug in your cleanup logic prevents it from ever succeeding? The CR will be stuck in a Terminating state forever. This is a common operational issue.
Mitigation Strategies:
- Manual intervention: a human operator can edit the object (kubectl edit externaldatabase my-db) to remove the finalizer. This is a last resort, as it will orphan the external resource.
- A deletion deadline built into your reconcileDelete logic. If deletion fails for over 24 hours, you could update a status condition to DeletionFailedPermanently and stop retrying, alerting an operator; a sketch of this follows.
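A minimal sketch of that time-based escape hatch, which could sit at the top of reconcileDelete (the 24-hour window and the DeletionFailedPermanently reason are illustrative, and time must be imported):

// Give up on automated cleanup after the object has been terminating for 24 hours.
if time.Since(db.DeletionTimestamp.Time) > 24*time.Hour {
	meta.SetStatusCondition(&db.Status.Conditions, metav1.Condition{
		Type:    "Ready",
		Status:  metav1.ConditionFalse,
		Reason:  "DeletionFailedPermanently",
		Message: "external cleanup did not succeed within 24h; manual intervention required",
	})
	if err := r.Status().Update(ctx, db); err != nil {
		return ctrl.Result{}, err
	}
	// Returning nil stops the retries; the finalizer stays in place until a human intervenes.
	return ctrl.Result{}, nil
}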
Performance: `MaxConcurrentReconciles`
In your main.go, the controller is set up like this:
// main.go
err = (&controller.ExternalDatabaseReconciler{
	Client: mgr.GetClient(),
	Scheme: mgr.GetScheme(),
	// Wire in the external API client: the mock from this article, or a real implementation.
	ExternalDBClient: controller.NewMockDBClient(),
}).SetupWithManager(mgr)
The controller's concurrency can be configured with WithOptions:
// main.go
// Options comes from "sigs.k8s.io/controller-runtime/pkg/controller"; import it
// under an alias (here ctrlcontroller) so it does not clash with your own
// internal/controller package.
err = ctrl.NewControllerManagedBy(mgr).
	For(&databasev1.ExternalDatabase{}).
	WithOptions(ctrlcontroller.Options{MaxConcurrentReconciles: 5}).
	Complete(&controller.ExternalDatabaseReconciler{...})
By default, MaxConcurrentReconciles is 1. If your operator manages thousands of resources and the external API is fast, you can increase this to process reconciles in parallel. However, if your external API has strict rate limits, you might need to keep this at 1 or implement a client-side rate limiter to avoid being throttled.
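One way to enforce a client-side limit is to wrap the external API client in a throttling decorator. A minimal sketch using golang.org/x/time/rate; the file name, wrapper type, and the 5-requests-per-second budget are all illustrative:

internal/controller/ratelimited_client.go
package controller

import (
	"context"

	"golang.org/x/time/rate"
)

// rateLimitedDBClient throttles every call made to the external API.
type rateLimitedDBClient struct {
	inner   ExternalDatabaseAPI
	limiter *rate.Limiter
}

func NewRateLimitedDBClient(inner ExternalDatabaseAPI) ExternalDatabaseAPI {
	// Allow 5 requests per second with a burst of 10; tune to your provider's quota.
	return &rateLimitedDBClient{inner: inner, limiter: rate.NewLimiter(rate.Limit(5), 10)}
}

func (c *rateLimitedDBClient) CreateDatabase(ctx context.Context, name, engine string) (string, error) {
	// Wait blocks until a token is available or the context is cancelled.
	if err := c.limiter.Wait(ctx); err != nil {
		return "", err
	}
	return c.inner.CreateDatabase(ctx, name, engine)
}

func (c *rateLimitedDBClient) DeleteDatabase(ctx context.Context, dbID string) error {
	if err := c.limiter.Wait(ctx); err != nil {
		return err
	}
	return c.inner.DeleteDatabase(ctx, dbID)
}

func (c *rateLimitedDBClient) GetDatabaseStatus(ctx context.Context, dbID string) (string, error) {
	if err := c.limiter.Wait(ctx); err != nil {
		return "", err
	}
	return c.inner.GetDatabaseStatus(ctx, dbID)
}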
Complete Implementation Example
Here is a mock external API client to make the example runnable.
internal/controller/mock_external_api.go
package controller
import (
"context"
"fmt"
"sync"
"github.com/google/uuid"
)
// A mock client that simulates an external Database-as-a-Service API
type ExternalDatabaseAPI interface {
CreateDatabase(ctx context.Context, name, engine string) (string, error)
DeleteDatabase(ctx context.Context, dbID string) error
GetDatabaseStatus(ctx context.Context, dbID string) (string, error)
}
type mockDBClient struct {
// In-memory map to simulate the external database store
mu sync.Mutex
dbs map[string]string // map[dbID]status
}
func NewMockDBClient() ExternalDatabaseAPI {
return &mockDBClient{
dbs: make(map[string]string),
}
}
func (c *mockDBClient) CreateDatabase(ctx context.Context, name, engine string) (string, error) {
c.mu.Lock()
defer c.mu.Unlock()
// Simulate potential transient errors
if name == "fail-creation" {
return "", fmt.Errorf("API error: failed to provision database cluster")
}
dbID := uuid.New().String()
c.dbs[dbID] = "available"
fmt.Printf("[Mock API] Created database %s with ID %s\n", name, dbID)
return dbID, nil
}
func (c *mockDBClient) DeleteDatabase(ctx context.Context, dbID string) error {
c.mu.Lock()
defer c.mu.Unlock()
// Idempotency: If the DB doesn't exist, it's a success from a cleanup perspective.
if _, ok := c.dbs[dbID]; !ok {
fmt.Printf("[Mock API] Delete called for non-existent DB ID %s. Treating as success.\n", dbID)
return nil
}
delete(c.dbs, dbID)
fmt.Printf("[Mock API] Deleted database with ID %s\n", dbID)
return nil
}
func (c *mockDBClient) GetDatabaseStatus(ctx context.Context, dbID string) (string, error) {
c.mu.Lock()
defer c.mu.Unlock()
if status, ok := c.dbs[dbID]; ok {
return status, nil
}
return "", fmt.Errorf("database with ID %s not found", dbID)
}
This simple mock demonstrates the critical idempotent nature of the DeleteDatabase call.
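To tie the pieces together, here is a sketch of a unit test for the creation path using controller-runtime's fake client. It assumes controller-runtime v0.15+ (for WithStatusSubresource) and the kubebuilder-generated AddToScheme; the file name, test name, and fixture values are illustrative:

internal/controller/externaldatabase_controller_test.go
package controller

import (
	"context"
	"testing"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client/fake"

	databasev1 "finalizer-demo/api/v1"
)

func TestReconcileNormalCreatesExternalDatabase(t *testing.T) {
	scheme := runtime.NewScheme()
	if err := databasev1.AddToScheme(scheme); err != nil {
		t.Fatal(err)
	}

	seed := &databasev1.ExternalDatabase{
		ObjectMeta: metav1.ObjectMeta{Name: "my-db", Namespace: "default"},
		Spec:       databasev1.ExternalDatabaseSpec{Name: "orders", Engine: "postgres"},
	}

	k8sClient := fake.NewClientBuilder().
		WithScheme(scheme).
		WithObjects(seed).
		WithStatusSubresource(&databasev1.ExternalDatabase{}).
		Build()

	r := &ExternalDatabaseReconciler{
		Client:           k8sClient,
		Scheme:           scheme,
		ExternalDBClient: NewMockDBClient(),
	}

	// Fetch the object back so it carries the fake client's resourceVersion,
	// then run the happy-path reconciliation directly.
	db := &databasev1.ExternalDatabase{}
	if err := k8sClient.Get(context.Background(), types.NamespacedName{Name: "my-db", Namespace: "default"}, db); err != nil {
		t.Fatal(err)
	}
	if _, err := r.reconcileNormal(context.Background(), db); err != nil {
		t.Fatalf("reconcileNormal returned error: %v", err)
	}
	if db.Status.DBID == "" {
		t.Fatal("expected DBID to be recorded in status after provisioning")
	}
}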
Conclusion
The finalizer pattern is not merely a feature of Kubernetes; it is the cornerstone of writing a reliable operator that manages external, stateful resources. By intercepting the deletion process, your controller gains the ability to perform crucial cleanup tasks, preventing orphaned resources and ensuring the integrity of your system.
A production-ready finalizer implementation can be summarized by these principles:
- Structure the Reconcile loop to immediately check for the deletionTimestamp and branch on it.
- Add the finalizer as soon as a new object is reconciled, before provisioning anything external.
- Make external cleanup idempotent, so retries and controller restarts are always safe.
- Remove the finalizer only after cleanup succeeds; on failure, return the error so the framework retries with backoff.

Mastering this pattern moves you from writing basic controllers that simply create resources to building robust, self-healing, and production-grade operators that can be trusted to manage the complete lifecycle of critical infrastructure.