Crafting K8s Operators with Finalizers for Stateful App Management
The Deletion Paradox: Why Stateful Operators Fail Without Finalizers
In the lifecycle of a Kubernetes Operator, the Reconcile loop is often seen as the heart of its logic. It ensures the state of the world matches the desired state defined in a Custom Resource (CR). We meticulously code the creation and update paths, handling intricate logic to provision and configure external resources—a cloud database, a message queue, a DNS record. But what happens when a user types kubectl delete my-cr my-instance?
With no finalizers on the object, the Kubernetes API server removes it from etcd right away. The delete event triggers one more reconcile, your operator's Get call returns a NotFound error, and as far as the operator is concerned, the object is gone. The problem? The external, stateful resource it so carefully provisioned is now an orphan: a ghost in the cloud, incurring costs and posing a potential security risk.
This is the deletion paradox. The very trigger for cleanup (the CR object) is destroyed before the cleanup can be reliably executed. This is precisely the problem that finalizers are designed to solve. Finalizers are a list of strings in an object's metadata (metadata.finalizers) that tells the Kubernetes API server, "Do not fully delete this object until these identifiers are removed."
When an object with a finalizer is deleted, the API server doesn't delete it immediately. Instead, it sets the metadata.deletionTimestamp field to the current time and leaves the object in a Terminating state. The object remains visible to your operator's Reconcile loop. This is your cue, your pre-delete hook, to perform all necessary external cleanup. Once your operator confirms the external resource is gone, it removes its finalizer from the object's metadata. If your finalizer was the last one, the API server then completes the deletion and removes the object from etcd.
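To make the deletion flow concrete, here is a minimal sketch that models it with plain Go types. It is a toy simulation, not the API server's implementation: the store and object types and their methods are hypothetical names invented for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// object is a toy stand-in for a Kubernetes resource's metadata (hypothetical type).
type object struct {
	Name              string
	Finalizers        []string
	DeletionTimestamp *time.Time
}

// store is a toy stand-in for the API server's storage (hypothetical type).
type store struct {
	objects map[string]*object
}

// Delete mimics a delete request: with no finalizers the object is removed
// immediately; otherwise it is only marked with a deletion timestamp.
func (s *store) Delete(name string) {
	obj, ok := s.objects[name]
	if !ok {
		return
	}
	if len(obj.Finalizers) == 0 {
		delete(s.objects, name)
		return
	}
	if obj.DeletionTimestamp == nil {
		now := time.Now()
		obj.DeletionTimestamp = &now
	}
}

// RemoveFinalizer drops one finalizer and, if the object is terminating
// and no finalizers remain, completes the deletion.
func (s *store) RemoveFinalizer(name, finalizer string) {
	obj, ok := s.objects[name]
	if !ok {
		return
	}
	kept := obj.Finalizers[:0]
	for _, f := range obj.Finalizers {
		if f != finalizer {
			kept = append(kept, f)
		}
	}
	obj.Finalizers = kept
	if obj.DeletionTimestamp != nil && len(obj.Finalizers) == 0 {
		delete(s.objects, name)
	}
}

// Exists reports whether the object is still stored.
func (s *store) Exists(name string) bool {
	_, ok := s.objects[name]
	return ok
}

func main() {
	s := &store{objects: map[string]*object{
		"my-db": {Name: "my-db", Finalizers: []string{"db.example.com/finalizer"}},
	}}
	s.Delete("my-db")
	fmt.Println("after delete, still present:", s.Exists("my-db"))
	s.RemoveFinalizer("my-db", "db.example.com/finalizer")
	fmt.Println("after finalizer removal, still present:", s.Exists("my-db"))
}
```

The window between the two calls in main is where an operator gets to run its cleanup.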
This article is a deep dive into the practical, production-level implementation of this pattern. We will move beyond the theory and build a robust ManagedDatabase operator in Go that uses finalizers to manage the lifecycle of a (mocked) external database instance.
A Production Scenario: The `ManagedDatabase` Operator
Let's define our use case. We'll build an operator that manages a ManagedDatabase CR.
CRD Specification (api/v1alpha1/manageddatabase_types.go):
Our CR will have a spec defining the desired database and a status to reflect its real-world state.
// ManagedDatabaseSpec defines the desired state of ManagedDatabase
type ManagedDatabaseSpec struct {
// Engine specifies the database engine (e.g., "postgres", "mysql")
// +kubebuilder:validation:Enum=postgres;mysql
// +kubebuilder:validation:Required
Engine string `json:"engine"`
// Version specifies the engine version.
// +kubebuilder:validation:Required
Version string `json:"version"`
// Username for the database admin user.
// +kubebuilder:validation:Required
Username string `json:"username"`
}
// ManagedDatabaseStatus defines the observed state of ManagedDatabase
type ManagedDatabaseStatus struct {
// Conditions represent the latest available observations of an object's state
// +optional
Conditions []metav1.Condition `json:"conditions,omitempty"`
// InstanceID is the unique identifier for the external database instance.
// +optional
InstanceID string `json:"instanceID,omitempty"`
// Endpoint is the connection address for the database.
// +optional
Endpoint string `json:"endpoint,omitempty"`
}
The Operator's Core Responsibilities:
1. When a ManagedDatabase CR is created, the operator calls an external DatabaseProvider API to provision a new database instance.
2. The operator updates the CR's status with the InstanceID and Endpoint from the provider.
3. When a ManagedDatabase CR is deleted, the operator calls the DatabaseProvider API to deprovision the database instance before allowing the CR to be removed from Kubernetes.
For this example, we'll mock the external provider to focus purely on the operator's logic.
// internal/dbprovider/mock_provider.go
package dbprovider
import (
"fmt"
"math/rand"
"sync"
"time"
)
// A mock database instance
type DBInstance struct {
ID string
Engine string
Version string
Endpoint string
Status string // PROVISIONING, AVAILABLE, DELETING, DELETED
}
// Mock provider that simulates a cloud database service
type MockProvider struct {
mu sync.Mutex
Instances map[string]*DBInstance
}
func NewMockProvider() *MockProvider {
return &MockProvider{
Instances: make(map[string]*DBInstance),
}
}
// Simulates creating a database
func (p *MockProvider) CreateDB(engine, version string) (*DBInstance, error) {
p.mu.Lock()
defer p.mu.Unlock()
id := fmt.Sprintf("db-%d", rand.Intn(10000))
instance := &DBInstance{
ID: id,
Engine: engine,
Version: version,
Endpoint: fmt.Sprintf("%s.db.example.com", id),
Status: "PROVISIONING",
}
p.Instances[id] = instance
// Simulate provisioning time
go func() {
time.Sleep(5 * time.Second)
p.mu.Lock()
defer p.mu.Unlock()
if p.Instances[id] != nil {
p.Instances[id].Status = "AVAILABLE"
}
}()
// Return a copy so the caller never shares memory with the goroutine above.
created := *instance
return &created, nil
}
// Simulates fetching database status
func (p *MockProvider) GetDB(instanceID string) (*DBInstance, error) {
p.mu.Lock()
defer p.mu.Unlock()
instance, ok := p.Instances[instanceID]
if !ok {
return nil, fmt.Errorf("instance %s not found", instanceID)
}
// Return a copy so the caller can read Status without holding the mutex.
found := *instance
return &found, nil
}
// Simulates deleting a database
func (p *MockProvider) DeleteDB(instanceID string) error {
p.mu.Lock()
defer p.mu.Unlock()
instance, ok := p.Instances[instanceID]
if !ok {
// Idempotency: if it's already gone, that's a success for deletion.
return nil
}
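if instance.Status == "DELETING" {
// Deletion is already in progress; don't start a second timer.
return nil
}
instance.Status = "DELETING"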
// Simulate deletion time
go func() {
time.Sleep(5 * time.Second)
p.mu.Lock()
defer p.mu.Unlock()
delete(p.Instances, instanceID)
}()
return nil
}
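The reconciler shown later depends on the concrete MockProvider type. One way to keep it swappable is to put the three calls behind an interface. The sketch below assumes a hypothetical Provider interface and a synchronous fakeProvider for unit tests; neither name comes from a real library.

```go
package main

import "fmt"

// DBInstance mirrors the fields of the mock's DBInstance.
type DBInstance struct {
	ID, Engine, Version, Endpoint, Status string
}

// Provider is a hypothetical interface covering the calls the operator needs.
type Provider interface {
	CreateDB(engine, version string) (*DBInstance, error)
	GetDB(instanceID string) (*DBInstance, error)
	DeleteDB(instanceID string) error
}

// fakeProvider is a synchronous in-memory implementation, handy in unit tests
// because it needs no sleeps or goroutines.
type fakeProvider struct {
	instances map[string]*DBInstance
	nextID    int
}

// Compile-time check that fakeProvider satisfies Provider.
var _ Provider = (*fakeProvider)(nil)

func (f *fakeProvider) CreateDB(engine, version string) (*DBInstance, error) {
	f.nextID++
	id := fmt.Sprintf("db-%d", f.nextID)
	f.instances[id] = &DBInstance{
		ID: id, Engine: engine, Version: version,
		Endpoint: id + ".db.example.com", Status: "AVAILABLE",
	}
	created := *f.instances[id] // hand out a copy, not the stored pointer
	return &created, nil
}

func (f *fakeProvider) GetDB(instanceID string) (*DBInstance, error) {
	inst, ok := f.instances[instanceID]
	if !ok {
		return nil, fmt.Errorf("instance %s not found", instanceID)
	}
	found := *inst
	return &found, nil
}

// DeleteDB is idempotent: deleting a missing instance is not an error.
func (f *fakeProvider) DeleteDB(instanceID string) error {
	delete(f.instances, instanceID)
	return nil
}

func main() {
	var p Provider = &fakeProvider{instances: map[string]*DBInstance{}}
	inst, _ := p.CreateDB("postgres", "16")
	fmt.Println("created:", inst.ID, inst.Endpoint)
	_ = p.DeleteDB(inst.ID)
	_, err := p.GetDB(inst.ID)
	fmt.Println("lookup after delete:", err)
}
```

With this in place, a reconciler field of type Provider could accept either the mock or a real client.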
Deep Dive: Implementing the Reconciler with Finalizer Logic
Now, let's construct the Reconcile method. This is where the core logic resides. We'll use controller-runtime, the de facto standard Go library for building operators.
// internal/controller/manageddatabase_controller.go
package controller
import (
// ... imports
"context"
"fmt"
"time"
apierrors "k8s.io/apimachinery/pkg/api/errors"
"k8s.io/apimachinery/pkg/runtime"
ctrl "sigs.k8s.io/controller-runtime"
"sigs.k8s.io/controller-runtime/pkg/client"
"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
"sigs.k8s.io/controller-runtime/pkg/log"
dbV1alpha1 "my-operator/api/v1alpha1"
"my-operator/internal/dbprovider"
)
const managedDatabaseFinalizer = "db.example.com/finalizer"
type ManagedDatabaseReconciler struct {
client.Client
Scheme *runtime.Scheme
Provider *dbprovider.MockProvider // Our mock DB provider
}
func (r *ManagedDatabaseReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
logger := log.FromContext(ctx)
// 1. Fetch the ManagedDatabase instance
dbInstance := &dbV1alpha1.ManagedDatabase{}
if err := r.Get(ctx, req.NamespacedName, dbInstance); err != nil {
if apierrors.IsNotFound(err) {
// Object was deleted, nothing to do. This happens after our finalizer is removed.
logger.Info("ManagedDatabase resource not found. Ignoring since object must be deleted")
return ctrl.Result{}, nil
}
// Error reading the object - requeue the request.
logger.Error(err, "Failed to get ManagedDatabase")
return ctrl.Result{}, err
}
// 2. The core finalizer logic
if dbInstance.ObjectMeta.DeletionTimestamp.IsZero() {
// The object is NOT being deleted. Let's add our finalizer if it doesn't exist.
if !controllerutil.ContainsFinalizer(dbInstance, managedDatabaseFinalizer) {
logger.Info("Adding finalizer for ManagedDatabase")
controllerutil.AddFinalizer(dbInstance, managedDatabaseFinalizer)
if err := r.Update(ctx, dbInstance); err != nil {
return ctrl.Result{}, err
}
}
} else {
// The object IS being deleted
if controllerutil.ContainsFinalizer(dbInstance, managedDatabaseFinalizer) {
logger.Info("Performing finalizer cleanup for ManagedDatabase")
// Our finalizer is present, so let's handle external dependency cleanup.
if err := r.cleanupExternalResources(ctx, dbInstance); err != nil {
// If cleanup fails, we don't remove the finalizer.
// The reconciliation will be retried.
logger.Error(err, "Failed to cleanup external resources")
return ctrl.Result{}, err
}
// Cleanup was successful. Remove our finalizer.
logger.Info("External resources cleaned up. Removing finalizer")
controllerutil.RemoveFinalizer(dbInstance, managedDatabaseFinalizer)
if err := r.Update(ctx, dbInstance); err != nil {
return ctrl.Result{}, err
}
}
// Stop reconciliation as the item is being deleted
return ctrl.Result{}, nil
}
// 3. The main reconciliation logic (Create/Update)
// This part runs only if the object is not being deleted.
return r.reconcileExternalResources(ctx, dbInstance)
}
The Reconcile function is now a dispatcher. It first checks the DeletionTimestamp. If it's non-zero, it routes to the cleanup logic. If it's zero, it ensures the finalizer is present and then routes to the main creation/update logic.
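The dispatch rule can be isolated as a pure function, which makes the four cases easy to unit-test without a cluster. This is a sketch only; the action type and decide function are hypothetical names, not part of controller-runtime.

```go
package main

import "fmt"

// action names what Reconcile does next (hypothetical type for illustration).
type action string

const (
	actionAddFinalizer action = "add-finalizer-then-reconcile"
	actionReconcile    action = "reconcile"
	actionCleanup      action = "cleanup-then-remove-finalizer"
	actionNone         action = "nothing"
)

// decide mirrors the branching in Reconcile: deleting corresponds to a
// non-zero DeletionTimestamp, hasFinalizer to ContainsFinalizer returning true.
func decide(deleting, hasFinalizer bool) action {
	switch {
	case !deleting && !hasFinalizer:
		return actionAddFinalizer
	case !deleting:
		return actionReconcile
	case hasFinalizer:
		return actionCleanup
	default:
		// Deleting and our finalizer is already gone: other finalizers
		// (if any) belong to other controllers.
		return actionNone
	}
}

func main() {
	for _, deleting := range []bool{false, true} {
		for _, hasFinalizer := range []bool{false, true} {
			fmt.Printf("deleting=%t hasFinalizer=%t -> %s\n",
				deleting, hasFinalizer, decide(deleting, hasFinalizer))
		}
	}
}
```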
The Creation/Update Path (`reconcileExternalResources`)
This function handles the provisioning and status updates. The key is its idempotency.
// internal/controller/manageddatabase_controller.go
func (r *ManagedDatabaseReconciler) reconcileExternalResources(ctx context.Context, dbInstance *dbV1alpha1.ManagedDatabase) (ctrl.Result, error) {
logger := log.FromContext(ctx)
// Check if the external DB instance is already created
if dbInstance.Status.InstanceID == "" {
// Not created yet, so let's provision it.
logger.Info("Provisioning new database instance")
externalDB, err := r.Provider.CreateDB(dbInstance.Spec.Engine, dbInstance.Spec.Version)
if err != nil {
logger.Error(err, "Failed to provision external database")
// TODO: Update status with a failure condition
return ctrl.Result{}, err
}
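// Immediately update the status with the new InstanceID.
// This is critical. If the operator restarts now, it will know the DB exists.
// Caveat: if this status update fails, the next reconcile sees an empty InstanceID
// and provisions a second instance. A real provider call should use a deterministic
// name or an idempotency token to close that gap.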
dbInstance.Status.InstanceID = externalDB.ID
dbInstance.Status.Endpoint = ""
// TODO: Set a "Provisioning" condition
if err := r.Status().Update(ctx, dbInstance); err != nil {
logger.Error(err, "Failed to update ManagedDatabase status after creation")
return ctrl.Result{}, err
}
// Requeue to check status later, as provisioning is async
logger.Info("Database is provisioning. Requeueing for status check.")
return ctrl.Result{RequeueAfter: 15 * time.Second}, nil
}
// Instance exists, let's check its status.
externalDB, err := r.Provider.GetDB(dbInstance.Status.InstanceID)
if err != nil {
logger.Error(err, "Failed to get external database status")
return ctrl.Result{}, err
}
switch externalDB.Status {
case "PROVISIONING":
logger.Info("Database is still provisioning", "InstanceID", externalDB.ID)
// TODO: Update status condition to reflect provisioning state
return ctrl.Result{RequeueAfter: 15 * time.Second}, nil
case "AVAILABLE":
if dbInstance.Status.Endpoint != externalDB.Endpoint {
logger.Info("Database is available. Updating endpoint.", "InstanceID", externalDB.ID)
dbInstance.Status.Endpoint = externalDB.Endpoint
// TODO: Set a "Ready" condition
if err := r.Status().Update(ctx, dbInstance); err != nil {
logger.Error(err, "Failed to update ManagedDatabase status with endpoint")
return ctrl.Result{}, err
}
}
logger.Info("Database is in desired state.", "InstanceID", externalDB.ID)
return ctrl.Result{}, nil // All done!
default:
logger.Info("Database in an unknown state", "State", externalDB.Status)
return ctrl.Result{RequeueAfter: 30 * time.Second}, nil
}
}
The Deletion Path (`cleanupExternalResources`)
This is the crux of the finalizer pattern. This function is only called when DeletionTimestamp is set.
// internal/controller/manageddatabase_controller.go
func (r *ManagedDatabaseReconciler) cleanupExternalResources(ctx context.Context, dbInstance *dbV1alpha1.ManagedDatabase) error {
logger := log.FromContext(ctx)
if dbInstance.Status.InstanceID == "" {
// If there's no instance ID, the external resource was likely never created.
// We can safely remove the finalizer.
logger.Info("No InstanceID in status. Assuming external resource does not exist.")
return nil
}
// Step 1: Trigger deletion of the external resource.
logger.Info("Deleting external database instance", "InstanceID", dbInstance.Status.InstanceID)
if err := r.Provider.DeleteDB(dbInstance.Status.InstanceID); err != nil {
// This is a critical failure. We must return an error to retry.
// The finalizer will NOT be removed.
return fmt.Errorf("failed to trigger deletion of external database %s: %w", dbInstance.Status.InstanceID, err)
}
// Step 2: Poll for confirmation of deletion.
// In a real-world scenario, this is crucial. Deletion is asynchronous.
externalDB, err := r.Provider.GetDB(dbInstance.Status.InstanceID)
if err != nil {
// The mock's GetDB only fails when the instance is missing, so any error here
// means deletion is complete. A real provider needs an explicit not-found check.
logger.Info("External database successfully deleted (confirmed by GetDB error)")
return nil
}
if externalDB.Status == "DELETING" {
// The resource is still being deleted. We must requeue and check again later.
// We return an error to trigger a requeue. A custom error type or specific message
// could allow for more intelligent backoff in the main loop.
logger.Info("External database is still deleting. Requeuing.")
return fmt.Errorf("deletion pending for instance %s", dbInstance.Status.InstanceID)
}
logger.Info("External database cleanup complete.")
return nil
}
A Note on Requeueing During Cleanup: In cleanupExternalResources, we return an error (deletion pending...) to force a requeue. This relies on controller-runtime's default exponential backoff, which works but is a blunt tool for polling: every poll is logged as a reconcile error, and the delay between checks keeps growing. Returning ctrl.Result{RequeueAfter: ...} gives a predictable polling interval, and the section on asynchronous operations refines the code in that direction.
Advanced Patterns and Edge Case Handling
Building a simple finalizer loop is one thing; making it production-ready requires anticipating failures.
1. Idempotency in Cleanup
Problem: The operator might crash and restart after triggering the DeleteDB call but before confirming deletion. On the next reconcile, cleanupExternalResources runs again.
Solution: The cleanup logic must be idempotent. Our MockProvider.DeleteDB function demonstrates this: if !ok { return nil }. If the instance is already gone, it returns success. Likewise, cleanupExternalResources treats an error from GetDB as confirmation that the instance is gone, which is safe only because the mock's GetDB fails for no other reason. Real provider APIs vary: some treat deleting a missing resource as a success (deleting a nonexistent S3 object, for example), while others return a distinct not-found error (deleting a nonexistent S3 bucket returns NoSuchBucket). Your operator must recognize both outcomes as a completed deletion.
2. Handling Stuck Finalizers
Problem: A bug in your cleanup logic or a permanent issue with the external API could prevent the finalizer from ever being removed. The object will be stuck in the Terminating state forever, and kubectl delete ... --force won't work.
Solution:
* Monitoring and Alerting: Your operator should have metrics tracking the number of objects in a Terminating state for an extended period. An alert on this metric is a signal for manual intervention.
* Manual Intervention: A user with permission to update the resource (typically a cluster administrator) can manually remove the finalizer:
# Save the object in its raw JSON form
kubectl get manageddatabase my-db-instance -n my-ns -o json > my-db.json
# Edit my-db.json and remove the "db.example.com/finalizer" entry from the metadata.finalizers array.
# Then write the edited object back (this fails with a conflict if the object changed in the meantime):
kubectl replace -f ./my-db.json
# A more direct (but potentially racy) approach is patching. Note that this removes ALL finalizers, not only ours:
kubectl patch manageddatabase my-db-instance -n my-ns --type json -p='[{"op": "remove", "path": "/metadata/finalizers"}]'
CRITICAL WARNING: Manually removing a finalizer is a break-glass procedure. It should only be done after confirming the external resource has been manually cleaned up. Otherwise, you are re-introducing the orphaned resource problem the finalizer was meant to solve.
3. Asynchronous Operations and Requeue Strategy
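Problem: Operations like database provisioning or deletion can take minutes. Returning ctrl.Result{}, err for every poll treats a normal in-progress state as a failure: each attempt is logged as a reconcile error, and the default exponential backoff starts with millisecond delays, so the first retries arrive in a rapid burst that can hammer the external API and spam logs.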
Solution: Differentiate between transient failures and long-running operations.
* For transient API errors (e.g., 503 Service Unavailable, network timeout): Return a generic error (return ctrl.Result{}, err). This lets controller-runtime's exponential backoff handle retries gracefully.
* For polling an in-progress operation (e.g., status is PROVISIONING or DELETING): Return a ctrl.Result{RequeueAfter: duration}. This signals a successful reconciliation for now, with a scheduled check-in later. The duration should be reasonable (e.g., 15-30 seconds) to balance responsiveness with API call volume.
Here's a refined error handling snippet for the cleanup function:
// ... inside cleanupExternalResources
externalDB, err := r.Provider.GetDB(dbInstance.Status.InstanceID)
if err != nil {
// Let's assume our provider returns a specific error type for NotFound
if dbprovider.IsNotFound(err) {
logger.Info("External resource confirmed deleted.")
return nil // Success
}
// Any other error is a transient failure, retry with backoff
return err
}
if externalDB.Status == "DELETING" {
// This is not an error, but a state requiring a future check.
// We'll return a specific error that our main Reconcile loop can interpret.
return &RequeueError{Message: "Deletion pending", RequeueAfter: 30 * time.Second}
}
// A custom error type to control requeue behavior
type RequeueError struct {
Message string
RequeueAfter time.Duration
}
func (e *RequeueError) Error() string {
return e.Message
}
// In the main Reconcile function:
// ...
if err := r.cleanupExternalResources(ctx, dbInstance); err != nil {
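// errors.As (standard "errors" package) also matches a RequeueError wrapped with %w.
var requeueErr *RequeueError
if errors.As(err, &requeueErr) {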
logger.Info(requeueErr.Message)
return ctrl.Result{RequeueAfter: requeueErr.RequeueAfter}, nil
}
// It's a real error, use exponential backoff
logger.Error(err, "Cleanup failed with a transient error")
return ctrl.Result{}, err
}
// ...
Performance and Scalability Considerations
Controller Concurrency
By default, a controller runs a single reconcile worker (MaxConcurrentReconciles defaults to 1), so requests are processed one at a time. If you have thousands of ManagedDatabase CRs, any reconcile that blocks on a slow external API call delays every other resource waiting in the queue.
You can increase parallelism by setting MaxConcurrentReconciles when registering the controller with the manager, for example in main.go.
// cmd/main.go
// ...
// controller.Options comes from "sigs.k8s.io/controller-runtime/pkg/controller".
err = ctrl.NewControllerManagedBy(mgr).
For(&dbv1alpha1.ManagedDatabase{}).
WithOptions(controller.Options{MaxConcurrentReconciles: 10}).
Complete(&dbcontroller.ManagedDatabaseReconciler{
Client: mgr.GetClient(),
Scheme: mgr.GetScheme(),
Provider: dbprovider.NewMockProvider(),
})
// ...
Caveat: Increasing concurrency means your operator will make more parallel calls to the external API. Ensure your API client has appropriate rate limiting or that your provider's API quotas can handle the load. A MaxConcurrentReconciles of 5-10 is a common starting point.
API Throttling
If your operator manages a large number of resources, the combined polling from all Reconcile loops can exceed the rate limits of your external provider's API. This is especially true with high concurrency.
Solution: Use a rate-limited HTTP client for your external API calls. Libraries like golang.org/x/time/rate provide token bucket implementations that are easy to integrate into an http.Client transport.
import "golang.org/x/time/rate"
// In your provider client setup
limiter := rate.NewLimiter(rate.Limit(10), 1) // 10 requests per second, burst of 1
// In your API call function
if err := limiter.Wait(ctx); err != nil {
return err // Context cancelled, or its deadline would pass before a token is available
}
// Proceed with API call
This ensures that your operator self-throttles its requests, behaving as a good citizen and preventing failures due to rate limiting.
Conclusion
Finalizers are not an optional feature for a production-grade stateful operator; they are a fundamental requirement. They transform the operator from a simple state-synchronizer into a true lifecycle manager. By intercepting the deletion flow, you can guarantee that external, costly, and stateful resources are deprovisioned cleanly, preventing resource leaks and security gaps.
A robust implementation requires more than just adding and removing a string from metadata. It demands idempotent logic, careful error handling to distinguish between transient failures and pending states, strategies for dealing with stuck resources, and an awareness of the performance implications of concurrency and API rate limiting. By mastering these patterns, you can build operators that are not just powerful but also safe, reliable, and production-ready.