Kubernetes Operators: Advanced Finalizer Patterns for Stateful Resources
The Unspoken Contract of Stateful Operators
As senior engineers, we've embraced Kubernetes as more than a container orchestrator; it's a universal control plane. We model our infrastructure—databases, message queues, DNS entries, even SaaS subscriptions—as Custom Resources (CRs). The Operator pattern, with its declarative API and reconciliation loop, provides a powerful abstraction. But this power comes with a critical responsibility: when a user executes kubectl delete mydatabase db-prod, the Operator must uphold a contract. It is not enough for the CR to disappear from etcd; the real-world, stateful resource it represents must be meticulously torn down.
Failure to do so results in orphaned infrastructure—ghost RDS instances racking up bills, dangling DNS records pointing to nothing, and abandoned storage buckets creating security vulnerabilities. The default Kubernetes garbage collection is insufficient for these external resources. The bridge between a CR's lifecycle and the external world's lifecycle is a surprisingly simple yet profoundly important Kubernetes feature: the Finalizer.
This article assumes you understand what an Operator is and have perhaps built a basic one using kubebuilder or operator-sdk. We will not cover the basics. Instead, we will focus exclusively on the advanced, production-grade implementation of finalizer logic to build resilient, leak-proof operators.
Anatomy of a Deletion Failure: The 'Why' of Finalizers
To appreciate the necessity of finalizers, let's first witness a catastrophic failure in an operator that lacks them. Imagine a simple CloudDatabase operator managing an AWS RDS instance.
The Flawed Reconcile Loop (Without Finalizer):
// DO NOT USE THIS CODE - IT IS INTENTIONALLY FLAWED
func (r *CloudDatabaseReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    log := log.FromContext(ctx)

    var cloudDB customv1.CloudDatabase
    if err := r.Get(ctx, req.NamespacedName, &cloudDB); err != nil {
        if errors.IsNotFound(err) {
            // CR was deleted. But what about the RDS instance?
            // By the time we get this 'NotFound' error, it's too late.
            // We no longer have the CR's spec to know which RDS instance to delete.
            log.Info("CloudDatabase resource not found. Assuming it was deleted.")
            return ctrl.Result{}, nil
        }
        return ctrl.Result{}, err
    }

    // ... logic to create or update the RDS instance based on cloudDB.Spec ...
    // This part of the code never gets a chance to run the deletion logic.
    return ctrl.Result{}, nil
}
Here's the sequence of events when a user runs kubectl delete clouddatabase my-prod-db:
- The Kubernetes API server receives the delete request.
- The API server sees that the CloudDatabase object has no finalizers, so there is nothing to wait for: instead of merely marking the object in etcd by setting its metadata.deletionTimestamp, it removes it from the etcd key space immediately. The object is now only accessible via specific API calls that can see "deleted" objects for a short grace period.
- The operator's watch receives a DELETED event for my-prod-db.
- The r.Get() call inside our Reconcile function returns a NotFound error because the object is gone from the main API view.
- The reconciler logs a message and returns, assuming its job is done.
The result: The CloudDatabase CR is gone, but the expensive RDS instance in AWS is still running, completely orphaned. The Operator has lost all information about it (spec.instanceID, spec.region, etc.) and has no way to clean it up. This is the fundamental problem that finalizers solve.
Core Implementation: The Finalizer Gatekeeper
A finalizer is a string key added to a resource's metadata.finalizers list. When present, it acts as a gatekeeper. The Kubernetes API server will not fully delete a resource that has finalizers. Instead, it sets the metadata.deletionTimestamp and leaves the object in a Terminating state, making it visible to the controller. It is the controller's explicit duty to perform cleanup and then remove its own finalizer. Only when the finalizer list is empty will the API server complete the deletion.
Let's refactor our reconciler to use this pattern correctly. We'll use the controller-runtime/pkg/controller/controllerutil package, which provides helpers for this.
package controllers
import (
    // ... other imports
    "sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)
const cloudDatabaseFinalizer = "database.example.com/finalizer"
func (r *CloudDatabaseReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    log := log.FromContext(ctx)

    var cloudDB customv1.CloudDatabase
    if err := r.Get(ctx, req.NamespacedName, &cloudDB); err != nil {
        if errors.IsNotFound(err) {
            log.Info("CloudDatabase resource not found. Ignoring since object must be deleted.")
            return ctrl.Result{}, nil
        }
        log.Error(err, "Failed to get CloudDatabase")
        return ctrl.Result{}, err
    }

    // Check if the instance is being deleted
    if !cloudDB.ObjectMeta.DeletionTimestamp.IsZero() {
        // The object is being deleted
        if controllerutil.ContainsFinalizer(&cloudDB, cloudDatabaseFinalizer) {
            // Our finalizer is present, so let's handle external dependency cleanup.
            if err := r.deleteExternalResources(ctx, &cloudDB); err != nil {
                // If cleanup fails, we return an error so the reconciliation is retried.
                // We don't remove the finalizer, so the object is not deleted yet.
                log.Error(err, "Failed to delete external resources")
                return ctrl.Result{}, err
            }

            // Cleanup was successful. Remove our finalizer.
            controllerutil.RemoveFinalizer(&cloudDB, cloudDatabaseFinalizer)
            if err := r.Update(ctx, &cloudDB); err != nil {
                return ctrl.Result{}, err
            }
        }

        // Stop reconciliation as the item is being deleted
        return ctrl.Result{}, nil
    }

    // The object is not being deleted, so we add our finalizer if it doesn't exist.
    if !controllerutil.ContainsFinalizer(&cloudDB, cloudDatabaseFinalizer) {
        controllerutil.AddFinalizer(&cloudDB, cloudDatabaseFinalizer)
        if err := r.Update(ctx, &cloudDB); err != nil {
            return ctrl.Result{}, err
        }
    }

    // ... your normal reconciliation logic for creating/updating the RDS instance ...
    // e.g., r.ensureLatestRDS(ctx, &cloudDB)

    return ctrl.Result{}, nil
}
// deleteExternalResources handles the actual cleanup in AWS, GCP, etc.
func (r *CloudDatabaseReconciler) deleteExternalResources(ctx context.Context, cloudDB *customv1.CloudDatabase) error {
    // This is where you would put your cloud provider API calls.
    // For example, using the AWS SDK for Go v2:
    // _, err := r.rdsClient.DeleteDBInstance(ctx, &rds.DeleteDBInstanceInput{...})
    // It is CRITICAL that this function is IDEMPOTENT.
    log := log.FromContext(ctx)
    log.Info("Deleting external RDS instance", "instanceID", cloudDB.Status.InstanceID)

    // Simulate a successful deletion
    // In a real implementation, you would poll until the instance is truly gone.
    time.Sleep(2 * time.Second)

    log.Info("External RDS instance deleted successfully")
    return nil
}
This structure forms the backbone of any reliable operator. The key flows are:
- On creation and update: if our finalizer is not yet present, we add it and persist it with an Update before doing any external work.
- On deletion: once the DeletionTimestamp is set, we check for our finalizer. If present, we run our cleanup logic. Only upon successful cleanup do we remove the finalizer and update the object. The API server then completes the deletion.

Advanced Pattern 1: Idempotent Cleanup and Status Reporting
The simple deleteExternalResources function above is naive. What happens if the operator pod crashes right after making the DeleteDBInstance API call but before it can remove the finalizer? On restart, the reconciliation will trigger again for the same Terminating CR.
Your cleanup logic must be idempotent. Calling it a second time on an already-deleted or in-progress-deletion resource should not result in an error. Real-world cloud APIs often help here.
Let's implement a more robust, idempotent deletion function for an AWS RDS instance.
import (
    "errors"
    "fmt"
    "time"

    "github.com/aws/aws-sdk-go-v2/service/rds"
    "github.com/aws/aws-sdk-go-v2/service/rds/types"
    // ... plus your controller-runtime and API imports as before
)
func (r *CloudDatabaseReconciler) deleteExternalResources(ctx context.Context, cloudDB *customv1.CloudDatabase) error {
    log := log.FromContext(ctx)

    if cloudDB.Status.InstanceID == "" {
        log.Info("InstanceID not set in status, assuming external resource was never created.")
        return nil
    }

    input := &rds.DeleteDBInstanceInput{
        DBInstanceIdentifier: &cloudDB.Status.InstanceID,
        SkipFinalSnapshot:    true, // For production, you might want to make this configurable
    }

    log.Info("Attempting to delete RDS instance", "instanceID", cloudDB.Status.InstanceID)
    _, err := r.rdsClient.DeleteDBInstance(ctx, input)
    if err != nil {
        // Check if the error is DBInstanceNotFound. If so, it's already gone.
        var notFoundErr *types.DBInstanceNotFoundFault
        if errors.As(err, &notFoundErr) {
            log.Info("RDS instance already deleted. Cleanup is considered successful.")
            return nil
        }

        // Another potential transient error during deletion
        var invalidStateErr *types.InvalidDBInstanceStateFault
        if errors.As(err, &invalidStateErr) {
            log.Info("RDS instance is in an invalid state for deletion, will retry", "message", invalidStateErr.ErrorMessage())
            // We return an error to trigger a requeue. Kubernetes' exponential backoff will handle retries.
            return fmt.Errorf("RDS instance in invalid state: %w", err)
        }

        log.Error(err, "Failed to delete RDS instance")
        return err
    }

    // After calling delete, it's best practice to wait until it's actually gone.
    log.Info("Waiting for RDS instance to be fully terminated...")
    waiter := rds.NewDBInstanceDeletedWaiter(r.rdsClient)
    err = waiter.Wait(ctx, &rds.DescribeDBInstancesInput{DBInstanceIdentifier: &cloudDB.Status.InstanceID}, 5*time.Minute)
    if err != nil {
        // If the waiter fails because the instance is not found, that's our success condition!
        var notFoundErr *types.DBInstanceNotFoundFault
        if errors.As(err, &notFoundErr) {
            log.Info("RDS instance confirmed deleted.")
            return nil
        }
        return fmt.Errorf("error while waiting for instance deletion: %w", err)
    }

    log.Info("RDS instance fully terminated.")
    return nil
}
This implementation is far more robust:
- If the external resource was never created, Status.InstanceID would be empty, and we can exit gracefully.
- It explicitly handles DBInstanceNotFoundFault. If the RDS instance is already gone, it treats this as a success, preventing the reconcile loop from getting stuck.
- It recognizes InvalidDBInstanceStateFault, which might occur if AWS is still modifying the instance. By returning an error, we let controller-runtime's exponential backoff handle the retry.
- It uses a Waiter to block until the resource is truly gone, ensuring we don't remove the finalizer prematurely.

Advanced Pattern 2: State Machines for Multi-Resource Cleanup
Real-world operators often manage a constellation of related resources. A single WebApp CR might create:
- An IAM Role for permissions.
- An S3 Bucket for static assets.
- A CloudFront Distribution pointing to the S3 bucket.
- A DNS CNAME record.

These have deletion dependencies. You must delete the CloudFront distribution before the S3 bucket it uses. You must detach policies before deleting an IAM role.
A common but flawed approach is to use multiple finalizers (finalizers: ["iam.finalizer", "s3.finalizer", "cloudfront.finalizer"]). This is an anti-pattern because Kubernetes offers no ordering guarantees for finalizer processing if multiple controllers are watching the same resource. You can't control which one runs first.
The superior pattern is to use a single finalizer and an internal state machine, tracked in the CR's status.
Let's design the status subresource for our WebApp CR:
# In your CRD's OpenAPI v3 schema
status:
  type: object
  properties:
    conditions:
      type: array
      items: ... # Standard Kubernetes conditions
    cleanupState:
      type: string
      enum: ["", "DeletingDistribution", "DeletingBucket", "DeletingRole", "Done"]
Now, our deletion logic becomes a state machine:
const webAppFinalizer = "webapp.example.com/finalizer"
// Deletion state constants
const (
    StateDeletingDistribution = "DeletingDistribution"
    StateDeletingBucket       = "DeletingBucket"
    StateDeletingRole         = "DeletingRole"
    StateDone                 = "Done"
)

func (r *WebAppReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    // ... initial setup and fetch WebApp instance ...

    if !instance.ObjectMeta.DeletionTimestamp.IsZero() {
        if controllerutil.ContainsFinalizer(instance, webAppFinalizer) {
            if err := r.runCleanupStateMachine(ctx, instance); err != nil {
                // Update status with error condition before returning
                return ctrl.Result{}, err
            }

            // If the state machine is complete, remove the finalizer
            if instance.Status.CleanupState == StateDone {
                controllerutil.RemoveFinalizer(instance, webAppFinalizer)
                if err := r.Update(ctx, instance); err != nil {
                    return ctrl.Result{}, err
                }
            }
        }
        return ctrl.Result{}, nil
    }

    // ... normal reconcile logic with finalizer addition ...
    return ctrl.Result{}, nil
}
func (r *WebAppReconciler) runCleanupStateMachine(ctx context.Context, instance *customv1.WebApp) error {
    log := log.FromContext(ctx)

    currentState := instance.Status.CleanupState
    if currentState == "" {
        currentState = StateDeletingDistribution // Start state
    }

    var nextState string
    var err error

    switch currentState {
    case StateDeletingDistribution:
        log.Info("Cleanup state: Deleting CloudFront Distribution")
        err = r.deleteCloudFront(ctx, instance)
        nextState = StateDeletingBucket
    case StateDeletingBucket:
        log.Info("Cleanup state: Deleting S3 Bucket")
        err = r.deleteS3Bucket(ctx, instance)
        nextState = StateDeletingRole
    case StateDeletingRole:
        log.Info("Cleanup state: Deleting IAM Role")
        err = r.deleteIAMRole(ctx, instance)
        nextState = StateDone
    case StateDone:
        log.Info("Cleanup state machine complete.")
        return nil
    default:
        return fmt.Errorf("unknown cleanup state: %s", currentState)
    }

    if err != nil {
        // If a step fails, we record the error but don't advance the state.
        // The reconciler will retry the same step.
        // You should update a Condition in the status here.
        return fmt.Errorf("failed at state %s: %w", currentState, err)
    }

    // Advance the state
    instance.Status.CleanupState = nextState
    if updateErr := r.Status().Update(ctx, instance); updateErr != nil {
        return fmt.Errorf("failed to update status to %s: %w", nextState, updateErr)
    }

    // The status update generates a new watch event, so the reconciler runs
    // again almost immediately and executes the next step.
    return nil
}
This state machine pattern provides several advantages:
* Ordered Execution: Guarantees deletion steps run in the correct sequence.
* Observability: Anyone running kubectl describe webapp my-app can see exactly where in the cleanup process it is.
* Resilience: If the operator restarts, it reads the cleanupState from the status and resumes exactly where it left off.
* Atomic Steps: Each step is attempted, and only on success does the state advance. A failure in one step causes it to be retried without affecting others.
Edge Case Deep Dive: The Stuck `Terminating` State
Every seasoned Kubernetes administrator has encountered a namespace or CR stuck in the Terminating state for hours or days. This is almost always caused by a faulty finalizer process.
Common Causes:
In practice this almost always means the controller that owns the finalizer is no longer running (it crashed, was scaled to zero, or was uninstalled while its CRs still existed), or its cleanup logic fails permanently (for example, because cloud credentials were revoked), so the finalizer is never removed.
The dangerous, yet common, manual fix is to kubectl patch the resource to remove the finalizer array. This immediately orphans the external resources and should be a last resort.
Architectural Solutions:
* Alert on any resource whose deletionTimestamp is older than a reasonable threshold (e.g., 30 minutes). This is a strong signal that your operator is failing its cleanup duties.
* Tag every external resource with the UID of the CR that owns it, e.g., kubernetes.io/cr-uid: "a1b2c3d4-....". This enables a separate, out-of-band garbage collection process (e.g., a scheduled AWS Lambda function; see the sketch after this list) that can:
* Scan for all resources with that tag.
* For each found resource, query the Kubernetes API server to see if a CR with that UID still exists.
* If the CR does not exist, the external resource is an orphan and can be safely deleted.
This provides a crucial safety net against operator bugs or manual intervention that leads to orphaned resources.
* Use a ValidatingAdmissionWebhook to prevent the deletion of your Operator's own Deployment if there are any existing CRs of its kind in the cluster. This forces an administrator to delete the CRs (and let the operator clean them up) before deleting the operator itself.
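As a rough sketch of the orphan check such a collector could perform (the list type, client wiring, and import path are assumptions for illustration; taggedUIDs is whatever your tagging API returned for the kubernetes.io/cr-uid tag):

import (
    "context"
    "fmt"

    "sigs.k8s.io/controller-runtime/pkg/client"

    customv1 "example.com/your-operator/api/v1" // placeholder API import
)

// findOrphanedUIDs returns the tagged UIDs whose owning CloudDatabase CR no
// longer exists in the cluster. Deleting (or reporting) the matching cloud
// resources is left to the caller.
func findOrphanedUIDs(ctx context.Context, c client.Client, taggedUIDs []string) ([]string, error) {
    var crs customv1.CloudDatabaseList
    if err := c.List(ctx, &crs); err != nil {
        return nil, fmt.Errorf("listing CloudDatabase CRs: %w", err)
    }

    // Build the set of UIDs that are still backed by a live CR.
    live := make(map[string]struct{}, len(crs.Items))
    for _, cr := range crs.Items {
        live[string(cr.UID)] = struct{}{}
    }

    // Anything tagged with a UID that is no longer live is an orphan candidate.
    var orphans []string
    for _, uid := range taggedUIDs {
        if _, ok := live[uid]; !ok {
            orphans = append(orphans, uid)
        }
    }
    return orphans, nil
}

A scheduled CronJob or cloud function can then delete, or at least report, the orphans it finds, giving you a backstop for the cases where a finalizer never got its chance to run.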
Performance and Concurrency
By default, a controller-runtime controller processes one work item at a time. If a user deletes 1,000 CRs, they will be cleaned up serially. If each cleanup takes 10 seconds, the last CR won't be deleted for nearly 3 hours.
You can increase parallelism by setting MaxConcurrentReconciles when setting up the controller:
// In your main.go
if err = (&controllers.CloudDatabaseReconciler{ ... }).SetupWithManager(mgr, controller.Options{
    MaxConcurrentReconciles: 10, // Process up to 10 reconciles concurrently
}); err != nil {
    // ...
}
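The default kubebuilder scaffold's SetupWithManager takes only the manager, so one way to thread the options through (a sketch; adjust the signature to match your project) is the builder's WithOptions:

import (
    ctrl "sigs.k8s.io/controller-runtime"
    "sigs.k8s.io/controller-runtime/pkg/controller"
)

func (r *CloudDatabaseReconciler) SetupWithManager(mgr ctrl.Manager, opts controller.Options) error {
    return ctrl.NewControllerManagedBy(mgr).
        For(&customv1.CloudDatabase{}).
        WithOptions(opts). // e.g., controller.Options{MaxConcurrentReconciles: 10}
        Complete(r)
}

Whichever wiring you choose, the effect is the same: up to MaxConcurrentReconciles deletions (and normal reconciles) proceed in parallel.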
However, this can lead to a thundering herd problem against your external API. If 10 concurrent reconciles all try to delete an RDS instance, you might hit AWS API rate limits.
The solution is to build rate limiting into the client your operator uses to talk to the external service.
import "golang.org/x/time/rate"
// A custom http.RoundTripper to wrap the AWS client's transport
type RateLimitedTransport struct {
    Transport http.RoundTripper
    Limiter   *rate.Limiter
}

func (t *RateLimitedTransport) RoundTrip(req *http.Request) (*http.Response, error) {
    if err := t.Limiter.Wait(req.Context()); err != nil {
        return nil, err
    }
    return t.Transport.RoundTrip(req)
}
// When creating your AWS config
limiter := rate.NewLimiter(rate.Limit(5), 10) // Allow 5 events per second, with a burst of 10
customHttpClient := &http.Client{
    Transport: &RateLimitedTransport{
        Transport: http.DefaultTransport,
        Limiter:   limiter,
    },
}
cfg, err := config.LoadDefaultConfig(context.TODO(), config.WithHTTPClient(customHttpClient))
// ... create rdsClient from this cfg
By combining increased MaxConcurrentReconciles with client-side rate limiting, you can achieve high throughput cleanup without overwhelming downstream APIs.
Conclusion: Masters of the Lifecycle
Finalizers transform an Operator from a simple resource creator into a true lifecycle manager. They are the covenant that ensures what happens in Kubernetes is faithfully and safely reflected in the outside world. By moving beyond the basic implementation and embracing patterns for idempotency, stateful cleanup, and robust error handling, you can build operators that are not just powerful, but also production-ready and trustworthy.
The stuck Terminating resource is not a Kubernetes bug; it's a signal. It's the system telling you that a controller has failed to fulfill its contract. By understanding and mastering the advanced finalizer patterns discussed here, you'll be equipped to build operators that always honor their commitments, ensuring a clean, consistent, and cost-effective infrastructure control plane.