Kubernetes Finalizers: Advanced Patterns for Stateful Teardown
The Deletion Fallacy: Why Standard Garbage Collection Fails External Resources
As a seasoned Kubernetes engineer, you understand the power of the declarative model. You define a desired state in a Custom Resource (CR), and your operator's reconciliation loop makes it a reality. But what happens when that reality extends beyond the cluster's API server? Consider an operator managing S3Bucket custom resources. When a developer executes kubectl delete s3bucket my-production-bucket, Kubernetes dutifully removes the S3Bucket object from etcd. The problem? The actual S3 bucket in AWS remains, now an orphaned, potentially costly resource.
This is the core challenge that standard Kubernetes garbage collection, primarily designed around OwnerReferences for in-cluster objects, cannot solve. The controller manager has no inherent knowledge of the external world. Deleting the CR object is a fire-and-forget operation from its perspective.
This is where Finalizers become a non-negotiable component of any production-grade operator managing external state. A finalizer is simply a string key added to an object's metadata.finalizers list. Its presence acts as a lock, signaling to the Kubernetes API server: "Do not fully delete this object yet. A controller is performing pre-delete cleanup tasks."
When a user requests deletion of an object with a finalizer, the API server doesn't immediately remove it. Instead, it populates the object's metadata.deletionTimestamp field. This is the signal for our operator. The object now persists in a terminating state: the timestamp can never be unset, and from this point on the API server only allows finalizers to be removed, never added. Our reconciliation loop detects the timestamp, executes the necessary external cleanup logic (e.g., deleting the S3 bucket via the AWS API), and only upon successful completion removes its finalizer string from the list. Once the finalizers list is empty, Kubernetes completes the object's deletion.
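Concretely, a terminating object looks like this (abbreviated; the finalizer name anticipates the one we define in Part 1):
apiVersion: cloud.my.domain/v1alpha1
kind: S3Bucket
metadata:
  name: my-production-bucket
  deletionTimestamp: "2025-01-15T10:00:00Z"  # set by the API server on delete; immutable
  finalizers:
  - cloud.my.domain/finalizer                # blocks full deletion until removed
spec:
  bucketName: my-production-bucket
  region: us-east-1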
This article moves beyond this high-level concept and dives into the intricate implementation details, edge cases, and performance considerations you'll face when building robust, stateful operators in Go with Kubebuilder.
Part 1: A Canonical Finalizer Implementation in Go
Let's build the core logic for an S3Bucket controller. We'll assume a standard Kubebuilder project setup. The heart of our logic resides within the Reconcile function.
Our CRD spec might look like this:
# config/crd/bases/cloud.my.domain_s3buckets.yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
name: s3buckets.cloud.my.domain
spec:
group: cloud.my.domain
names:
kind: S3Bucket
listKind: S3BucketList
plural: s3buckets
singular: s3bucket
scope: Namespaced
versions:
- name: v1alpha1
schema:
openAPIV3Schema:
type: object
properties:
spec:
type: object
properties:
bucketName:
type: string
region:
type: string
isPublic:
type: boolean
required:
- bucketName
- region
The core of the controller logic is a two-pronged approach within the Reconcile method, keyed off the presence of the DeletionTimestamp.
// controllers/s3bucket_controller.go
import (
	"context"
	"fmt"
	// ... other imports

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
	"sigs.k8s.io/controller-runtime/pkg/log"

	cloudv1alpha1 "github.com/my-org/s3-operator/api/v1alpha1"
)
// A constant for our finalizer name. Domain-qualifying the key is the
// convention and prevents collisions with other controllers.
const s3BucketFinalizer = "cloud.my.domain/finalizer"
func (r *S3BucketReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
logger := log.FromContext(ctx)
// 1. Fetch the S3Bucket instance
s3Bucket := &cloudv1alpha1.S3Bucket{}
if err := r.Get(ctx, req.NamespacedName, s3Bucket); err != nil {
if client.IgnoreNotFound(err) != nil {
logger.Error(err, "unable to fetch S3Bucket")
return ctrl.Result{}, err
}
logger.Info("S3Bucket resource not found. Ignoring since object must be deleted")
return ctrl.Result{}, nil
}
// 2. The core finalizer logic branch
if s3Bucket.ObjectMeta.DeletionTimestamp.IsZero() {
// The object is NOT being deleted, so we add our finalizer if it doesn't exist.
if !controllerutil.ContainsFinalizer(s3Bucket, s3BucketFinalizer) {
logger.Info("Adding finalizer to S3Bucket")
controllerutil.AddFinalizer(s3Bucket, s3BucketFinalizer)
if err := r.Update(ctx, s3Bucket); err != nil {
return ctrl.Result{}, err
}
}
// This is where your normal reconciliation logic goes.
// e.g., check if the S3 bucket exists, if not, create it.
// We'll stub this out for now.
if err := r.ensureS3BucketExists(ctx, s3Bucket); err != nil {
// Update status, etc.
return ctrl.Result{}, err
}
} else {
// The object IS being deleted
if controllerutil.ContainsFinalizer(s3Bucket, s3BucketFinalizer) {
logger.Info("Performing cleanup for S3Bucket")
// Our actual finalizer logic
if err := r.handleFinalizer(ctx, s3Bucket); err != nil {
// If the cleanup fails, we don't remove the finalizer.
// This will cause the reconciliation to be re-queued, and we'll try again.
logger.Error(err, "failed to handle finalizer")
return ctrl.Result{}, err
}
// Cleanup was successful, remove the finalizer
logger.Info("Removing finalizer from S3Bucket after successful cleanup")
controllerutil.RemoveFinalizer(s3Bucket, s3BucketFinalizer)
if err := r.Update(ctx, s3Bucket); err != nil {
return ctrl.Result{}, err
}
}
}
return ctrl.Result{}, nil
}
// handleFinalizer contains the actual logic to clean up the external resource.
func (r *S3BucketReconciler) handleFinalizer(ctx context.Context, s3Bucket *cloudv1alpha1.S3Bucket) error {
logger := log.FromContext(ctx)
// This is a placeholder for your AWS SDK call
logger.Info("Attempting to delete external S3 bucket", "bucketName", s3Bucket.Spec.BucketName)
// Use the AWS SDK to check if the bucket exists and delete it.
// This logic MUST be idempotent.
exists, err := r.S3Client.BucketExists(ctx, s3Bucket.Spec.BucketName)
if err != nil {
return fmt.Errorf("failed to check if S3 bucket exists: %w", err)
}
if exists {
if err := r.S3Client.DeleteBucket(ctx, s3Bucket.Spec.BucketName); err != nil {
// Important: Return an error here to trigger a requeue.
return fmt.Errorf("failed to delete S3 bucket: %w", err)
}
logger.Info("Successfully deleted external S3 bucket")
} else {
logger.Info("External S3 bucket already deleted, nothing to do.")
}
return nil
}
// ensureS3BucketExists is the placeholder for the normal reconciliation logic.
func (r *S3BucketReconciler) ensureS3BucketExists(ctx context.Context, s3Bucket *cloudv1alpha1.S3Bucket) error {
// Placeholder: check if bucket exists, if not create it.
// ...
return nil
}
Analysis of the Pattern
* Namespaced finalizer name: The domain-qualified key cloud.my.domain/finalizer prevents collisions with other controllers that might be operating on the same object.
* DeletionTimestamp.IsZero(): This is the canonical way to check if an object is undergoing deletion. If it's zero, the object is alive. If it's non-zero, deletion has been initiated.
* controllerutil Helpers: The controller-runtime library provides ContainsFinalizer, AddFinalizer, and RemoveFinalizer. These helpers simplify the otherwise tedious slice manipulation of the ObjectMeta.Finalizers field.
* Ordering of persistence: Add finalizer, then Update(). Reconcile. Do work. Remove finalizer, then Update(). Each change to the finalizer list must be persisted to the API server. If the Update() call fails after adding the finalizer, the next reconciliation will simply re-verify that it's present and continue.
* Error handling: In the else block, if handleFinalizer() returns an error, we do not remove the finalizer. We return the error to controller-runtime, which will requeue the request. The object will remain in a Terminating state until our cleanup logic succeeds.
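A refinement worth noting for those two Update() calls: on a hotly-contested object they can fail with 409 Conflict errors. A patch scoped to the finalizer change avoids most of them. A minimal sketch using controller-runtime's client.MergeFrom, as a drop-in alternative to the Update() calls above:
// Snapshot the object before mutating it so the patch contains only the diff.
base := s3Bucket.DeepCopy()
controllerutil.AddFinalizer(s3Bucket, s3BucketFinalizer)
// Note: a JSON merge patch replaces the whole finalizers list. If other
// controllers also manage finalizers on this object, build the patch with
// client.MergeFromWithOptions(base, client.MergeFromWithOptimisticLock{}).
if err := r.Patch(ctx, s3Bucket, client.MergeFrom(base)); err != nil {
	return ctrl.Result{}, err
}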
Part 2: Advanced Edge Cases and Production Hardening
The simple implementation above works for the happy path. Production environments are anything but. Let's explore the critical edge cases.
Edge Case 1: Idempotency in Cleanup Logic
Your reconciliation loop can be triggered multiple times for the same deletion event due to cluster state changes or requeues. Your cleanup logic must be idempotent.
Problem: If your handleFinalizer function blindly tries to delete the S3 bucket, the second time it runs (after a requeue), the AWS API will return a NoSuchBucket error. If you treat this as a fatal error, your controller will get stuck in a perpetual retry loop, never removing the finalizer.
Solution: The cleanup logic must first check for the existence of the external resource. If it doesn't exist, the cleanup should be considered a success.
Our handleFinalizer already demonstrates this:
// ... inside handleFinalizer
exists, err := r.S3Client.BucketExists(ctx, s3Bucket.Spec.BucketName)
// ... error handling ...
if exists {
// ... attempt deletion ...
} else {
// Already gone, this is a success condition for the finalizer.
logger.Info("External S3 bucket already deleted, nothing to do.")
}
return nil
This ensures that even if the reconciliation loop runs ten times during the deletion process, it will only attempt the API call once and will correctly report success on subsequent runs, allowing the finalizer to be removed.
Edge Case 2: Handling Cleanup Failures and Requeue Strategy
What if the AWS API is down or returns a transient error (e.g., ThrottlingException)?
Problem: Simply returning err from Reconcile triggers controller-runtime's default exponential backoff retry mechanism. While this is good, sometimes you need more control, especially for known transient issues.
Solution: Implement a more nuanced requeue strategy. Instead of just returning the error, you can inspect it and decide whether to requeue immediately or after a specific delay.
// ... inside the deletion branch
if err := r.handleFinalizer(ctx, s3Bucket); err != nil {
logger.Error(err, "failed to handle finalizer")
// Example: Check for a specific, transient AWS error
var throttlingErr *types.ThrottlingException
if errors.As(err, &throttlingErr) {
logger.Info("AWS API is throttling. Requeuing after 30 seconds.")
// Return a result object to requeue after a specific delay
return ctrl.Result{RequeueAfter: 30 * time.Second}, nil
}
// For other errors, use the default backoff
return ctrl.Result{}, err
}
This gives you fine-grained control over the retry loop, preventing your operator from hammering a struggling downstream API while still ensuring eventual consistency.
Edge Case 3: The "Stuck" Finalizer and Manual Intervention
Problem: A bug in your controller or a permanent external issue (e.g., credentials revoked) could cause the handleFinalizer to fail indefinitely. The CR will be stuck in the Terminating state forever, and kubectl delete --force won't work.
This is a feature, not a bug. It prevents data loss or orphaned resources. However, an administrator needs a "break glass" procedure.
Solution: The administrator must manually patch the object to remove the finalizer. This is a dangerous operation that should be performed only when the operator is confirmed to be non-functional and the external resource has been cleaned up manually.
The Command:
# Get the current object YAML to see the finalizers
kubectl get s3bucket my-stuck-bucket -n my-namespace -o yaml
# Manually patch the object to remove the finalizer
# Replace 'cloud.my.domain/finalizer' with your actual finalizer name
kubectl patch s3bucket my-stuck-bucket -n my-namespace --type='json' \
  -p='[{"op": "remove", "path": "/metadata/finalizers/0"}]'
# Note: The index '0' assumes your finalizer is the first in the list.
# Check the 'get' output above and adjust the index accordingly.
# To clear the entire list instead, use a merge patch:
# kubectl patch s3bucket my-stuck-bucket -n my-namespace --type=merge -p '{"metadata":{"finalizers":[]}}'
# (This removes ALL finalizers, including any owned by other controllers; be careful.)
Your operator's documentation must include this procedure and clearly state the risks involved. It's also wise to add metrics and alerts to detect CRs that have been in a Terminating state for an extended period (e.g., > 1 hour).
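As a starting point for that alerting, here is a minimal sketch that registers a custom gauge with controller-runtime's metrics registry (the metric name and update strategy are assumptions, not an established convention). The reconciler, or a periodic sweep, would set it to the count of terminating S3Buckets, and a Prometheus alert fires when it stays non-zero for too long:
// metrics.go
package controllers

import (
	"github.com/prometheus/client_golang/prometheus"
	"sigs.k8s.io/controller-runtime/pkg/metrics"
)

var terminatingS3Buckets = prometheus.NewGauge(prometheus.GaugeOpts{
	Name: "s3buckets_terminating",
	Help: "Number of S3Bucket resources currently in a Terminating state.",
})

func init() {
	// controller-runtime serves this registry on the manager's /metrics endpoint.
	metrics.Registry.MustRegister(terminatingS3Buckets)
}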
Edge Case 4: Race Conditions with Spec Updates
Problem: What happens if a user updates the spec of a CR (e.g., s3Bucket.Spec.BucketName) at the exact moment another user requests its deletion?
Solution: The API server gives you firm guarantees here. Once deletion is requested, the deletionTimestamp is set and can never be unset: the deletion cannot be cancelled. From that point on, updates may only remove finalizers, never add them. One caveat: the API server does not freeze the spec of a terminating custom resource, so a spec update can still be accepted mid-cleanup.
In practice this is rarely a problem: each reconciliation re-fetches the latest object, and idempotent cleanup logic converges regardless. For extra robustness, record the external resource's identity (e.g., bucket name and region) in status at creation time and drive cleanup from that, rather than trusting spec at deletion time. If you need a hard guarantee that the spec is frozen during teardown, enforce it with a validating admission webhook.
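If you need that hard guarantee, a minimal sketch of such a webhook against recent controller-runtime versions (the validator type and its registration follow standard Kubebuilder webhook scaffolding, omitted here):
// api/v1alpha1/s3bucket_webhook.go (sketch; other CustomValidator methods omitted)
import (
	"context"
	"errors"
	"reflect"

	"k8s.io/apimachinery/pkg/runtime"
	"sigs.k8s.io/controller-runtime/pkg/webhook/admission"
)

// S3BucketValidator implements webhook.CustomValidator for the S3Bucket kind.
type S3BucketValidator struct{}

func (v *S3BucketValidator) ValidateUpdate(ctx context.Context, oldObj, newObj runtime.Object) (admission.Warnings, error) {
	oldBucket := oldObj.(*S3Bucket)
	newBucket := newObj.(*S3Bucket)
	// Once deletion has begun, reject any change to spec.
	if !newBucket.DeletionTimestamp.IsZero() && !reflect.DeepEqual(oldBucket.Spec, newBucket.Spec) {
		return nil, errors.New("spec is immutable while the S3Bucket is terminating")
	}
	return nil, nil
}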
Part 3: Performance and Scalability
In a large-scale environment with thousands of CRs, the performance of your finalizer logic can become a bottleneck.
Controller Concurrency
Each controller processes reconcile requests with a bounded pool of worker goroutines, controlled by MaxConcurrentReconciles (controller-runtime defaults it to 1). Suppose you raise it to 10. If your handleFinalizer function takes 5 seconds to complete an AWS API call, and 10 CRs are deleted simultaneously, all 10 reconciliation workers will be busy for 5 seconds. No other S3Bucket CRs (being created, updated, or deleted) can be reconciled during this time.
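For reference, the worker count is set per controller at setup time. A minimal sketch of raising it in SetupWithManager:
// controllers/s3bucket_controller.go
import (
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/controller"

	cloudv1alpha1 "github.com/my-org/s3-operator/api/v1alpha1"
)

func (r *S3BucketReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&cloudv1alpha1.S3Bucket{}).
		// Raise the worker count so slow cleanups don't starve other S3Buckets.
		WithOptions(controller.Options{MaxConcurrentReconciles: 10}).
		Complete(r)
}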
Considerations:
* Long-Running Cleanup: If your cleanup involves a complex, multi-step process, consider an asynchronous pattern (sketched after this list). The handleFinalizer could create a Kubernetes Job to perform the cleanup and then immediately return. The operator would then watch for the Job's completion status before removing the finalizer. This frees up the reconciler worker immediately.
* External API Rate Limiting: If 1000 CRs are deleted at once, your operator might make 1000 simultaneous API calls to your cloud provider, triggering rate limiting. Implement client-side rate limiting in your external API client (e.g., using a token bucket algorithm, also sketched below) or serialize cleanup operations through a dedicated queue within the operator.
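Here is a minimal sketch of the Job-based pattern from the first bullet above. The Job naming scheme and the buildCleanupJob helper are illustrative assumptions, not a library API:
import (
	"context"

	batchv1 "k8s.io/api/batch/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"sigs.k8s.io/controller-runtime/pkg/client"

	cloudv1alpha1 "github.com/my-org/s3-operator/api/v1alpha1"
)

// handleFinalizerAsync returns done=true only once the cleanup Job has
// succeeded, so the caller keeps the finalizer (and requeues) until then.
func (r *S3BucketReconciler) handleFinalizerAsync(ctx context.Context, bucket *cloudv1alpha1.S3Bucket) (done bool, err error) {
	job := &batchv1.Job{}
	key := client.ObjectKey{Namespace: bucket.Namespace, Name: "cleanup-" + bucket.Name}
	if err := r.Get(ctx, key, job); err != nil {
		if !apierrors.IsNotFound(err) {
			return false, err
		}
		// No cleanup Job yet: create one and return immediately, freeing the worker.
		return false, r.Create(ctx, r.buildCleanupJob(bucket))
	}
	// Job exists: report success only once it has completed.
	return job.Status.Succeeded > 0, nil
}
The deletion branch calls this instead of handleFinalizer, returning ctrl.Result{RequeueAfter: 10 * time.Second} while done is false and removing the finalizer only once it flips to true.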
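And a minimal sketch of client-side rate limiting for the second bullet, wrapping the operator's S3 client in a token bucket via golang.org/x/time/rate (the S3API interface is a stand-in for whatever your real client exposes):
import (
	"context"

	"golang.org/x/time/rate"
)

// S3API is a stand-in for the interface your operator's S3 client exposes.
type S3API interface {
	DeleteBucket(ctx context.Context, name string) error
}

// RateLimitedS3 shares one token bucket across all reconcile workers, so a
// burst of deletions queues up instead of hammering the cloud provider.
type RateLimitedS3 struct {
	inner   S3API
	limiter *rate.Limiter
}

func NewRateLimitedS3(inner S3API) *RateLimitedS3 {
	// 10 requests/second with a burst of 5; tune to your provider's quotas.
	return &RateLimitedS3{inner: inner, limiter: rate.NewLimiter(rate.Limit(10), 5)}
}

func (c *RateLimitedS3) DeleteBucket(ctx context.Context, name string) error {
	// Wait blocks until a token is available or ctx is cancelled.
	if err := c.limiter.Wait(ctx); err != nil {
		return err
	}
	return c.inner.DeleteBucket(ctx, name)
}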
Example: Observable Cleanup with Status Updates
To make the process more observable, we can update the CR's status subresource during finalization.
First, add a status to your CRD:
// api/v1alpha1/s3bucket_types.go
type S3BucketStatus struct {
// +optional
State string `json:"state,omitempty"`
// +optional
Message string `json:"message,omitempty"`
}
// ... in S3Bucket struct
Status S3BucketStatus `json:"status,omitempty"`
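One prerequisite worth calling out: r.Status().Update() targets the status subresource, which must be enabled on the CRD. With Kubebuilder that is done via a marker on the root type:
// +kubebuilder:object:root=true
// +kubebuilder:subresource:status

// S3Bucket is the Schema for the s3buckets API.
type S3Bucket struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   S3BucketSpec   `json:"spec,omitempty"`
	Status S3BucketStatus `json:"status,omitempty"`
}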
Now, update the finalizer handler to reflect the state:
// ... inside the deletion branch
if controllerutil.ContainsFinalizer(s3Bucket, s3BucketFinalizer) {
// Update status to indicate cleanup has started
s3Bucket.Status.State = "Terminating"
s3Bucket.Status.Message = "Removing external resources"
if err := r.Status().Update(ctx, s3Bucket); err != nil {
// Even if status update fails, we should proceed with cleanup
logger.Error(err, "failed to update S3Bucket status during finalization")
}
if err := r.handleFinalizer(ctx, s3Bucket); err != nil {
// On failure, update status again
s3Bucket.Status.Message = fmt.Sprintf("Cleanup failed: %v", err)
_ = r.Status().Update(ctx, s3Bucket) // Best effort update
return ctrl.Result{}, err
}
// ... remove finalizer ...
}
This provides invaluable observability for platform administrators. When they see a resource stuck in Terminating, they can kubectl describe it and immediately see the error message from the last failed cleanup attempt in its status.
Conclusion: Finalizers as a Contract
Implementing finalizers correctly transforms your operator from a simple resource creator into a true lifecycle manager. It establishes a contract between your controller and the Kubernetes API server, ensuring that your business logic is an integral part of the object's deletion flow.
For senior engineers, mastering this pattern is not optional; it is the cornerstone of building reliable, production-ready operators that can be trusted with critical infrastructure. The key takeaways are:
* DeletionTimestamp: This is the universal signal to switch from normal reconciliation to cleanup mode.
* Idempotency: Your cleanup logic will run more than once; treat an already-deleted external resource as success.
* Error handling: Never remove the finalizer until cleanup has verifiably succeeded; returning an error buys you a requeue with backoff.
* Observability: Surface cleanup progress in status, and alert on resources stuck in Terminating.
By internalizing these advanced patterns, you can build controllers that safely and robustly manage the entire lifecycle of any resource, whether it lives inside or outside your Kubernetes cluster.