Kubernetes Finalizers: A Deep Dive into Stateful Operator Cleanup
The Deletion Paradox in Declarative Systems
In the world of Kubernetes, the control loop's primary directive is to make the actual state of the world match the desired state defined in a resource's manifest. When you create a Deployment, the controller creates ReplicaSets and Pods. When you update it, the controller performs a rolling update. But what happens when you delete it? Kubernetes garbage collection is ruthlessly efficient: the object is removed from etcd, and cascading deletion cleans up the resources it owns. This works perfectly for resources that live entirely within the cluster.
However, for a custom operator managing resources outside of Kubernetes—an AWS RDS instance, a GCP Cloud SQL database, a Stripe subscription, or a simple S3 bucket—this immediate deletion is a critical flaw. The operator's link to the external resource is the Custom Resource (CR). If the CR is deleted instantly, the operator loses the information needed to clean up the external resource, leaving it orphaned and incurring costs.
This is where finalizers become not just a feature, but an architectural necessity. A finalizer is a string entry in a resource's metadata.finalizers list that tells the Kubernetes API server to block deletion until a specific controller has completed its cleanup tasks. It effectively transforms a fire-and-forget deletion request into a two-phase process: a graceful shutdown signal followed by the actual deletion.
This article assumes you are familiar with the Operator pattern, the basics of the reconciliation loop (Reconcile function), and are comfortable with Go. We will focus exclusively on the advanced implementation patterns for using finalizers to build resilient, production-grade operators.
Dissecting the Finalizer-Aware Reconciliation Loop
The introduction of a finalizer fundamentally alters the state machine of the reconciliation loop. A standard loop primarily cares about Create and Update events. A finalizer-aware loop introduces a critical Deleting state.
Let's visualize the lifecycle of a Custom Resource (S3Bucket) managed by an operator using a finalizer, say s3.my-company.com/finalizer.
Phase 1: Creation
* A user creates a new S3Bucket CR named my-test-bucket.
* The operator's Reconcile function is triggered.
* It checks metadata.finalizers. The list does not contain s3.my-company.com/finalizer.
* Action: The operator adds its finalizer to the list and issues an Update call on the CR. This is the first and most critical step. The external resource has not been created yet.
* The update triggers another reconciliation. This time, the finalizer is present, and the metadata.deletionTimestamp is nil. The operator proceeds with normal resource creation logic (e.g., calls the AWS API to create the S3 bucket).
Phase 2: Deletion
* A user runs kubectl delete s3bucket my-test-bucket.
* The Kubernetes API server receives the delete request.
* It inspects the CR and sees the s3.my-company.com/finalizer in its metadata.
* Action: Instead of deleting the object from etcd, it sets the metadata.deletionTimestamp to the current time. The object is now in a Terminating state.
* Setting the deletionTimestamp is an update event, so the operator's Reconcile function is triggered again.
* Inside the loop, the first check is now: if object.GetDeletionTimestamp() != nil.
* This condition is now true. The operator knows it must execute its cleanup logic.
* Action: The operator calls the AWS API to delete the S3 bucket associated with this CR.
* The cleanup logic must be idempotent. If it fails, the reconciliation will be re-queued, and the logic will run again. It must handle the case where the external resource is already gone.
* Once the external resource is confirmed to be deleted, the operator removes its finalizer (s3.my-company.com/finalizer) from the metadata.finalizers list and issues a final Update call on the CR.
* The Kubernetes API server sees this final update.
* It checks the object again: deletionTimestamp is set, and the finalizers list is now empty.
* Action: The API server now proceeds with the final deletion, removing the object from etcd.
This sequence ensures that the operator maintains control until all external dependencies are cleanly resolved.
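While a CR sits in this Terminating state, kubectl describe shows the populated deletion timestamp, and you can list the finalizers still blocking deletion with, for example, kubectl get s3bucket my-test-bucket -o jsonpath='{.metadata.finalizers}'.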
Production Implementation: An S3 Bucket Operator
Let's translate this theory into a production-grade implementation using Go, the Operator SDK, and the AWS SDK for Go v2. We'll build an S3Bucket operator.
Prerequisites: Your environment is set up with the Operator SDK, and you have an AWS account with credentials configured for the operator to use.
1. The Custom Resource Definition (CRD)
First, define the API for our S3Bucket resource in api/v1/s3bucket_types.go.
// api/v1/s3bucket_types.go
package v1
import (
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
// S3BucketSpec defines the desired state of S3Bucket
type S3BucketSpec struct {
// BucketName is the name of the S3 bucket to create.
// +kubebuilder:validation:Required
// +kubebuilder:validation:MinLength=3
BucketName string `json:"bucketName"`
// Region is the AWS region where the bucket will be created.
// +kubebuilder:validation:Required
Region string `json:"region"`
}
// S3BucketStatus defines the observed state of S3Bucket
type S3BucketStatus struct {
// URL is the full URL of the created S3 bucket.
URL string `json:"url,omitempty"`
// State represents the current state of the bucket (e.g., "Created", "Error").
State string `json:"state,omitempty"`
// Message provides more details about the current state or any errors.
Message string `json:"message,omitempty"`
}
//+kubebuilder:object:root=true
//+kubebuilder:subresource:status
//+kubebuilder:printcolumn:name="BucketName",type="string",JSONPath=".spec.bucketName"
//+kubebuilder:printcolumn:name="Status",type="string",JSONPath=".status.state"
//+kubebuilder:printcolumn:name="Age",type="date",JSONPath=".metadata.creationTimestamp"
// S3Bucket is the Schema for the s3buckets API
type S3Bucket struct {
metav1.TypeMeta `json:",inline"`
metav1.ObjectMeta `json:"metadata,omitempty"`
Spec S3BucketSpec `json:"spec,omitempty"`
Status S3BucketStatus `json:"status,omitempty"`
}
//+kubebuilder:object:root=true
// S3BucketList contains a list of S3Bucket
type S3BucketList struct {
metav1.TypeMeta `json:",inline"`
metav1.ListMeta `json:"metadata,omitempty"`
Items []S3Bucket `json:"items"`
}
func init() {
SchemeBuilder.Register(&S3Bucket{}, &S3BucketList{})
}
2. The Controller Logic (`s3bucket_controller.go`)
This is where the core finalizer logic resides. We'll structure the Reconcile function to handle the different states we discussed.
First, set up the reconciler struct and the finalizer name constant.
// controllers/s3bucket_controller.go
package controllers
import (
	"context"
	"errors"
	"fmt"

	"github.com/aws/aws-sdk-go-v2/aws"
	awsconfig "github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/s3"
	"github.com/aws/aws-sdk-go-v2/service/s3/types"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/runtime"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
	"sigs.k8s.io/controller-runtime/pkg/log"

	cachev1 "my-operator/api/v1"
)
const s3BucketFinalizer = "s3.my-company.com/finalizer"
// S3BucketReconciler reconciles a S3Bucket object
type S3BucketReconciler struct {
client.Client
Scheme *runtime.Scheme
}
// A real implementation would have a more robust way to manage S3 clients per region
func getS3Client(ctx context.Context, region string) (*s3.Client, error) {
cfg, err := awsconfig.LoadDefaultConfig(ctx, awsconfig.WithRegion(region))
if err != nil {
return nil, fmt.Errorf("unable to load AWS SDK config: %w", err)
}
return s3.NewFromConfig(cfg), nil
}
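As the comment above hints, loading a fresh AWS config on every reconciliation is wasteful. A minimal sketch of per-region client caching, using a hypothetical getCachedS3Client helper (requires adding "sync" to the imports; the reconciler below keeps calling getS3Client directly for simplicity):
// Hypothetical process-lifetime cache of S3 clients, keyed by region.
var s3Clients sync.Map // map[string]*s3.Client

func getCachedS3Client(ctx context.Context, region string) (*s3.Client, error) {
	if cached, ok := s3Clients.Load(region); ok {
		return cached.(*s3.Client), nil
	}
	c, err := getS3Client(ctx, region)
	if err != nil {
		return nil, err
	}
	// LoadOrStore resolves the race where two reconciles build a client concurrently.
	actual, _ := s3Clients.LoadOrStore(region, c)
	return actual.(*s3.Client), nil
}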
Now for the main Reconcile function. This acts as a dispatcher based on the deletionTimestamp.
//+kubebuilder:rbac:groups=s3.my-company.com,resources=s3buckets,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups=s3.my-company.com,resources=s3buckets/status,verbs=get;update;patch
//+kubebuilder:rbac:groups=s3.my-company.com,resources=s3buckets/finalizers,verbs=update
func (r *S3BucketReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
logger := log.FromContext(ctx)
// 1. Fetch the S3Bucket instance
s3Bucket := &cachev1.S3Bucket{}
err := r.Get(ctx, req.NamespacedName, s3Bucket)
if err != nil {
if apierrors.IsNotFound(err) {
logger.Info("S3Bucket resource not found. Ignoring since object must be deleted")
return ctrl.Result{}, nil
}
logger.Error(err, "Failed to get S3Bucket")
return ctrl.Result{}, err
}
// 2. Check if the object is being deleted
isMarkedForDeletion := s3Bucket.GetDeletionTimestamp() != nil
if isMarkedForDeletion {
if controllerutil.ContainsFinalizer(s3Bucket, s3BucketFinalizer) {
// Run our finalizer logic
if err := r.handleFinalizer(ctx, s3Bucket); err != nil {
// Don't remove the finalizer if cleanup fails, so we can retry.
logger.Error(err, "Failed to handle finalizer")
return ctrl.Result{}, err
}
// Cleanup was successful, remove the finalizer
controllerutil.RemoveFinalizer(s3Bucket, s3BucketFinalizer)
err := r.Update(ctx, s3Bucket)
if err != nil {
return ctrl.Result{}, err
}
}
// Stop reconciliation as the item is being deleted
return ctrl.Result{}, nil
}
// 3. Add finalizer for a new object if it doesn't exist
if !controllerutil.ContainsFinalizer(s3Bucket, s3BucketFinalizer) {
logger.Info("Adding finalizer for the S3Bucket")
controllerutil.AddFinalizer(s3Bucket, s3BucketFinalizer)
err = r.Update(ctx, s3Bucket)
if err != nil {
return ctrl.Result{}, err
}
// Requeue after adding the finalizer to ensure the next reconciliation sees it
return ctrl.Result{Requeue: true}, nil
}
// 4. Run the main reconciliation logic for creating/updating the bucket
return r.handleReconciliation(ctx, s3Bucket)
}
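One refinement worth considering: r.Update on a stale copy of the object fails with a conflict error under contention, costing an extra retry round-trip. A common alternative (a sketch, not wired into the code above) is to patch just the finalizer change with client.MergeFrom:
// Patch-based alternative to r.Update for finalizer changes: MergeFrom computes
// a minimal merge patch against the pre-mutation copy of the object.
base := s3Bucket.DeepCopy()
controllerutil.AddFinalizer(s3Bucket, s3BucketFinalizer)
if err := r.Patch(ctx, s3Bucket, client.MergeFrom(base)); err != nil {
	return ctrl.Result{}, err
}
The same pattern applies to the RemoveFinalizer call in the deletion branch.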
The handleFinalizer function contains the core cleanup logic.
func (r *S3BucketReconciler) handleFinalizer(ctx context.Context, s3Bucket *cachev1.S3Bucket) error {
logger := log.FromContext(ctx)
s3Client, err := getS3Client(ctx, s3Bucket.Spec.Region)
if err != nil {
return fmt.Errorf("failed to create s3 client: %w", err)
}
bucketName := s3Bucket.Spec.BucketName
logger.Info("Attempting to delete S3 bucket", "BucketName", bucketName)
// Idempotency check: Check if bucket exists before trying to delete
_, err = s3Client.HeadBucket(ctx, &s3.HeadBucketInput{
Bucket: aws.String(bucketName),
})
if err != nil {
var nsb *types.NotFound
if errors.As(err, &nsb) {
// This is an important edge case. If the bucket is already gone
// (e.g., manually deleted, or a previous reconciliation attempt succeeded
// but the finalizer removal failed), we consider the cleanup successful.
logger.Info("S3 bucket does not exist. Finalizer can be removed.", "BucketName", bucketName)
return nil
}
return fmt.Errorf("failed to check bucket existence: %w", err)
}
// The bucket exists, so we proceed with deletion.
// NOTE: A production operator should handle non-empty buckets.
// For this example, we assume buckets are empty.
_, err = s3Client.DeleteBucket(ctx, &s3.DeleteBucketInput{
Bucket: aws.String(bucketName),
})
if err != nil {
return fmt.Errorf("failed to delete s3 bucket %s: %w", bucketName, err)
}
logger.Info("Successfully deleted S3 bucket", "BucketName", bucketName)
return nil
}
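The NOTE above glosses over a common failure: DeleteBucket returns a BucketNotEmpty error if any objects remain. A minimal sketch of draining the bucket first, assuming versioning is disabled (a versioned bucket additionally requires deleting object versions and delete markers); emptyBucket is a hypothetical helper you would call before DeleteBucket:
func emptyBucket(ctx context.Context, s3Client *s3.Client, bucketName string) error {
	// Page through every object and delete it. A production implementation
	// would batch keys into DeleteObjects calls (up to 1,000 keys per request).
	paginator := s3.NewListObjectsV2Paginator(s3Client, &s3.ListObjectsV2Input{
		Bucket: aws.String(bucketName),
	})
	for paginator.HasMorePages() {
		page, err := paginator.NextPage(ctx)
		if err != nil {
			return fmt.Errorf("failed to list objects in %s: %w", bucketName, err)
		}
		for _, obj := range page.Contents {
			if _, err := s3Client.DeleteObject(ctx, &s3.DeleteObjectInput{
				Bucket: aws.String(bucketName),
				Key:    obj.Key,
			}); err != nil {
				return fmt.Errorf("failed to delete object %s: %w", aws.ToString(obj.Key), err)
			}
		}
	}
	return nil
}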
Finally, the handleReconciliation function manages the Create/Update logic.
func (r *S3BucketReconciler) handleReconciliation(ctx context.Context, s3Bucket *cachev1.S3Bucket) (ctrl.Result, error) {
logger := log.FromContext(ctx)
s3Client, err := getS3Client(ctx, s3Bucket.Spec.Region)
if err != nil {
// Update status and requeue
s3Bucket.Status.State = "Error"
s3Bucket.Status.Message = "Failed to create AWS client: " + err.Error()
if updateErr := r.Status().Update(ctx, s3Bucket); updateErr != nil {
logger.Error(updateErr, "Failed to update S3Bucket status")
}
return ctrl.Result{}, err
}
bucketName := s3Bucket.Spec.BucketName
// Check if the bucket already exists
_, err = s3Client.HeadBucket(ctx, &s3.HeadBucketInput{
Bucket: aws.String(bucketName),
})
if err != nil {
var nsb *types.NotFound
if errors.As(err, &nsb) {
// Bucket does not exist, create it
logger.Info("S3 bucket not found, creating it.", "BucketName", bucketName)
_, createErr := s3Client.CreateBucket(ctx, &s3.CreateBucketInput{
Bucket: aws.String(bucketName),
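// NOTE: us-east-1 is a special case in the S3 API: CreateBucket rejects an
// explicit LocationConstraint for it, so a production operator should omit
// CreateBucketConfiguration entirely when Spec.Region is "us-east-1".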
CreateBucketConfiguration: &types.CreateBucketConfiguration{
LocationConstraint: types.BucketLocationConstraint(s3Bucket.Spec.Region),
},
})
if createErr != nil {
logger.Error(createErr, "Failed to create S3 bucket")
s3Bucket.Status.State = "Error"
s3Bucket.Status.Message = "Failed to create S3 bucket: " + createErr.Error()
if updateErr := r.Status().Update(ctx, s3Bucket); updateErr != nil {
logger.Error(updateErr, "Failed to update S3Bucket status")
}
return ctrl.Result{}, createErr
}
logger.Info("Successfully created S3 bucket", "BucketName", bucketName)
} else {
// Another error occurred
logger.Error(err, "Error checking for S3 bucket existence")
return ctrl.Result{}, err
}
}
// Bucket exists, update status
s3Bucket.Status.State = "Created"
s3Bucket.Status.URL = fmt.Sprintf("https://%s.s3.%s.amazonaws.com", bucketName, s3Bucket.Spec.Region)
s3Bucket.Status.Message = "Bucket is healthy and available."
if err := r.Status().Update(ctx, s3Bucket); err != nil {
logger.Error(err, "Failed to update S3Bucket status")
return ctrl.Result{}, err
}
return ctrl.Result{}, nil
}
func (r *S3BucketReconciler) SetupWithManager(mgr ctrl.Manager) error {
return ctrl.NewControllerManagedBy(mgr).
For(&cachev1.S3Bucket{}).
Complete(r)
}
Advanced Edge Cases and Production Hardening
The code above provides a solid foundation, but production environments surface tricky edge cases. A senior engineer must anticipate and handle these.
1. The "Terminating" Zombie: Hung Deletions
Problem: What happens if the handleFinalizer function fails consistently? For instance, the operator's IAM role lacks s3:DeleteBucket permissions, or a bucket policy prevents deletion. The external resource will never be deleted, the finalizer will never be removed, and the CR will be stuck in the Terminating state forever.
Solutions:
* Robust Status Updates: During the deletion phase, update the CR's status with detailed error messages. This is crucial for observability.
// Inside the Reconcile function's deletion block
if err := r.handleFinalizer(ctx, s3Bucket); err != nil {
logger.Error(err, "Finalizer cleanup failed, will retry")
// Update status to reflect the deletion error
s3Bucket.Status.State = "DeletionFailed"
s3Bucket.Status.Message = err.Error()
if updateErr := r.Status().Update(ctx, s3Bucket); updateErr != nil {
logger.Error(updateErr, "Failed to update status for deletion failure")
// Return both errors if necessary, or prioritize one
}
return ctrl.Result{}, err // Requeue with error
}
An administrator can then run kubectl describe s3bucket my-stuck-bucket and immediately see the cause (e.g., AccessDenied).
* Exponential Backoff: The default controller-runtime requeue mechanism already implements per-item exponential backoff, which is essential: failed reconciliations are retried with increasing delays, preventing a tight loop of API calls against a failing external service. (A fixed-delay alternative is sketched after this list.)
* Manual Intervention: In an unrecoverable situation (e.g., the external resource was deleted by a rogue script, but the API call to verify this is also failing), an administrator might need to intervene. The only way to unstick the CR is to manually remove the finalizer:
kubectl patch s3bucket my-stuck-bucket --type json --patch='[ { "op": "remove", "path": "/metadata/finalizers" } ]'
This is a dangerous operation that should be a last resort, as it can lead to orphaned resources if the cleanup logic was actually still required.
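Returning to the backoff point above: if even the default exponential backoff is too aggressive for a rate-limited external API, you can switch specific failures to a fixed retry cadence by returning RequeueAfter with a nil error (when an error is returned, RequeueAfter is ignored). A minimal sketch of the deletion branch, assuming "time" is imported:
// Alternative deletion branch: swallow the error and request a retry on a
// fixed 30-second cadence instead of the default exponential backoff.
if err := r.handleFinalizer(ctx, s3Bucket); err != nil {
	logger.Error(err, "Finalizer cleanup failed, will retry after a fixed delay")
	return ctrl.Result{RequeueAfter: 30 * time.Second}, nil
}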
2. Idempotency in Failure Scenarios
Problem: Consider this sequence:
* handleFinalizer successfully calls the AWS DeleteBucket API.
* The operator crashes before it can run controllerutil.RemoveFinalizer and r.Update(ctx, s3Bucket).
* When the operator restarts, reconciliation for the my-test-bucket CR is triggered again.
* The deletionTimestamp is still set, so handleFinalizer runs a second time. If the code is not idempotent, this could cause problems.
Solution: The cleanup logic must gracefully handle cases where the resource is already gone. Our implementation already does this:
// Inside handleFinalizer
_, err = s3Client.HeadBucket(ctx, &s3.HeadBucketInput{...})
if err != nil {
var nsb *types.NotFound
if errors.As(err, &nsb) {
// The bucket is already gone. This is a success condition for cleanup.
logger.Info("S3 bucket does not exist. Finalizer can be removed.")
return nil
}
return fmt.Errorf("failed to check bucket existence: %w", err)
}
This HeadBucket check is the key to idempotency. It ensures that calling the finalizer logic multiple times on a deleted resource results in a success state, allowing the finalizer to be removed and the process to complete.
3. Coordinating Multiple Controllers with Multiple Finalizers
Problem: An S3Bucket CR might be managed by more than one controller. Our s3-operator manages its lifecycle. A separate backup-operator might need to ensure a final backup is taken before the bucket is deleted. Both need to perform pre-delete actions.
Solution: Both operators can add their own unique finalizers to the CR.
* s3-operator adds s3.my-company.com/finalizer
* backup-operator adds backup.my-company.com/finalizer
When kubectl delete is called, Kubernetes will set the deletionTimestamp but will not garbage collect the object until both finalizers are removed. Each operator's reconciliation loop is responsible only for checking for and removing its own finalizer after its specific cleanup task is complete. This provides a powerful, loosely-coupled mechanism for coordinating complex, multi-step cleanup procedures across different domains.
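In practice, each controller's deletion branch has the same shape, just scoped to its own key. A sketch of how the hypothetical backup-operator's Reconcile might guard its cleanup (takeFinalBackup is an assumed helper, not part of the code above):
const backupFinalizer = "backup.my-company.com/finalizer"

// Inside the backup-operator's Reconcile: act on, and remove, only our own finalizer.
if s3Bucket.GetDeletionTimestamp() != nil {
	if controllerutil.ContainsFinalizer(s3Bucket, backupFinalizer) {
		if err := takeFinalBackup(ctx, s3Bucket); err != nil { // hypothetical backup hook
			return ctrl.Result{}, err
		}
		controllerutil.RemoveFinalizer(s3Bucket, backupFinalizer)
		if err := r.Update(ctx, s3Bucket); err != nil {
			return ctrl.Result{}, err
		}
	}
	// s3.my-company.com/finalizer belongs to the s3-operator; never remove it here.
	return ctrl.Result{}, nil
}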
Performance and Scalability
* API Server Load: Every CR's lifecycle now involves two extra update calls: one to add the finalizer on creation and one to remove it on deletion. For an operator managing tens of thousands of CRs with high churn, this can add significant load to the Kubernetes API server. There is no simple way around this, as it is fundamental to the pattern; it is a trade-off for correctness.
* Concurrent Reconciles: If a thousand CRs are deleted simultaneously, the operator might try to launch a thousand concurrent cleanup jobs, potentially overwhelming the external API (e.g., AWS S3 API rate limits). You can control this by setting MaxConcurrentReconciles via controller.Options when building the controller in SetupWithManager.
// controllers/s3bucket_controller.go
// Extend the SetupWithManager shown earlier with controller.Options.
import (
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/controller"
)

func (r *S3BucketReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&cachev1.S3Bucket{}).
		WithOptions(controller.Options{MaxConcurrentReconciles: 10}).
		Complete(r)
}
Setting this to a reasonable number (e.g., 10) creates a worker pool, ensuring that no more than 10 Reconcile loops run concurrently, effectively throttling requests to the external system.
Conclusion
Finalizers are the cornerstone of any robust Kubernetes operator that manages stateful, external resources. They elevate an operator from a simple automation tool to a true lifecycle manager that respects external dependencies. By meticulously handling the deletionTimestamp, ensuring cleanup logic is idempotent, providing clear status updates for failure states, and managing concurrency, you can build operators that are resilient, observable, and safe to run in large-scale production environments. The patterns discussed here are not just best practices; they are the minimum requirement for bridging the declarative Kubernetes world with the imperative reality of external systems.