Idempotent K8s Operators with Finalizers for Stateful Services
The Lifecycle Management Gap with External Resources
As senior engineers, we've embraced Kubernetes for its powerful declarative APIs and robust reconciliation loops for managing in-cluster resources. The Operator pattern extends this power, allowing us to manage anything—from application deployments to off-cluster infrastructure like cloud databases, message queues, or DNS entries. However, this extension introduces a critical challenge that the default Kubernetes garbage collection doesn't solve: lifecycle and ownership of external resources.
When a user executes kubectl delete my-custom-resource, the Kubernetes API server dutifully removes the object from etcd. But what about the AWS RDS instance or the Google Cloud SQL database that this object represented? Kubernetes has no intrinsic knowledge of it. The external resource is now an orphan—a costly, unmanaged, and potentially insecure piece of infrastructure.
This is the core problem that finalizers solve. A finalizer is a mechanism that allows controllers to hook into the pre-deletion lifecycle of an object. It's a key in an object's metadata.finalizers list that tells the API server, "Do not fully delete this object yet. A controller is still performing cleanup tasks." The object is instead put into a terminating state, signified by the presence of a metadata.deletionTimestamp.
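Concretely, an object mid-deletion looks something like this (an illustrative sketch; names and timestamps are made up):

```yaml
# The object is "terminating": deletionTimestamp is set, but the finalizer
# entry blocks actual removal until a controller clears it.
metadata:
  name: my-db
  deletionTimestamp: "2025-01-15T10:30:00Z"
  finalizers:
    - databasecluster.my-domain.com/finalizer
```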
This article provides a deep, implementation-focused guide on building a production-grade, idempotent Kubernetes operator in Go that uses finalizers to manage the complete lifecycle of a stateful external resource. We will move beyond the basics and focus on the patterns required for robust, real-world systems, including handling race conditions, error states, and performance considerations.
Prerequisites
This guide assumes you are a senior developer with:
* Solid experience with Go.
* A strong understanding of Kubernetes architecture, including controllers, CRDs, and the control loop.
* Familiarity with an operator framework like Kubebuilder or Operator SDK (we will use Kubebuilder conventions).
Section 1: Anatomy of an Idempotent Reconciliation Loop
The heart of any operator is its Reconcile function. This function is the embodiment of the desired state vs. actual state reconciliation loop. For an operator managing external resources, this loop is more complex than one managing in-cluster Pods or Deployments.
Idempotency is not optional; it is a hard requirement. The Reconcile function can and will be called multiple times for the same object state due to various cluster events. Your logic must produce the same outcome whether it runs once or ten times in a row. This means every action must be predicated on a check of the current state.
Let's define a Custom Resource for managing a hypothetical external service, which we'll call a DatabaseCluster.
`api/v1/databasecluster_types.go`
package v1
import (
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
// DatabaseClusterSpec defines the desired state of DatabaseCluster
type DatabaseClusterSpec struct {
// Engine is the database engine (e.g., "postgres", "mysql")
Engine string `json:"engine"`
// Version is the engine version
Version string `json:"version"`
// Size represents the instance size (e.g., "small", "medium", "large")
Size string `json:"size"`
}
// DatabaseClusterStatus defines the observed state of DatabaseCluster
type DatabaseClusterStatus struct {
// Conditions represent the latest available observations of the DatabaseCluster's state.
// +optional
Conditions []metav1.Condition `json:"conditions,omitempty"`
// ExternalID is the unique identifier of the resource in the external system.
ExternalID string `json:"externalID,omitempty"`
// Endpoint is the connection endpoint for the database.
Endpoint string `json:"endpoint,omitempty"`
}
//+kubebuilder:object:root=true
//+kubebuilder:subresource:status
// DatabaseCluster is the Schema for the databaseclusters API
type DatabaseCluster struct {
metav1.TypeMeta `json:",inline"`
metav1.ObjectMeta `json:",inline"`
Spec DatabaseClusterSpec `json:"spec,omitempty"`
Status DatabaseClusterStatus `json:"status,omitempty"`
}
//+kubebuilder:object:root=true
// DatabaseClusterList contains a list of DatabaseCluster
type DatabaseClusterList struct {
metav1.TypeMeta `json:",inline"`
metav1.ListMeta `json:",inline"`
Items []DatabaseCluster `json:"items"`
}
func init() {
SchemeBuilder.Register(&DatabaseCluster{}, &DatabaseClusterList{})
}
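As an aside, it's worth constraining the Spec at admission time so the reconciler only ever sees sane values. Here is a sketch using standard kubebuilder validation markers; the specific enum values are assumptions for this hypothetical service:

```go
// Engine is the database engine (e.g., "postgres", "mysql")
// +kubebuilder:validation:Enum=postgres;mysql
Engine string `json:"engine"`

// Version is the engine version
// +kubebuilder:validation:MinLength=1
Version string `json:"version"`

// Size represents the instance size
// +kubebuilder:validation:Enum=small;medium;large
Size string `json:"size"`
```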
A basic, non-finalizer reconciliation loop for this resource might look like this:
`controllers/databasecluster_controller.go` (Initial Idempotent Logic)
func (r *DatabaseClusterReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
log := log.FromContext(ctx)
// 1. Fetch the DatabaseCluster instance
var dbCluster v1.DatabaseCluster
if err := r.Get(ctx, req.NamespacedName, &dbCluster); err != nil {
if apierrors.IsNotFound(err) {
log.Info("DatabaseCluster resource not found. Ignoring since object must be deleted")
return ctrl.Result{}, nil
}
log.Error(err, "Failed to get DatabaseCluster")
return ctrl.Result{}, err
}
// This is where our core logic will go. For now, it's a placeholder.
// Let's simulate interaction with an external service client.
externalClient := r.ExternalServiceClient
// 2. Check if the external resource exists.
// We use an ID stored in the Status to track the external resource.
externalResource, err := externalClient.Get(dbCluster.Status.ExternalID)
if err != nil {
// Assuming a specific error type for 'not found'
if IsNotFound(err) {
log.Info("External resource not found. Creating a new one.")
newID, newEndpoint, createErr := externalClient.Create(dbCluster.Spec.Engine, dbCluster.Spec.Version, dbCluster.Spec.Size)
if createErr != nil {
log.Error(createErr, "Failed to create external resource")
// Update status with failure condition and requeue
return ctrl.Result{}, createErr
}
// IMPORTANT: Update the status immediately after creation
dbCluster.Status.ExternalID = newID
dbCluster.Status.Endpoint = newEndpoint
if updateErr := r.Status().Update(ctx, &dbCluster); updateErr != nil {
log.Error(updateErr, "Failed to update DatabaseCluster status after creation")
return ctrl.Result{}, updateErr
}
log.Info("Successfully created external resource and updated status", "ExternalID", newID)
return ctrl.Result{}, nil
}
// Handle other API errors
log.Error(err, "Failed to get external resource")
return ctrl.Result{}, err
}
// 3. The resource exists, ensure its state matches the spec (idempotent update)
if externalResource.Engine != dbCluster.Spec.Engine || externalResource.Version != dbCluster.Spec.Version {
log.Info("External resource state does not match spec. Updating.")
updateErr := externalClient.Update(dbCluster.Status.ExternalID, dbCluster.Spec.Engine, dbCluster.Spec.Version)
if updateErr != nil {
log.Error(updateErr, "Failed to update external resource")
return ctrl.Result{}, updateErr
}
}
log.Info("Reconciliation complete. External resource is in desired state.")
return ctrl.Result{}, nil
}
This logic handles creation and updates idempotently. But there is a subtle failure mode: if the reconciler crashes after creating the external resource but before updating the Status, the next reconciliation will find an empty ExternalID and create a second resource—unless your externalClient.Create call is itself idempotent (e.g., using a unique name derived from the CR's UID). A better approach, which we'll discuss in Section 3, is using tags to find and adopt orphaned resources.
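The article treats the external client as a given. For orientation, here is one plausible shape consistent with the calls in the snippet above — an assumption, not a prescribed API (Section 4's complete example uses slightly different signatures; ExternalResource and FindByUID are illustrative):

```go
// ExternalResource mirrors the state the external system reports back.
type ExternalResource struct {
	ID       string
	Endpoint string
	Engine   string
	Version  string
	Size     string
}

// ExternalService is one possible interface for r.ExternalServiceClient.
type ExternalService interface {
	// Get returns the resource or a not-found error (checked via IsNotFound).
	Get(id string) (*ExternalResource, error)
	// Create should be idempotent; deriving the external name from the CR's
	// immutable UID (e.g., "dbc-"+uid) makes retries safe.
	Create(engine, version, size string) (id, endpoint string, err error)
	Update(id, engine, version string) error
	Delete(id string) error
	// FindByUID supports the adoption pattern discussed in Section 3.
	FindByUID(uid string) (*ExternalResource, error)
}
```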
But notice the glaring hole: there is no deletion logic. This is where we introduce finalizers.
Section 2: Implementing the Finalizer Pattern
The finalizer pattern fundamentally alters the structure of the Reconcile function. It creates two primary branches of logic: one for when the object is active, and one for when it is being deleted.
Here is the canonical lifecycle with a finalizer:
1. A user creates a DatabaseCluster CR.
2. The Reconcile function is triggered. It sees the CR is active (no deletionTimestamp) and does not yet carry its finalizer.
3. The controller adds its finalizer (databasecluster.my-domain.com/finalizer) to the metadata.finalizers list and updates the object. This is the first action it takes.
4. Some time later, a user runs kubectl delete databasecluster my-db.
5. Because the finalizers list is non-empty, the API server does not delete the object. Instead, it sets metadata.deletionTimestamp to the current time and updates the object.
6. The Reconcile function is triggered again. This time, it detects that deletionTimestamp is not nil and branches into its cleanup logic, deleting the external resource.
7. Once cleanup succeeds, the controller removes its finalizer from the metadata.finalizers list and updates the object.
8. The API server sees an object with a deletionTimestamp and an empty finalizers list, and proceeds with the final deletion from etcd.

Let's refactor our Reconcile function to incorporate this pattern.
`controllers/databasecluster_controller.go` (with Finalizer Logic)
import (
// ... other imports
"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)
// Define the finalizer name
const databaseClusterFinalizer = "databasecluster.my-domain.com/finalizer"
func (r *DatabaseClusterReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
log := log.FromContext(ctx)
var dbCluster v1.DatabaseCluster
if err := r.Get(ctx, req.NamespacedName, &dbCluster); err != nil {
if apierrors.IsNotFound(err) {
return ctrl.Result{}, nil
}
log.Error(err, "Failed to get DatabaseCluster")
return ctrl.Result{}, err
}
// Examine if the object is under deletion
isBeingDeleted := dbCluster.GetDeletionTimestamp() != nil
if isBeingDeleted {
// A. Deletion Logic Branch
if controllerutil.ContainsFinalizer(&dbCluster, databaseClusterFinalizer) {
// Our finalizer is present, so let's handle external resource cleanup.
if err := r.finalizeExternalResource(ctx, &dbCluster); err != nil {
// If finalize fails, we return an error to retry the reconciliation.
log.Error(err, "Failed to finalize external resource")
return ctrl.Result{}, err
}
// Cleanup was successful. Remove our finalizer from the list and update it.
log.Info("External resource finalized successfully. Removing finalizer.")
controllerutil.RemoveFinalizer(&dbCluster, databaseClusterFinalizer)
if err := r.Update(ctx, &dbCluster); err != nil {
log.Error(err, "Failed to remove finalizer")
return ctrl.Result{}, err
}
}
// Stop reconciliation as the item is being deleted
return ctrl.Result{}, nil
} else {
// B. Normal Reconciliation Branch
// Add finalizer for this CR if it doesn't exist yet
if !controllerutil.ContainsFinalizer(&dbCluster, databaseClusterFinalizer) {
log.Info("Adding finalizer for the DatabaseCluster")
controllerutil.AddFinalizer(&dbCluster, databaseClusterFinalizer)
if err := r.Update(ctx, &dbCluster); err != nil {
log.Error(err, "Failed to add finalizer")
return ctrl.Result{}, err
}
}
}
// Now, proceed with the create/update logic from Section 1.
// This logic remains largely the same.
// externalClient := r.ExternalServiceClient
// externalResource, err := externalClient.Get(dbCluster.Status.ExternalID)
// ... (rest of the create/update logic as before)
return ctrl.Result{}, nil
}
// finalizeExternalResource performs the cleanup logic for the external resource.
func (r *DatabaseClusterReconciler) finalizeExternalResource(ctx context.Context, dbCluster *v1.DatabaseCluster) error {
log := log.FromContext(ctx)
externalClient := r.ExternalServiceClient
log.Info("Starting finalization of external resource", "ExternalID", dbCluster.Status.ExternalID)
// It's important to check if there is an external ID. If not, there's nothing to clean up.
if dbCluster.Status.ExternalID == "" {
log.Info("ExternalID is empty, no external resource to finalize.")
return nil
}
if err := externalClient.Delete(dbCluster.Status.ExternalID); err != nil {
// If the external resource is already gone, we can consider it a success.
if IsNotFound(err) {
log.Info("External resource already deleted.")
return nil
}
log.Error(err, "Failed to delete external resource during finalization")
return err
}
log.Info("Successfully deleted external resource", "ExternalID", dbCluster.Status.ExternalID)
return nil
}
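One wrinkle worth noting: many external APIs delete asynchronously (a managed database instance can take minutes to disappear), so a single successful Delete call doesn't guarantee the resource is gone. A hedged variant of the delete step above that holds the finalizer until the external system confirms removal, using the article's hypothetical client and IsNotFound helper:

```go
// Sketch: treat finalization as "not done" until Get confirms the resource
// is gone. Returning an error keeps the finalizer in place and requeues.
if err := externalClient.Delete(dbCluster.Status.ExternalID); err != nil && !IsNotFound(err) {
	return err
}
if _, err := externalClient.Get(dbCluster.Status.ExternalID); err == nil {
	return fmt.Errorf("external resource %s is still terminating", dbCluster.Status.ExternalID)
} else if !IsNotFound(err) {
	return err
}
return nil
```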
This structure is the foundation of a robust operator. The Reconcile function now correctly handles the full object lifecycle, and the controllerutil helpers (ContainsFinalizer, AddFinalizer, RemoveFinalizer) keep finalizer manipulation concise and correct.
Section 3: Advanced Edge Cases and Production Hardening
Having a basic finalizer loop is good, but production systems are messy. Here are critical edge cases you will encounter and how to design your operator to handle them.
Edge Case 1: The Stuck Finalizer
Problem: What happens if finalizeExternalResource fails consistently? For example, the external API is down, or the credentials used by the operator have expired. The deletionTimestamp is set, the finalizer is present, but the cleanup logic can never complete. The DatabaseCluster object will be stuck in the Terminating state forever, preventing namespace deletion and causing operator log spam.
Solution:
1. Intelligent error handling: The finalizeExternalResource function should not just return a generic error. It should inspect the error and react accordingly. A transient network error should trigger a requeue with backoff (return ctrl.Result{RequeueAfter: time.Minute}, nil). A permanent error (e.g., a 403 Forbidden) should be logged as critical and reflected in the CR's Status.Conditions:

// In DatabaseClusterStatus
// Conditions represent the latest available observations of the DatabaseCluster's state.
// +optional
Conditions []metav1.Condition `json:"conditions,omitempty"`
// In the reconciler, after a failed finalization
meta.SetStatusCondition(&dbCluster.Status.Conditions, metav1.Condition{
Type: "Degraded",
Status: metav1.ConditionTrue,
Reason: "FinalizationFailed",
Message: fmt.Sprintf("Failed to delete external resource: %v", err),
})
if updateErr := r.Status().Update(ctx, &dbCluster); updateErr != nil {
// ... log error
}
return ctrl.Result{}, err // Requeue with exponential backoff
2. A manual escape hatch: For terminal situations where cleanup can never succeed (e.g., the external account itself is gone), support a documented annotation that tells the operator to skip finalization:

// In the deletion logic branch
if annotation, ok := dbCluster.Annotations["my-domain.com/force-delete"]; ok && annotation == "true" {
log.Info("Force-delete annotation detected. Skipping finalization.")
// ... remove finalizer and return
}
// ... proceed with normal finalization
This is a powerful but dangerous tool that should be documented and used with extreme caution as it will orphan the external resource.
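In practice, an on-call engineer would trigger the escape hatch like this (annotation key per the sketch above):

```sh
kubectl annotate databasecluster my-db my-domain.com/force-delete=true
```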
Edge Case 2: Orphaned External Resources
Problem: The operator creates the external resource, but crashes or gets restarted before it can write the ExternalID to the DatabaseCluster's status. On the next reconciliation, Status.ExternalID is empty, so the operator assumes no external resource exists and creates a new one. You now have a duplicate, orphaned resource.
Solution: Tagging and Adoption Logic
Never rely solely on the CR's Status as the source of truth for resource existence. The external system is the ultimate source of truth. Use a tagging/labeling mechanism on the external resource that links it back to its Kubernetes owner.
* Tagging: When creating the external resource, add tags like the following (a construction sketch appears at the end of this edge case):
* kubernetes.io/cluster-name: my-prod-cluster
* managed-by: databasecluster-operator
* owner-namespace: default
* owner-name: my-db
* owner-uid: (The UID is immutable and essential for preventing conflicts if a CR is deleted and recreated with the same name).
* Adoption Logic: In the reconciliation loop, if Status.ExternalID is empty, don't immediately create a new resource. Instead, query the external system for a resource with tags matching the current CR's UID.
// In the main reconciliation logic (create/update path)
if dbCluster.Status.ExternalID == "" {
log.Info("ExternalID is not set. Attempting to find and adopt an existing resource.")
foundResource, err := externalClient.FindByUID(string(dbCluster.GetUID()))
if err != nil {
// Handle API errors
return ctrl.Result{}, err
}
if foundResource != nil {
log.Info("Found existing external resource to adopt", "ExternalID", foundResource.ID)
dbCluster.Status.ExternalID = foundResource.ID
dbCluster.Status.Endpoint = foundResource.Endpoint
if err := r.Status().Update(ctx, &dbCluster); err != nil {
return ctrl.Result{}, err
}
// Continue to the update/sync logic
} else {
// No existing resource found, now it's safe to create one.
// ... creation logic here ...
}
}
This makes your operator resilient to crashes and ensures a one-to-one mapping between the CR and the external resource.
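To make the tagging concrete, here is a minimal sketch of assembling the tag set described above. The tag keys are the article's; the helper itself is hypothetical:

```go
// buildOwnerTags links an external resource back to its Kubernetes owner.
// The owner-uid tag is the one adoption logic should match on.
func buildOwnerTags(clusterName string, db *v1.DatabaseCluster) map[string]string {
	return map[string]string{
		"kubernetes.io/cluster-name": clusterName,
		"managed-by":                 "databasecluster-operator",
		"owner-namespace":            db.GetNamespace(),
		"owner-name":                 db.GetName(),
		"owner-uid":                  string(db.GetUID()),
	}
}
```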
Edge Case 3: API Rate Limiting and Controller Performance
Problem: A buggy controller, a flapping CR, or a large number of resources can lead to a high volume of reconciliation loops. This can overwhelm the external service's API, leading to rate limiting, which in turn causes more reconciliation failures and a vicious cycle.
Solution:
1. Smart requeueing: Don't blindly return err on every failure. controller-runtime implements exponential backoff by default when you return an error, which is good. But for known rate-limiting errors, you can be more explicit by returning a ctrl.Result with a longer RequeueAfter duration.
2. Client-side rate limiting: Wrap calls to the external API in a rate limiter. The golang.org/x/time/rate package is excellent for this:

// When initializing your reconciler
limiter := rate.NewLimiter(rate.Limit(10), 1) // 10 requests per second, burst of 1
reconciler.ExternalServiceClient = &MyClient{
HTTPClient: http.DefaultClient,
Limiter: limiter,
}
// In your client methods
func (c *MyClient) Get(id string) (*ExternalResource, error) {
	// Block until the limiter grants a token before touching the external API.
	if err := c.Limiter.Wait(context.Background()); err != nil {
		return nil, err
	}
	// ... perform the API call and return the resource
}
3. Avoid unnecessary writes: Don't update the Status or Spec if nothing has changed; every no-op write still triggers watch events and another reconcile. Use a deep equality check on the status before patching/updating:

// Before updating status
originalStatus := dbCluster.Status.DeepCopy()
// ... modify dbCluster.Status ...
if !reflect.DeepEqual(originalStatus, &dbCluster.Status) {
if err := r.Status().Update(ctx, &dbCluster); err != nil {
// ... handle error
}
}
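Relatedly, a merge patch computed against a deep-copied base can reduce optimistic-concurrency conflicts on frequently updated objects — a sketch using controller-runtime's client.MergeFrom:

```go
base := dbCluster.DeepCopy()
// ... modify dbCluster.Status ...
// Patch sends only the diff, so it is cheaper and less conflict-prone than Update.
if err := r.Status().Patch(ctx, &dbCluster, client.MergeFrom(base)); err != nil {
	// ... handle error
}
```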
Section 4: Complete Code Example Walkthrough
Let's put all these patterns together into a more complete databasecluster_controller.go.
package controllers
import (
"context"
"fmt"
"time"
"k8s.io/apimachinery/pkg/api/errors"
apierrors "k8s.io/apimachinery/pkg/api/errors"
"k8s.io/apimachinery/pkg/api/meta"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/runtime"
ctrl "sigs.k8s.io/controller-runtime"
"sigs.k8s.io/controller-runtime/pkg/client"
"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
"sigs.k8s.io/controller-runtime/pkg/log"
v1 "github.com/my-org/database-operator/api/v1"
)
const databaseClusterFinalizer = "databasecluster.my-domain.com/finalizer"
// DatabaseClusterReconciler reconciles a DatabaseCluster object
type DatabaseClusterReconciler struct {
client.Client
Scheme *runtime.Scheme
ExternalServiceClient *ExternalClient // This is a mock client for our external DB service
}
//+kubebuilder:rbac:groups=my-domain.com,resources=databaseclusters,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups=my-domain.com,resources=databaseclusters/status,verbs=get;update;patch
//+kubebuilder:rbac:groups=my-domain.com,resources=databaseclusters/finalizers,verbs=update
func (r *DatabaseClusterReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
log := log.FromContext(ctx)
var dbCluster v1.DatabaseCluster
if err := r.Get(ctx, req.NamespacedName, &dbCluster); err != nil {
if apierrors.IsNotFound(err) {
log.Info("Resource not found. Ignoring.")
return ctrl.Result{}, nil
}
log.Error(err, "Failed to get DatabaseCluster")
return ctrl.Result{}, err
}
// Initialize status conditions
if dbCluster.Status.Conditions == nil {
dbCluster.Status.Conditions = []metav1.Condition{}
}
// Handle deletion
if dbCluster.GetDeletionTimestamp() != nil {
if controllerutil.ContainsFinalizer(&dbCluster, databaseClusterFinalizer) {
log.Info("Performing Finalizer Operations for DatabaseCluster before deletion")
meta.SetStatusCondition(&dbCluster.Status.Conditions, metav1.Condition{
Type: "Terminating",
Status: metav1.ConditionTrue,
Reason: "Finalizing",
Message: "Performing finalizer operations",
})
if err := r.Status().Update(ctx, &dbCluster); err != nil {
return ctrl.Result{}, err
}
if err := r.finalizeExternalResource(ctx, &dbCluster); err != nil {
log.Error(err, "Finalization failed")
meta.SetStatusCondition(&dbCluster.Status.Conditions, metav1.Condition{
Type: "Terminating",
Status: metav1.ConditionFalse,
Reason: "FinalizationError",
Message: fmt.Sprintf("Finalization failed: %v", err),
})
_ = r.Status().Update(ctx, &dbCluster) // Best-effort status update
// Return a nil error here: if we returned the error alongside the Result,
// controller-runtime would ignore RequeueAfter and use its own backoff.
return ctrl.Result{RequeueAfter: 30 * time.Second}, nil // Requeue with delay
}
log.Info("Finalizer operations complete. Removing finalizer.")
controllerutil.RemoveFinalizer(&dbCluster, databaseClusterFinalizer)
if err := r.Update(ctx, &dbCluster); err != nil {
return ctrl.Result{}, err
}
}
return ctrl.Result{}, nil
}
// Add finalizer if it doesn't exist
if !controllerutil.ContainsFinalizer(&dbCluster, databaseClusterFinalizer) {
log.Info("Adding Finalizer for the DatabaseCluster")
controllerutil.AddFinalizer(&dbCluster, databaseClusterFinalizer)
if err := r.Update(ctx, &dbCluster); err != nil {
return ctrl.Result{}, err
}
}
// --- Main Reconciliation Logic ---
// Adopt or Create
externalResource, err := r.ExternalServiceClient.FindByUID(string(dbCluster.GetUID()))
if err != nil {
log.Error(err, "Failed to query for external resource by UID")
meta.SetStatusCondition(&dbCluster.Status.Conditions, metav1.Condition{Type: "Available", Status: metav1.ConditionFalse, Reason: "Reconciling", Message: "Failed to query external system"})
_ = r.Status().Update(ctx, &dbCluster)
return ctrl.Result{}, err
}
if externalResource == nil {
log.Info("External resource not found, creating new one.")
newResource, err := r.ExternalServiceClient.Create(dbCluster.Spec, string(dbCluster.GetUID()))
if err != nil {
log.Error(err, "Failed to create external resource")
meta.SetStatusCondition(&dbCluster.Status.Conditions, metav1.Condition{Type: "Available", Status: metav1.ConditionFalse, Reason: "CreationFailed", Message: err.Error()})
_ = r.Status().Update(ctx, &dbCluster)
return ctrl.Result{}, err
}
externalResource = newResource
} else {
log.Info("Found existing external resource", "ID", externalResource.ID)
}
// Sync state
if !isResourceInSync(dbCluster.Spec, externalResource) {
log.Info("External resource out of sync, updating.")
if err := r.ExternalServiceClient.Update(externalResource.ID, dbCluster.Spec); err != nil {
log.Error(err, "Failed to update external resource")
meta.SetStatusCondition(&dbCluster.Status.Conditions, metav1.Condition{Type: "Available", Status: metav1.ConditionFalse, Reason: "UpdateFailed", Message: err.Error()})
_ = r.Status().Update(ctx, &dbCluster)
return ctrl.Result{}, err
}
}
// Update Status
newStatus := v1.DatabaseClusterStatus{
ExternalID: externalResource.ID,
Endpoint: externalResource.Endpoint,
Conditions: dbCluster.Status.Conditions,
}
meta.SetStatusCondition(&newStatus.Conditions, metav1.Condition{Type: "Available", Status: metav1.ConditionTrue, Reason: "Reconciled", Message: "External resource is in sync"})
// Avoid unnecessary status updates
if !areStatusEqual(dbCluster.Status, newStatus) {
dbCluster.Status = newStatus
if err := r.Status().Update(ctx, &dbCluster); err != nil {
log.Error(err, "Failed to update status")
return ctrl.Result{}, err
}
}
log.Info("Reconciliation successful")
return ctrl.Result{}, nil
}
// ... (finalizeExternalResource, helper functions like isResourceInSync, areStatusEqual) ...
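// The helpers referenced above are not defined in the article; the following
// are hedged sketches consistent with how they are called, using the
// ExternalResource shape sketched in Section 1 (assumptions, not canon).

// isResourceInSync reports whether the observed external state matches the Spec.
func isResourceInSync(spec v1.DatabaseClusterSpec, res *ExternalResource) bool {
	return res != nil &&
		res.Engine == spec.Engine &&
		res.Version == spec.Version &&
		res.Size == spec.Size
}

// areStatusEqual compares status fields while ignoring condition timestamps,
// which would otherwise churn on every reconcile.
func areStatusEqual(a, b v1.DatabaseClusterStatus) bool {
	if a.ExternalID != b.ExternalID || a.Endpoint != b.Endpoint ||
		len(a.Conditions) != len(b.Conditions) {
		return false
	}
	for i := range a.Conditions {
		if a.Conditions[i].Type != b.Conditions[i].Type ||
			a.Conditions[i].Status != b.Conditions[i].Status ||
			a.Conditions[i].Reason != b.Conditions[i].Reason {
			return false
		}
	}
	return true
}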
// SetupWithManager sets up the controller with the Manager.
func (r *DatabaseClusterReconciler) SetupWithManager(mgr ctrl.Manager) error {
return ctrl.NewControllerManagedBy(mgr).
For(&v1.DatabaseCluster{}).
Complete(r)
}
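Tying back to Section 3's performance notes: the manager wiring is also where you can cap concurrency and filter no-op events. A sketch of an alternative setup (the options and predicate are standard controller-runtime; tune the values to your workload, and verify that GenerationChangedPredicate's event filtering fits your flow before adopting it):

```go
import (
	"sigs.k8s.io/controller-runtime/pkg/builder"
	"sigs.k8s.io/controller-runtime/pkg/controller"
	"sigs.k8s.io/controller-runtime/pkg/predicate"
)

func (r *DatabaseClusterReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		// Skip update events that don't change .spec (metadata/status-only writes).
		For(&v1.DatabaseCluster{}, builder.WithPredicates(predicate.GenerationChangedPredicate{})).
		// Bound parallel reconciles to avoid stampeding the external API.
		WithOptions(controller.Options{MaxConcurrentReconciles: 2}).
		Complete(r)
}
```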
Example CR YAML
apiVersion: my-domain.com/v1
kind: DatabaseCluster
metadata:
name: my-postgres-db
namespace: default
spec:
engine: "postgres"
version: "14.2"
size: "medium"
To test the flow:
1. Apply the sample: kubectl apply -f config/samples/my-domain_v1_databasecluster.yaml
2. Inspect the object with kubectl get databasecluster my-postgres-db -o yaml. Observe the finalizers array and the updated status.
3. Delete it: kubectl delete databasecluster my-postgres-db.
4. Inspect it again with kubectl get databasecluster my-postgres-db -o yaml. You will see that deletionTimestamp is set and the object remains while the operator's logs show finalization in progress. Once cleanup is done and the finalizer is removed, the object disappears.

Conclusion
The finalizer pattern is not just a feature; it's the cornerstone of writing reliable Kubernetes operators that manage resources outside the cluster's direct control. By meticulously implementing an idempotent reconciliation loop that correctly handles both the creation/update path and the deletion path, you can build controllers that are resilient to failure and prevent the costly anti-pattern of orphaned infrastructure.
Remember the key takeaways for a production-grade operator:
* Idempotency is paramount: Every reconciliation must be repeatable without side effects.
* Finalizers control the deletion lifecycle: Use them to execute cleanup logic before Kubernetes deletes the CR.
* Plan for failure: Stuck finalizers are a real problem. Implement robust error handling, status conditions, and manual overrides.
* Tag and adopt: Never trust the CR status as the sole source of truth. Use external tagging to find and adopt orphaned resources.
* Be a good API citizen: Use rate limiting and intelligent requeues to avoid overwhelming external systems.
By building on these advanced patterns, you can confidently extend the power of the Kubernetes control plane to manage any stateful service, creating truly automated and reliable systems.