Building a Custom Kubernetes Operator in Go for Stateful App Management

12 min read
Goh Ling Yong
Technology enthusiast and software architect specializing in AI-driven development tools and modern software engineering practices. Passionate about the intersection of artificial intelligence and human creativity in building tomorrow's digital solutions.

From Consumer to Creator: Mastering Stateful Application Management with Custom Kubernetes Operators

For senior engineers, the limitations of standard Kubernetes resources like StatefulSets and Deployments become apparent when managing truly complex, domain-specific applications. A standard StatefulSet can manage a replicated database, but it can't orchestrate a custom backup strategy, handle a complex failover sequence, or manage database schema migrations. This is where the Operator Pattern transforms our relationship with Kubernetes, elevating us from platform consumers to platform builders.

An Operator encodes human operational knowledge into software that extends the Kubernetes API. It combines a Custom Resource Definition (CRD), which defines our new API object (e.g., ShardedDatabase), with a custom controller that acts on those objects. This post is not an introduction. We assume you understand the Operator Pattern's purpose. Instead, we will dive directly into the critical implementation details that separate a proof-of-concept operator from a production-ready one.

We will build an operator for a hypothetical ShardedDatabase custom resource. Our goal is to manage its lifecycle with a level of sophistication impossible with off-the-shelf components. We will focus on three advanced, production-critical areas:

  • The Idempotent Reconciliation Loop: Architecting the core logic to be robust, error-proof, and capable of converging to the desired state from any starting point.
  • Graceful Deletion with Finalizers: Implementing a pre-delete hook to perform critical cleanup tasks, like taking a final database backup, before Kubernetes garbage collects the resource.
  • Advanced Status Management: Using the .status subresource not just for visibility, but as a crucial part of the controller's state machine, while handling potential race conditions and API conflicts.

    Defining Our Custom API: The `ShardedDatabase` CRD

    First, let's define the API for our ShardedDatabase in Go. We'll use kubebuilder markers to generate the CRD manifest. The spec defines the desired state, and the status reflects the observed state of the world.

    go
    // api/v1alpha1/shardeddatabase_types.go
    
    package v1alpha1
    
    import (
    	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    )
    
    // ShardedDatabaseSpec defines the desired state of ShardedDatabase
    type ShardedDatabaseSpec struct {
    	// Number of database shards to deploy.
    	// +kubebuilder:validation:Minimum=1
    	// +kubebuilder:validation:Maximum=10
    	Shards int32 `json:"shards"`
    
    	// The container image version for the database shards.
    	Version string `json:"version"`
    
    	// Cron-formatted schedule for periodic backups.
    	BackupSchedule string `json:"backupSchedule,omitempty"`
    }
    
    // ShardedDatabaseStatus defines the observed state of ShardedDatabase
    type ShardedDatabaseStatus struct {
    	// The current phase of the database cluster.
    	// +kubebuilder:validation:Enum=Creating;Ready;Updating;Failed
    	Phase string `json:"phase,omitempty"`
    
    	// Number of shards that are currently ready and accepting traffic.
    	ReadyShards int32 `json:"readyShards,omitempty"`
    
    	// The timestamp of the last successful backup.
    	LastBackupTime *metav1.Time `json:"lastBackupTime,omitempty"`
    
    	// Conditions represent the latest available observations of the ShardedDatabase's state.
    	// +optional
    	Conditions []metav1.Condition `json:"conditions,omitempty"`
    }
    
    //+kubebuilder:object:root=true
    //+kubebuilder:subresource:status
    //+kubebuilder:printcolumn:name="Shards",type="integer",JSONPath=".spec.shards"
    //+kubebuilder:printcolumn:name="Ready",type="integer",JSONPath=".status.readyShards"
    //+kubebuilder:printcolumn:name="Phase",type="string",JSONPath=".status.phase"
    //+kubebuilder:printcolumn:name="Age",type="date",JSONPath=".metadata.creationTimestamp"
    
    // ShardedDatabase is the Schema for the shardeddatabases API
    type ShardedDatabase struct {
    	metav1.TypeMeta   `json:",inline"`
    	metav1.ObjectMeta `json:"metadata,omitempty"`
    
    	Spec   ShardedDatabaseSpec   `json:"spec,omitempty"`
    	Status ShardedDatabaseStatus `json:"status,omitempty"`
    }
    
    //+kubebuilder:object:root=true
    
    // ShardedDatabaseList contains a list of ShardedDatabase
    type ShardedDatabaseList struct {
    	metav1.TypeMeta `json:",inline"`
    	metav1.ListMeta `json:"metadata,omitempty"`
    	Items           []ShardedDatabase `json:"items"`
    }
    
    func init() {
    	SchemeBuilder.Register(&ShardedDatabase{}, &ShardedDatabaseList{})
    }

    Key details here are the +kubebuilder markers. +kubebuilder:subresource:status is critical: it tells the API server to serve .status as a separate endpoint, so ordinary updates to the object cannot change its status (and status updates cannot change the spec), and RBAC can be granted for each independently. This is non-negotiable for production operators.

    The Heart of the Operator: The Reconciliation Loop

    The Reconcile function is the core of our controller. It's invoked whenever a ShardedDatabase resource is created, updated, or deleted, or when a resource it owns (like a StatefulSet) changes.

    The most important principle is idempotency. The loop must be designed to run multiple times and always converge on the correct state. It's a level-triggered, not edge-triggered, system.

    Here's the skeleton of our Reconcile method in controllers/shardeddatabase_controller.go:

    go
    // controllers/shardeddatabase_controller.go
    
    import (
    	"context"
    
    	"k8s.io/apimachinery/pkg/api/errors"
    	"k8s.io/apimachinery/pkg/runtime"
    	ctrl "sigs.k8s.io/controller-runtime"
    	"sigs.k8s.io/controller-runtime/pkg/client"
    	"sigs.k8s.io/controller-runtime/pkg/log"
    
    	dbv1alpha1 "github.com/your-repo/shardeddb-operator/api/v1alpha1"
    	// ... other imports (fmt, time, appsv1, corev1, etc.) as they are used below
    )
    
    // ShardedDatabaseReconciler reconciles a ShardedDatabase object
    type ShardedDatabaseReconciler struct {
    	client.Client
    	Scheme *runtime.Scheme
    }
    
    //+kubebuilder:rbac:groups=db.example.com,resources=shardeddatabases,verbs=get;list;watch;create;update;patch;delete
    //+kubebuilder:rbac:groups=db.example.com,resources=shardeddatabases/status,verbs=get;update;patch
    //+kubebuilder:rbac:groups=db.example.com,resources=shardeddatabases/finalizers,verbs=update
    //+kubebuilder:rbac:groups=apps,resources=statefulsets,verbs=get;list;watch;create;update;patch;delete
    //+kubebuilder:rbac:groups=core,resources=configmaps,verbs=get;list;watch;create;update;patch;delete
    
    func (r *ShardedDatabaseReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    	logger := log.FromContext(ctx)
    
    	// 1. Fetch the ShardedDatabase instance
    	var db dbv1alpha1.ShardedDatabase
    	if err := r.Get(ctx, req.NamespacedName, &db); err != nil {
    		if errors.IsNotFound(err) {
    			// Object was deleted. It's likely the finalizer logic is complete.
    			logger.Info("ShardedDatabase resource not found. Ignoring since object must be deleted")
    			return ctrl.Result{}, nil
    		}
    		// Error reading the object - requeue the request.
    		logger.Error(err, "Failed to get ShardedDatabase")
    		return ctrl.Result{}, err
    	}
    
    	// ... Our main reconciliation logic will go here ...
    
    	return ctrl.Result{}, nil
    }

    Requeue Strategy: `ctrl.Result{}` and Error Handling

    Understanding what to return from Reconcile is paramount:

    * return ctrl.Result{}, nil: Reconciliation was successful. The controller will not requeue unless an external watch event occurs.

    * return ctrl.Result{}, err: An error occurred. The controller-runtime will requeue the request with exponential backoff. Use this for transient, retryable errors (e.g., API server is temporarily unavailable).

    * return ctrl.Result{Requeue: true}, nil: Reconciliation was not complete but did not error. Requeue immediately. Use this sparingly, as it can cause busy-loops. It's often better to use RequeueAfter.

    * return ctrl.Result{RequeueAfter: time.Second * 30}, nil: Reconciliation was not complete. Requeue after a specific duration. This is perfect for periodic checks that don't need to be immediate (see the short sketch below).
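
    As a concrete illustration, here is a minimal sketch of how these return values typically compose at the end of a reconciliation pass. The allShardsReady helper is hypothetical, standing in for whatever readiness check your operator performs.

    go
    	// Inside Reconcile, after the create/update logic (sketch only).
    	if !allShardsReady(&db) {
    		// Not an error, just not converged yet: check again in 30 seconds.
    		return ctrl.Result{RequeueAfter: 30 * time.Second}, nil
    	}
    	// Desired state reached: do nothing until the next watch event arrives.
    	return ctrl.Result{}, nil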

    Production Pattern 1: Finalizers for Graceful Deletion

    What happens when a user runs kubectl delete shardeddatabase my-db? By default, Kubernetes deletes the object, and its owned resources (via OwnerReferences) are garbage collected. But what if we need to take a final backup first? This is the job of a finalizer.

    A finalizer is a key in the object's metadata that tells the API server to wait for its removal before deleting the object. Our controller is responsible for removing it after performing cleanup tasks.

    Here's the implementation pattern:

    go
    // controllers/shardeddatabase_controller.go
    
    import (
        // ...
        "sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
    )
    
    const shardedDatabaseFinalizer = "db.example.com/finalizer"
    
    func (r *ShardedDatabaseReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    	logger := log.FromContext(ctx)
    	// ... (fetch instance as before)
    
    	// Check if the object is being deleted
    	if !db.ObjectMeta.DeletionTimestamp.IsZero() {
    		// The object is being deleted
    		if controllerutil.ContainsFinalizer(&db, shardedDatabaseFinalizer) {
    			// Our finalizer is present, so let's handle any external dependency
    			if err := r.finalizeShardedDatabase(ctx, &db); err != nil {
    				// If fail to delete the external dependency here, return with error
    				// so that it can be retried
    				logger.Error(err, "Failed to run finalizer logic")
    				return ctrl.Result{}, err
    			}
    
    			// Remove our finalizer from the list and update it.
    			controllerutil.RemoveFinalizer(&db, shardedDatabaseFinalizer)
    			if err := r.Update(ctx, &db); err != nil {
    				return ctrl.Result{}, err
    			}
    		}
    		// Stop reconciliation as the item is being deleted
    		return ctrl.Result{}, nil
    	}
    
    	// Add finalizer for this CR if it doesn't have one
    	if !controllerutil.ContainsFinalizer(&db, shardedDatabaseFinalizer) {
    		controllerutil.AddFinalizer(&db, shardedDatabaseFinalizer)
    		logger.Info("Adding finalizer to ShardedDatabase")
    		if err := r.Update(ctx, &db); err != nil {
    			return ctrl.Result{}, err
    		}
    	}
    
        // ... rest of the reconciliation logic for create/update
    
    	return ctrl.Result{}, nil
    }
    
    func (r *ShardedDatabaseReconciler) finalizeShardedDatabase(ctx context.Context, db *dbv1alpha1.ShardedDatabase) error {
    	// This is where you put your cleanup logic.
    	// For example, trigger a final backup job, drain connections, etc.
    	logger := log.FromContext(ctx)
    	logger.Info("Performing finalization tasks for ShardedDatabase", "name", db.Name)
    
    	// Example: Create a one-off Kubernetes Job to perform a backup.
    	// We would need to wait for this job to complete successfully.
    	// This logic can be complex: check if the job exists, if it failed, etc.
    	// For simplicity, we'll just log a message here.
    	logger.Info("Final backup logic would run here.")
    
    	// IMPORTANT: This finalizer logic must be idempotent.
    	// It might be called multiple times if the `r.Update` call to remove the finalizer fails.
    
    	return nil
    }

    Edge Case: What if the finalizeShardedDatabase function fails repeatedly? The object will be stuck in a Terminating state forever. This is a critical operational consideration. Your finalizer logic must be robust, perhaps with its own retry mechanism, or you must have monitoring in place to alert an operator to a stuck finalizer that requires manual intervention.
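
    To make that concrete, here is a minimal, hedged sketch of one way finalizeShardedDatabase could be fleshed out: it drives a final backup through a Kubernetes Job and stays idempotent by re-checking the Job on every pass. The Job name and the backupJobFor builder are hypothetical, not part of the code above; the key point is that the function only returns nil once the backup has actually succeeded, and returning an error keeps the request in the retry queue with backoff.

    go
    	// Sketch only: assumes batchv1 "k8s.io/api/batch/v1" plus the imports used
    	// elsewhere in this controller, and a hypothetical backupJobFor(db, name) builder.
    	func (r *ShardedDatabaseReconciler) finalizeShardedDatabase(ctx context.Context, db *dbv1alpha1.ShardedDatabase) error {
    		jobName := fmt.Sprintf("%s-final-backup", db.Name)
    
    		var job batchv1.Job
    		err := r.Get(ctx, types.NamespacedName{Name: jobName, Namespace: db.Namespace}, &job)
    		if errors.IsNotFound(err) {
    			// First pass: create the backup Job. Safe to re-enter; the next pass
    			// will find the Job and fall through to the status check below.
    			if err := r.Create(ctx, r.backupJobFor(db, jobName)); err != nil {
    				return err
    			}
    			return fmt.Errorf("final backup job %s created, waiting for completion", jobName)
    		} else if err != nil {
    			return err
    		}
    
    		// Subsequent passes: only allow finalizer removal once the Job has succeeded.
    		if job.Status.Succeeded > 0 {
    			return nil
    		}
    		return fmt.Errorf("final backup job %s has not completed yet", jobName)
    	}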

    Production Pattern 2: Reconciling Owned Resources

    Now for the main logic: ensuring our child resources (the StatefulSets for each shard) exist and match the spec.

    go
    // controllers/shardeddatabase_controller.go
    
    func (r *ShardedDatabaseReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
        // ... (fetch and finalizer logic from above)
    
        // === Main Reconciliation Logic ===
    
    	// Reconcile StatefulSet for shards
    	for i := 0; i < int(db.Spec.Shards); i++ {
    		shardName := fmt.Sprintf("%s-shard-%d", db.Name, i)
    		sts := &appsv1.StatefulSet{}
    		err := r.Get(ctx, types.NamespacedName{Name: shardName, Namespace: db.Namespace}, sts)
    
    		if err != nil && errors.IsNotFound(err) {
    			// Define a new StatefulSet
    			logger.Info("Creating a new StatefulSet for shard", "StatefulSet.Namespace", db.Namespace, "StatefulSet.Name", shardName)
    			newSts := r.statefulSetForShard(&db, i)
    			if err := r.Create(ctx, newSts); err != nil {
    				logger.Error(err, "Failed to create new StatefulSet", "StatefulSet.Namespace", db.Namespace, "StatefulSet.Name", shardName)
    				return ctrl.Result{}, err
    			}
    			// StatefulSet created, requeue to check status later
    			return ctrl.Result{Requeue: true}, nil
    		} else if err != nil {
    			logger.Error(err, "Failed to get StatefulSet")
    			return ctrl.Result{}, err
    		}
    
    		// Ensure the StatefulSet spec is up to date (e.g., image version)
    		currentImage := sts.Spec.Template.Spec.Containers[0].Image
    		desiredImage := fmt.Sprintf("postgres:%s", db.Spec.Version)
    
    		if currentImage != desiredImage {
    			logger.Info("Updating StatefulSet image version", "StatefulSet.Name", shardName, "From", currentImage, "To", desiredImage)
    			sts.Spec.Template.Spec.Containers[0].Image = desiredImage
    			if err := r.Update(ctx, sts); err != nil {
    				logger.Error(err, "Failed to update StatefulSet")
    				return ctrl.Result{}, err
    			}
    			// Spec updated, let's requeue to check on the rollout status
    			return ctrl.Result{RequeueAfter: time.Second * 5}, nil
    		}
    	}
    
        // ... (status update logic will go here)
    
        return ctrl.Result{}, nil
    }
    
    // statefulSetForShard returns a ShardedDatabase StatefulSet object
    func (r *ShardedDatabaseReconciler) statefulSetForShard(db *dbv1alpha1.ShardedDatabase, shardIndex int) *appsv1.StatefulSet {
    	// Build the StatefulSet object for this shard (ObjectMeta, labels, pod template, ...);
    	// a fuller sketch follows below.
    	sts := &appsv1.StatefulSet{ /* ... */ }
    
    	// IMPORTANT: Set the owner reference so Kubernetes garbage collects it.
    	// Note that SetControllerReference returns an error, which real code should check.
    	_ = ctrl.SetControllerReference(db, sts, r.Scheme)
    	return sts
    }

    The most important line in the helper function is ctrl.SetControllerReference(db, sts, r.Scheme). This establishes the parent-child relationship. Without it, deleting the ShardedDatabase CR would leave orphan StatefulSets behind.
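
    For reference, here is a minimal sketch of what statefulSetForShard might look like. The labels mirror the selector used in the status logic later on, but the container name, image convention, and one-replica-per-shard layout are illustrative assumptions rather than a definitive design; a production builder would also set volume claim templates, probes, resources, and a headless Service.

    go
    	func (r *ShardedDatabaseReconciler) statefulSetForShard(db *dbv1alpha1.ShardedDatabase, shardIndex int) *appsv1.StatefulSet {
    		name := fmt.Sprintf("%s-shard-%d", db.Name, shardIndex)
    		labels := map[string]string{
    			"app":         "sharded-database",
    			"db-instance": db.Name,
    			"shard":       fmt.Sprintf("%d", shardIndex),
    		}
    		replicas := int32(1) // one pod per shard in this sketch
    
    		sts := &appsv1.StatefulSet{
    			ObjectMeta: metav1.ObjectMeta{Name: name, Namespace: db.Namespace, Labels: labels},
    			Spec: appsv1.StatefulSetSpec{
    				ServiceName: name,
    				Replicas:    &replicas,
    				Selector:    &metav1.LabelSelector{MatchLabels: labels},
    				Template: corev1.PodTemplateSpec{
    					ObjectMeta: metav1.ObjectMeta{Labels: labels},
    					Spec: corev1.PodSpec{
    						Containers: []corev1.Container{{
    							Name:  "database",
    							Image: fmt.Sprintf("postgres:%s", db.Spec.Version),
    						}},
    					},
    				},
    			},
    		}
    
    		// Establish the parent-child relationship so deleting the CR garbage
    		// collects this StatefulSet. Surface the error instead of discarding it
    		// in production code.
    		_ = ctrl.SetControllerReference(db, sts, r.Scheme)
    		return sts
    	}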

    Production Pattern 3: Robust Status Subresource Management

    The .status field is how your operator communicates the state of the world back to users and other systems. It's also essential for the operator's own decision-making. Updating it correctly is a multi-step process.

  • Read the current state of all owned resources (e.g., pods, services).
  • Calculate the new desired status.
  • Compare it with the existing status on the CR.
  • If different, send a PATCH or UPDATE request to the API server's /status subresource endpoint.

    Edge Case: Optimistic Locking and Conflicts

    What if another process (or another replica of your operator, if running HA) updates the status between your GET and UPDATE? Your UPDATE will fail with a conflict error. The correct pattern is to retry the entire reconciliation loop.

    Here is the code to reconcile the status:

    go
    // controllers/shardeddatabase_controller.go
    
    func (r *ShardedDatabaseReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
        // ... (fetch, finalizer, and StatefulSet reconciliation logic)
    
        // === Status Reconciliation Logic ===
    
        // List all pods for this ShardedDatabase's StatefulSets to calculate ready shards
    	podList := &corev1.PodList{}
    	listOpts := []client.ListOption{
    		client.InNamespace(db.Namespace),
    		client.MatchingLabels{"app": "sharded-database", "db-instance": db.Name},
    	}
    	if err := r.List(ctx, podList, listOpts...); err != nil {
    		logger.Error(err, "Failed to list pods for status update")
    		return ctrl.Result{}, err
    	}
    
    	var readyShards int32
    	for _, pod := range podList.Items {
    		// A simple readiness check. Production checks would be more robust.
    		for _, cond := range pod.Status.Conditions {
    			if cond.Type == corev1.PodReady && cond.Status == corev1.ConditionTrue {
    				readyShards++
    			}
    		}
    	}
    
    	// Work on a copy of the status so the new values can be compared against the current ones below
    	newStatus := db.Status.DeepCopy()
    	newStatus.ReadyShards = readyShards
    
    	// Determine the overall phase
    	if newStatus.ReadyShards == db.Spec.Shards {
    		newStatus.Phase = "Ready"
    	} else {
    		newStatus.Phase = "Updating"
    	}
    
    	// Use reflect.DeepEqual to avoid unnecessary status updates
    	if !reflect.DeepEqual(db.Status, *newStatus) {
    		logger.Info("Updating ShardedDatabase status", "OldReady", db.Status.ReadyShards, "NewReady", newStatus.ReadyShards)
    		db.Status = *newStatus
    		if err := r.Status().Update(ctx, &db); err != nil {
    			if errors.IsConflict(err) {
    				// The object has been modified. Requeue to retry the whole loop.
    				logger.Info("Status update conflict, requeueing")
    				return ctrl.Result{Requeue: true}, nil
    			}
    			logger.Error(err, "Failed to update ShardedDatabase status")
    			return ctrl.Result{}, err
    		}
    	}
    
        return ctrl.Result{}, nil
    }
    

    Key takeaways from this status update logic:

  • r.Status().Update(...): We use the .Status() client, which specifically targets the /status subresource. This is the correct way to update status.
  • Conflict Handling: errors.IsConflict(err) is the canonical way to check for optimistic locking failures. The correct response is to requeue the entire request. The next reconciliation will fetch the newer version of the object and re-apply its logic, resolving the conflict naturally.
  • reflect.DeepEqual: We avoid writing to the API server if the status hasn't actually changed. This reduces API server load and noise in the audit logs.
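
    One loose end: the CRD's status also declares a Conditions field, which the logic above never touches. Here is a minimal sketch, under the assumption that a single "Ready" condition with a caller-supplied reason and message is enough, of how it could be maintained with the SetStatusCondition helper from k8s.io/apimachinery/pkg/api/meta. Call it on newStatus before the DeepEqual comparison so condition changes flow through the same conflict-aware update.

    go
    	// Assumes "k8s.io/apimachinery/pkg/api/meta" is imported alongside metav1.
    	func setReadyCondition(status *dbv1alpha1.ShardedDatabaseStatus, observedGeneration int64, ready bool, reason, message string) {
    		condStatus := metav1.ConditionFalse
    		if ready {
    			condStatus = metav1.ConditionTrue
    		}
    		meta.SetStatusCondition(&status.Conditions, metav1.Condition{
    			Type:               "Ready",
    			Status:             condStatus,
    			Reason:             reason,
    			Message:            message,
    			ObservedGeneration: observedGeneration,
    		})
    	}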

    Performance and Scalability Considerations

    A naive operator can overwhelm the Kubernetes API server. Here are advanced techniques to ensure your operator is a good citizen.

    * Controller Concurrency: By default, MaxConcurrentReconciles is 1. For CRDs with many instances, you may need to raise it where the controller is registered with the manager, as shown below. controller-runtime never reconciles the same object concurrently, but different objects are reconciled in parallel, so your reconciliation logic must be safe to run concurrently against any external systems it communicates with.

    go
        // controllers/shardeddatabase_controller.go
        import (
            "sigs.k8s.io/controller-runtime/pkg/controller"
        )

        func (r *ShardedDatabaseReconciler) SetupWithManager(mgr ctrl.Manager) error {
            return ctrl.NewControllerManagedBy(mgr).
                For(&dbv1alpha1.ShardedDatabase{}).
                Owns(&appsv1.StatefulSet{}).
                WithOptions(controller.Options{MaxConcurrentReconciles: 5}).
                Complete(r)
        }

    * Using Predicates to Filter Events: Your Reconcile loop doesn't need to run for every single change. For example, a change to a CR's metadata label that you don't care about shouldn't trigger a full reconciliation. Predicates can filter these events at the source.

    go
        // controllers/shardeddatabase_controller.go
        import "sigs.k8s.io/controller-runtime/pkg/predicate"
    
        func (r *ShardedDatabaseReconciler) SetupWithManager(mgr ctrl.Manager) error {
            return ctrl.NewControllerManagedBy(mgr).
                For(&dbv1alpha1.ShardedDatabase{}).
                Owns(&appsv1.StatefulSet{}).
                WithEventFilter(predicate.GenerationChangedPredicate{}).
                Complete(r)
        }

    The GenerationChangedPredicate is particularly useful. It passes update events only when metadata.generation changes, which for the ShardedDatabase CR and its owned StatefulSets happens only when the .spec changes. Status updates, which can be frequent (including the ones our own controller writes), will not trigger a new reconciliation, preventing feedback loops. The trade-off is that status-only changes on owned resources are filtered too, so fields like readyShards may lag until the next spec change or scheduled requeue.
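
    If generation-based filtering is too coarse, the predicate package also lets you compose filters. As a brief, hedged sketch (assuming you do want annotation changes on the CR to trigger reconciliation), SetupWithManager could be written as:

    go
        func (r *ShardedDatabaseReconciler) SetupWithManager(mgr ctrl.Manager) error {
            return ctrl.NewControllerManagedBy(mgr).
                For(&dbv1alpha1.ShardedDatabase{}).
                Owns(&appsv1.StatefulSet{}).
                // Reconcile on spec (generation) changes OR on annotation changes.
                WithEventFilter(predicate.Or(
                    predicate.GenerationChangedPredicate{},
                    predicate.AnnotationChangedPredicate{},
                )).
                Complete(r)
        }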

    Conclusion

    Building a Kubernetes Operator is about more than just automating kubectl commands; it's about codifying deep, domain-specific operational knowledge into a self-healing, declarative system. We've moved beyond the basics to tackle the patterns essential for production readiness: ensuring clean, predictable resource deletion with finalizers; managing child resources idempotently; and implementing robust, conflict-aware status updates.

    By mastering these advanced techniques, you can extend Kubernetes to manage virtually any workload, no matter how complex. You are no longer just operating on the platform; you are building it, creating powerful, application-aware abstractions that empower your entire organization.
