GitOps at Scale: Multi-Cluster Management with ArgoCD ApplicationSets

16 min read
Goh Ling Yong
Technology enthusiast and software architect specializing in AI-driven development tools and modern software engineering practices. Passionate about the intersection of artificial intelligence and human creativity in building tomorrow's digital solutions.

The Scaling Wall of Single-Application GitOps

In any mature Kubernetes ecosystem, the initial success of GitOps using single ArgoCD Application resources eventually hits a scaling wall. The pattern is simple and effective for a handful of applications in a few clusters: a Git repository holds Kubernetes manifests, and an Application custom resource points to a path in that repository, syncing it to a target cluster. This works beautifully until your cluster count grows from 3 to 30, and your microservices from 10 to 100.

The resulting operational burden is immense. You're faced with a combinatorial explosion of Application manifests, leading to:

* YAML Duplication: A new staging cluster in ap-southeast-1? You're likely copying and pasting dozens of Application manifests, changing only the destination.server and a few parameters. This is a direct violation of the DRY (Don't Repeat Yourself) principle.

* Configuration Drift: When a new application needs to be onboarded to all 30 clusters, it's a tedious, error-prone process of creating 30 new Application manifests. It's easy to miss one, leading to inconsistent tooling and application availability across your fleet.

* Onboarding Friction: Adding a new cluster becomes a significant project, requiring a full audit of all applications that should be deployed to it, followed by a wave of manifest creation.

This is where the standard Application resource proves insufficient. It's designed to manage one application in one cluster. To operate at scale, we need a mechanism to manage applications across many clusters dynamically. We need a factory for Application resources. This is precisely the problem the ArgoCD ApplicationSet controller is designed to solve.

This article is a deep dive into using the ApplicationSet controller as the backbone of a scalable, multi-cluster GitOps strategy. We will not cover the basics of ArgoCD. We assume you are already running it and understand the core concepts of Applications, Projects, and sync strategies. We will focus exclusively on production patterns for fleet-wide application management.

The ApplicationSet Controller: A Factory for Applications

The ApplicationSet is a Kubernetes Custom Resource Definition (CRD) that acts as a template-driven generator for ArgoCD Application resources. Its core components are generators and a template.

* Generators: These produce a list of parameter sets. Each parameter set is used to render one Application resource. Generators can pull data from various sources: a static list, clusters registered in ArgoCD, or files/directories in a Git repository.

* Template: This is a standard ArgoCD Application spec, but with placeholders (e.g., {{name}}, {{server}}) that are filled in by the parameters from the generators.

The controller continuously evaluates the generators. If a new cluster is added or a new configuration file is pushed to Git, the ApplicationSet controller automatically generates a new Application resource. If a source is removed, the corresponding Application is automatically deleted (this is the default policy; the controller can be configured to be less destructive).
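
As a minimal illustration of that generator-plus-template split, a list generator with a single hard-coded element is enough to see the shape of the resource; the repository URL and names below are placeholders, not part of the patterns that follow:

yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: guestbook-example
  namespace: argocd
spec:
  generators:
  - list:
      elements:
      - cluster: in-cluster
        url: https://kubernetes.default.svc
  template:
    metadata:
      name: '{{cluster}}-guestbook'
    spec:
      project: default
      source:
        repoURL: https://github.com/my-org/apps.git # placeholder repository
        targetRevision: HEAD
        path: guestbook
      destination:
        server: '{{url}}'
        namespace: guestbook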

Let's move beyond theory and into concrete, production-grade implementations.

Pattern 1: The Cluster Generator for Foundational Tooling

A common requirement is to deploy a standard set of foundational tools—such as an observability stack (Prometheus, Grafana), a security scanner (Trivy), or a policy engine (Kyverno)—to every single cluster in the fleet. The Cluster generator is purpose-built for this task.

It automatically discovers all Kubernetes clusters that ArgoCD is configured to manage. It does this by looking for secrets in the ArgoCD namespace (argocd) that have the label argocd.argoproj.io/secret-type: cluster.
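
For reference, a cluster registered with ArgoCD (for example via argocd cluster add) is represented by a Secret shaped roughly like the following; the credentials here are placeholders:

yaml
apiVersion: v1
kind: Secret
metadata:
  name: prod-us-east-1
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: cluster # This label is what the Cluster generator discovers
type: Opaque
stringData:
  name: prod-us-east-1          # Becomes {{name}} in the template
  server: https://1.2.3.4       # Becomes {{server}} in the template
  config: |
    {
      "bearerToken": "<redacted>",
      "tlsClientConfig": {
        "caData": "<base64-encoded-ca-certificate>"
      }
    }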

Scenario: Deploy the kube-prometheus-stack Helm chart to every managed cluster.

First, let's define the ApplicationSet resource:

yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: core-observability-stack
  namespace: argocd
spec:
  generators:
  - clusters: {}
  template:
    metadata:
      name: '{{name}}-prometheus-stack' # 'name' is the cluster name from the secret
      namespace: argocd
    spec:
      project: core-infrastructure
      source:
        repoURL: https://prometheus-community.github.io/helm-charts
        chart: kube-prometheus-stack
        targetRevision: 45.2.1
        helm:
          releaseName: prometheus
          values: |-
            grafana:
              adminPassword: "$argocd.common.grafana.adminPassword" # Example of using a secret
            prometheus:
              prometheusSpec:
                # Use cluster name to ensure unique external labels for a central Thanos/Cortex
                externalLabels:
                  cluster: '{{name}}'
      destination:
        server: '{{server}}' # 'server' is the API server URL from the cluster secret
        namespace: monitoring
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
        - CreateNamespace=true

Analysis of this Advanced Implementation:

  • Dynamic Generation: The generators: - clusters: {} block is the key. It iterates over every registered cluster and provides name (e.g., prod-us-east-1) and server (e.g., https://1.2.3.4) as parameters to the template.
  • Unique Naming: The metadata.name: '{{name}}-prometheus-stack' ensures each generated Application has a unique name within the ArgoCD namespace, preventing collisions.
  • Contextual Configuration: The prometheusSpec.externalLabels section is a critical detail for multi-cluster observability. By injecting cluster: '{{name}}', we ensure that metrics scraped by this Prometheus instance are uniquely identifiable when federated into a central observability platform like Thanos or Grafana Mimir. This prevents metric collisions and allows for fleet-wide querying and alerting.
  • Secrets Management (Placeholder): The grafana.adminPassword value is a stand-in for a more complete secrets strategy, which we detail later. Directly embedding secrets in Git is an anti-pattern; here we assume a tool such as argocd-vault-plugin, or the External Secrets pattern covered below, supplies the real value before the manifests reach the cluster.

With this single ApplicationSet manifest, you have a system that automatically ensures every new cluster brought under ArgoCD's management will receive a correctly configured Prometheus stack without any manual intervention.

    Pattern 2: The Git Generator with Kustomize for Application Delivery

    While the Cluster generator is excellent for homogenous deployments, application delivery is rarely one-size-fits-all. A billing-service may need 10 replicas in production but only 1 in development. It will have different ingress hostnames, resource limits, and database connection strings per environment.

    This is where the Git generator, combined with Kustomize, provides a powerful and scalable solution. The strategy is to define your application's desired state in a Git repository, with a clear separation between base manifests and environment-specific overlays.

    The Git Repository Structure

    A robust structure is paramount. Consider the following layout for a microservice named billing-service:

    text
    apps/
    └── billing-service/
        ├── base/
        │   ├── deployment.yaml
        │   ├── service.yaml
        │   ├── ingress.yaml
        │   └── kustomization.yaml
        └── overlays/
            ├── dev/
            │   ├── kustomization.yaml
            │   ├── patch-replicas-resources.yaml
            │   └── configmap-patch.yaml
            ├── staging/
            │   ├── kustomization.yaml
            │   ├── patch-replicas-resources.yaml
            │   └── configmap-patch.yaml
            └── prod/
                ├── kustomization.yaml
                ├── patch-replicas-resources.yaml
                └── configmap-patch.yaml

    * base/: Contains the vanilla, environment-agnostic Kubernetes manifests. The deployment.yaml might specify 1 replica and have placeholders for resource limits.

    * overlays/: Contains environment-specific patches. It does *not* duplicate the entire set of manifests.
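
    To make the base concrete, a minimal sketch of base/deployment.yaml might look like this; the image, labels, and port are illustrative, and the single replica is the environment-agnostic default that overlays patch:

    yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: billing-service
      labels:
        app: billing-service
    spec:
      replicas: 1 # Overlays raise this per environment
      selector:
        matchLabels:
          app: billing-service
      template:
        metadata:
          labels:
            app: billing-service
        spec:
          containers:
          - name: server
            image: my-docker-registry/billing-service # Tag is pinned per overlay via the Kustomize images field
            ports:
            - containerPort: 8080 # Illustrative port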

    Example: base/kustomization.yaml

    yaml
    apiVersion: kustomize.config.k8s.io/v1beta1
    kind: Kustomization
    resources:
    - deployment.yaml
    - service.yaml
    - ingress.yaml

    Example: overlays/prod/patch-replicas-resources.yaml

    yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: billing-service
    spec:
      replicas: 20
      template:
        spec:
          containers:
          - name: server
            resources:
              requests:
                cpu: "1000m"
                memory: "2Gi"
              limits:
                cpu: "2000m"
                memory: "4Gi"

    Example: overlays/prod/kustomization.yaml

    yaml
    apiVersion: kustomize.config.k8s.io/v1beta1
    kind: Kustomization
    namespace: billing
    resources:
    - ../../base
    patchesStrategicMerge:
    - patch-replicas-resources.yaml
    - configmap-patch.yaml
    images:
    - name: my-docker-registry/billing-service
      newTag: "1.2.5" # Production image tag

    Driving Deployments with the Git Generator (files subtype)

    Now, we need a way to tell the ApplicationSet which overlay to deploy to which cluster. We use the Git generator's files subtype. It scans for specific files in a Git repository and uses their content to generate parameters.

    Let's create a configuration repository to define our deployment matrix:

    text
    cluster-config/
    ├── clusters/
    │   ├── dev.json
    │   ├── staging.json
    │   ├── prod-us-east-1.json
    │   └── prod-eu-west-1.json
    └── apps/
        └── billing-service.json

    cluster-config/clusters/prod-us-east-1.json:

    json
    {
      "name": "prod-us-east-1",
      "server": "https://1.2.3.4",
      "environment": "prod"
    }

    cluster-config/apps/billing-service.json:

    json
    {
      "appName": "billing-service",
      "project": "billing-team"
    }

    Now, the ApplicationSet can use a matrix generator to combine these definitions:

    yaml
    apiVersion: argoproj.io/v1alpha1
    kind: ApplicationSet
    metadata:
      name: billing-service-fleet
      namespace: argocd
    spec:
      generators:
      - matrix:
          generators:
          - git:
              repoURL: https://github.com/my-org/cluster-config.git
              revision: HEAD
              files:
              - path: "apps/billing-service.json"
          - git:
              repoURL: https://github.com/my-org/cluster-config.git
              revision: HEAD
              files:
              - path: "clusters/*.json"
      template:
        metadata:
          name: '{{name}}-{{appName}}' # e.g., prod-us-east-1-billing-service
        spec:
          project: '{{project}}'
          source:
            repoURL: https://github.com/my-org/apps.git
            targetRevision: HEAD
            path: '{{appName}}/overlays/{{environment}}' # DYNAMIC PATH!
          destination:
            server: '{{server}}'
            namespace: '{{appName}}'
          syncPolicy:
            automated:
              prune: true
              selfHeal: true
            syncOptions:
            - CreateNamespace=true

    Analysis of this Production Pattern:

  • Decoupling: The application code/manifests (apps repo) are decoupled from the deployment configuration (cluster-config repo). A platform team can manage clusters, while an application team manages their service's manifests.
  • Dynamic Kustomize Path: The line path: '{{appName}}/overlays/{{environment}}' is the core of this pattern. The ApplicationSet dynamically constructs the path to the correct Kustomize overlay based on the metadata from the cluster-config repo. Deploying to a new cluster is as simple as adding a new cluster.json file.
  • Matrix Generator: We combine two Git generators. One finds all applications to be deployed (billing-service.json), and the other finds all target clusters (clusters/*.json). The matrix generator creates a Cartesian product, ensuring the billing-service is templated for every defined cluster.
  • Scalability: Onboarding a new microservice (payment-service) involves creating a new directory in the apps repo and a payment-service.json in the cluster-config repo. No changes are needed to the ApplicationSet itself.

    Advanced Edge Case: Secrets Management in GitOps

    Storing plain-text secrets in Git is a cardinal sin. The patterns above work for non-sensitive configuration, but real applications need database credentials, API keys, and TLS certificates. The GitOps-native answer is to keep only encrypted blobs or non-sensitive references in Git and resolve the actual secret values inside the cluster.

    Pattern: External Secrets Operator (ESO)

    While Sealed Secrets is a viable option, the External Secrets Operator provides a more flexible and cloud-native approach. You commit a placeholder ExternalSecret manifest to Git; an in-cluster controller reads this manifest, fetches the actual secret data from an external provider (such as AWS Secrets Manager, HashiCorp Vault, or GCP Secret Manager), and creates a native Kubernetes Secret resource from it.

    Implementation Steps:

  • Deploy ESO: Deploy the External Secrets Operator to all clusters (this is a perfect use case for the Cluster generator ApplicationSet we created earlier).
  • Create a SecretStore: In each cluster, create a SecretStore (or ClusterSecretStore) that configures access to your secrets backend (e.g., using AWS IAM Roles for Service Accounts, IRSA):

        yaml
        apiVersion: external-secrets.io/v1beta1
        kind: ClusterSecretStore
        metadata:
          name: aws-secrets-manager
        spec:
          provider:
            aws:
              service: SecretsManager
              region: us-east-1
              auth:
                jwt:
                  serviceAccountRef:
                    name: external-secrets-sa # Assumes a SA with the correct IAM role
                    namespace: external-secrets # Namespace of that SA; required when referenced from a ClusterSecretStore (adjust to your install)
  • Commit ExternalSecret Manifests: In your application's Git repository (apps/billing-service/base), commit an ExternalSecret manifest instead of a Secret manifest:

        yaml
        apiVersion: external-secrets.io/v1beta1
        kind: ExternalSecret
        metadata:
          name: billing-db-credentials
        spec:
          secretStoreRef:
            name: aws-secrets-manager
            kind: ClusterSecretStore
          target:
            name: billing-db-credentials # Name of the k8s Secret to be created
            creationPolicy: Owner
          data:
          - secretKey: username
            remoteRef:
              key: prod/billing-db/username # Path in AWS Secrets Manager
          - secretKey: password
            remoteRef:
              key: prod/billing-db/password

    Now, your Git repository contains no sensitive data. When ArgoCD syncs this manifest, the External Secrets Operator takes over, populating the billing-db-credentials secret with the live values from AWS Secrets Manager. Your Deployment can then mount this native Kubernetes Secret as usual.
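
    For completeness, the workload consumes that generated Secret like any other; a sketch of the relevant container excerpt from the billing-service pod spec (the environment variable names are illustrative):

    yaml
    containers:
    - name: server
      image: my-docker-registry/billing-service
      env:
      - name: DB_USERNAME
        valueFrom:
          secretKeyRef:
            name: billing-db-credentials # Created and kept in sync by the External Secrets Operator
            key: username
      - name: DB_PASSWORD
        valueFrom:
          secretKeyRef:
            name: billing-db-credentials
            key: password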

    Performance and Reliability at Scale

    When an ApplicationSet generates hundreds or thousands of Application resources, the default ArgoCD installation can become a bottleneck. The argocd-application-controller is responsible for reconciling every one of these applications.

    Solution: Controller Sharding

    ArgoCD supports horizontal scaling of the application controller. You can enable sharding, which distributes the reconciliation load across multiple controller replicas, with each replica (shard) responsible for a subset of the registered clusters.

    Sharding is enabled by scaling the argocd-application-controller StatefulSet and telling the controller how many shards exist, by setting the ARGOCD_CONTROLLER_REPLICAS environment variable on that StatefulSet to the same value as its replica count. Each replica derives its shard number from its pod ordinal, and clusters are then distributed across the shards (by a hash in the default algorithm; recent ArgoCD versions also offer alternatives such as round-robin). If you need deterministic placement, a cluster can be pinned to a specific shard via the shard field in its cluster secret. Because the controller runs as a StatefulSet, a failed replica is recreated and resumes responsibility for its shard.
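
    A minimal sketch of that change, assuming three shards; it is shown as a StatefulSet excerpt, and the exact way you apply it depends on your installation method (raw manifests, a Kustomize patch, or the Helm chart's controller replica setting):

    yaml
    # Excerpt of the argocd-application-controller StatefulSet with 3 shards.
    # ARGOCD_CONTROLLER_REPLICAS must match spec.replicas.
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: argocd-application-controller
      namespace: argocd
    spec:
      replicas: 3
      template:
        spec:
          containers:
          - name: argocd-application-controller
            env:
            - name: ARGOCD_CONTROLLER_REPLICAS
              value: "3"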

    For a fleet of 30 clusters, running 3 controller replicas means each pod reconciles applications for roughly 10 clusters, reducing reconciliation latency and improving the overall responsiveness and reliability of your GitOps system.

    Tying it all together with Progressive Delivery

    Finally, ApplicationSets manage what gets deployed, but not how. A standard Kubernetes Deployment performs a plain rolling update, with no traffic shaping, metric analysis, or automated rollback, which is risky for production environments. To achieve progressive delivery (e.g., canary or blue-green deployments), we integrate Argo Rollouts.

    This is done within the application's manifest repository. Instead of a Deployment resource in apps/billing-service/base, you define a Rollout resource.

    apps/billing-service/base/rollout.yaml (excerpt):

    yaml
    apiVersion: argoproj.io/v1alpha1
    kind: Rollout
    metadata:
      name: billing-service
    spec:
      replicas: 10
      strategy:
        canary:
          steps:
          - setWeight: 10 # Send 10% of traffic to the new version
          - pause: { duration: 5m } # Wait 5 minutes for metrics to stabilize
          - setWeight: 50
          - pause: { duration: 10m }
          # ... and so on
      # ... selector, template, etc. are the same as a Deployment

    The ApplicationSet doesn't need to know about the Rollout. It continues to sync the manifests from the Git repository. When a developer updates the image tag in overlays/prod/kustomization.yaml and merges the change, ArgoCD syncs the Rollout resource. The Argo Rollouts controller then detects the spec change and orchestrates the complex, multi-step canary release, providing a much safer production deployment process.
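
    Under this model, promoting a new version to production is nothing more than a small Git change in the prod overlay (the new tag below is illustrative); Argo Rollouts takes care of the rest:

    yaml
    # overlays/prod/kustomization.yaml (excerpt)
    images:
    - name: my-docker-registry/billing-service
      newTag: "1.2.6" # Bumping this and merging triggers the canary sequence above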

    Conclusion: From Configuration Management to Platform Orchestration

    By graduating from individual Application resources to the ApplicationSet controller, you transform ArgoCD from a simple continuous delivery tool into a true platform orchestration engine. The combination of dynamic Application generation, Kustomize overlays for environment-specific configuration, a robust secrets management strategy like ESO, and controller sharding for performance provides a comprehensive solution for managing complex, multi-cluster Kubernetes environments at scale.

    This architecture enforces consistency, eliminates manual toil, and empowers application teams with a self-service model for deployments, all while maintaining the core GitOps principles of an auditable, version-controlled source of truth for your entire system.
