GitOps at Scale: Multi-Cluster Management with ArgoCD ApplicationSets
The Scaling Wall of Single-Application GitOps
In any mature Kubernetes ecosystem, the initial success of GitOps using single ArgoCD Application resources eventually hits a scaling wall. The pattern is simple and effective for a handful of applications in a few clusters: a Git repository holds Kubernetes manifests, and an Application custom resource points to a path in that repository, syncing it to a target cluster. This works beautifully until your cluster count grows from 3 to 30, and your microservices from 10 to 100.
The resulting operational burden is immense. You're faced with a combinatorial explosion of Application manifests, leading to:
* YAML Duplication: A new staging cluster in ap-southeast-1? You're likely copying and pasting dozens of Application manifests, changing only the destination.server and a few parameters. This is a direct violation of the DRY (Don't Repeat Yourself) principle.
* Configuration Drift: When a new application needs to be onboarded to all 30 clusters, it's a tedious, error-prone process of creating 30 new Application manifests. It's easy to miss one, leading to inconsistent tooling and application availability across your fleet.
* Onboarding Friction: Adding a new cluster becomes a significant project, requiring a full audit of all applications that should be deployed to it, followed by a wave of manifest creation.
This is where the standard Application resource proves insufficient. It's designed to manage one application in one cluster. To operate at scale, we need a mechanism to manage applications across many clusters dynamically. We need a factory for Application resources. This is precisely the problem the ArgoCD ApplicationSet controller is designed to solve.
This article is a deep dive into using the ApplicationSet controller as the backbone of a scalable, multi-cluster GitOps strategy. We will not cover the basics of ArgoCD. We assume you are already running it and understand the core concepts of Applications, Projects, and sync strategies. We will focus exclusively on production patterns for fleet-wide application management.
The ApplicationSet Controller: A Factory for Applications
The ApplicationSet is a Kubernetes Custom Resource Definition (CRD) that acts as a template-driven generator for ArgoCD Application resources. Its core components are generators and a template.
* Generators: These produce a list of parameter sets. Each parameter set is used to render one Application resource. Generators can pull data from various sources: a static list, clusters registered in ArgoCD, or files/directories in a Git repository.
* Template: This is a standard ArgoCD Application spec, but with placeholders (e.g., {{name}}, {{server}}) that are filled in by the parameters from the generators.
The controller continuously evaluates the generators. If a new cluster is added or a new configuration file is pushed to Git, the ApplicationSet controller automatically generates a new Application resource. If a source disappears, the corresponding Application is, by default, automatically deleted.
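Before we get to production patterns, here is a minimal sketch of the resource's shape using a static List generator; the cluster entries, repository URL, and guestbook path are placeholders rather than anything we build on later:

apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: guestbook-example
  namespace: argocd
spec:
  generators:
    - list:
        elements:
          - cluster: dev
            url: https://1.2.3.4
          - cluster: prod
            url: https://5.6.7.8
  template:
    metadata:
      name: '{{cluster}}-guestbook' # one Application per list element
    spec:
      project: default
      source:
        repoURL: https://github.com/my-org/apps.git # placeholder repository
        targetRevision: HEAD
        path: guestbook
      destination:
        server: '{{url}}'
        namespace: guestbook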
Let's move beyond theory and into concrete, production-grade implementations.
Pattern 1: The Cluster Generator for Foundational Tooling
A common requirement is to deploy a standard set of foundational tools—such as an observability stack (Prometheus, Grafana), a security scanner (Trivy), or a policy engine (Kyverno)—to every single cluster in the fleet. The Cluster generator is purpose-built for this task.
It automatically discovers all Kubernetes clusters that ArgoCD is configured to manage. It does this by looking for secrets in the ArgoCD namespace (argocd) that have the label argocd.argoproj.io/secret-type: cluster.
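For reference, such a cluster secret (whether created by argocd cluster add or declaratively) looks roughly like the following; the name, server, and credentials are placeholders:

apiVersion: v1
kind: Secret
metadata:
  name: prod-us-east-1-cluster
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: cluster
type: Opaque
stringData:
  name: prod-us-east-1     # exposed to the Cluster generator as {{name}}
  server: https://1.2.3.4  # exposed to the Cluster generator as {{server}}
  config: |
    {
      "bearerToken": "<redacted>",
      "tlsClientConfig": { "insecure": false, "caData": "<base64-encoded CA>" }
    }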
Scenario: Deploy the kube-prometheus-stack Helm chart to every managed cluster.
First, let's define the ApplicationSet resource:
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: core-observability-stack
  namespace: argocd
spec:
  generators:
    - clusters: {}
  template:
    metadata:
      name: '{{name}}-prometheus-stack' # 'name' is the cluster name from the secret
      namespace: argocd
    spec:
      project: core-infrastructure
      source:
        repoURL: https://prometheus-community.github.io/helm-charts
        chart: kube-prometheus-stack
        targetRevision: 45.2.1
        helm:
          releaseName: prometheus
          values: |-
            grafana:
              adminPassword: "$argocd.common.grafana.adminPassword" # Example of using a secret
            prometheus:
              prometheusSpec:
                # Use cluster name to ensure unique external labels for a central Thanos/Cortex
                externalLabels:
                  cluster: '{{name}}'
      destination:
        server: '{{server}}' # 'server' is the API server URL from the cluster secret
        namespace: monitoring
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
          - CreateNamespace=true
Analysis of this Advanced Implementation:
* The generators: - clusters: {} block is the key. It iterates over every registered cluster and provides name (e.g., prod-us-east-1) and server (e.g., https://1.2.3.4) as parameters to the template.
* template.metadata.name: '{{name}}-prometheus-stack' ensures each generated Application has a unique name within the ArgoCD namespace, preventing collisions.
* The prometheusSpec.externalLabels section is a critical detail for multi-cluster observability. By injecting cluster: '{{name}}', we ensure that metrics scraped by this Prometheus instance are uniquely identifiable when federated into a central observability platform like Thanos or Grafana Mimir. This prevents metric collisions and allows for fleet-wide querying and alerting.
* The grafana.adminPassword value hints at a more complex secret management strategy, which we will detail later. Directly embedding secrets is an anti-pattern; here we assume a tool like argocd-vault-plugin is configured to replace this placeholder.
With this single ApplicationSet manifest, you have a system that automatically ensures every new cluster brought under ArgoCD's management will receive a correctly configured Prometheus stack without any manual intervention.
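If argocd-vault-plugin is that tool, the grafana.adminPassword placeholder above would typically be written as an inline path reference instead; a sketch, assuming a Vault KV v2 backend and a hypothetical secret path:

values: |-
  grafana:
    adminPassword: <path:kv/data/platform/grafana#adminPassword> # resolved by argocd-vault-plugin at render time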
Pattern 2: The Git Generator with Kustomize for Application Delivery
While the Cluster generator is excellent for homogenous deployments, application delivery is rarely one-size-fits-all. A billing-service may need 10 replicas in production but only 1 in development. It will have different ingress hostnames, resource limits, and database connection strings per environment.
This is where the Git generator, combined with Kustomize, provides a powerful and scalable solution. The strategy is to define your application's desired state in a Git repository, with a clear separation between base manifests and environment-specific overlays.
The Git Repository Structure
A robust structure is paramount. Consider the following layout for a microservice named billing-service:
apps/
└── billing-service/
    ├── base/
    │   ├── deployment.yaml
    │   ├── service.yaml
    │   ├── ingress.yaml
    │   └── kustomization.yaml
    └── overlays/
        ├── dev/
        │   ├── kustomization.yaml
        │   ├── patch-replicas-resources.yaml
        │   └── configmap-patch.yaml
        ├── staging/
        │   ├── kustomization.yaml
        │   ├── patch-replicas-resources.yaml
        │   └── configmap-patch.yaml
        └── prod/
            ├── kustomization.yaml
            ├── patch-replicas-resources.yaml
            └── configmap-patch.yaml
* base/: Contains the vanilla, environment-agnostic Kubernetes manifests. The deployment.yaml might specify 1 replica and have placeholders for resource limits.
* overlays/: Contains environment-specific patches. It does *not* duplicate the entire set of manifests.
Example: base/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - deployment.yaml
  - service.yaml
  - ingress.yaml
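For completeness, the environment-agnostic base/deployment.yaml referenced above might look like the following sketch; the image, port, and labels are placeholders:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: billing-service
  labels:
    app: billing-service
spec:
  replicas: 1 # conservative default, overridden per environment by the overlays
  selector:
    matchLabels:
      app: billing-service
  template:
    metadata:
      labels:
        app: billing-service
    spec:
      containers:
        - name: server
          image: my-docker-registry/billing-service:latest # tag is pinned by the overlay's images stanza
          ports:
            - containerPort: 8080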
Example: overlays/prod/patch-replicas-resources.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: billing-service
spec:
  replicas: 20
  template:
    spec:
      containers:
        - name: server
          resources:
            requests:
              cpu: "1000m"
              memory: "2Gi"
            limits:
              cpu: "2000m"
              memory: "4Gi"
Example: overlays/prod/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: billing-service # matches the destination namespace generated below
resources:
  - ../../base
patchesStrategicMerge:
  - patch-replicas-resources.yaml
  - configmap-patch.yaml
images:
  - name: my-docker-registry/billing-service
    newTag: "1.2.5" # Production image tag
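For contrast, the dev overlay follows the same structure but scales the service down and tracks a pre-release tag; the values below are illustrative:

# overlays/dev/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: billing-service
resources:
  - ../../base
patchesStrategicMerge:
  - patch-replicas-resources.yaml # sets replicas: 1 and small resource requests
  - configmap-patch.yaml
images:
  - name: my-docker-registry/billing-service
    newTag: "1.3.0-rc1" # pre-release build under test in dev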
Driving Deployments with the Git Generator (files subtype)
Now, we need a way to tell the ApplicationSet which overlay to deploy to which cluster. We use the Git generator's files subtype. It scans for specific files in a Git repository and uses their content to generate parameters.
Let's create a configuration repository to define our deployment matrix:
cluster-config/
├── clusters/
│   ├── dev.json
│   ├── staging.json
│   ├── prod-us-east-1.json
│   └── prod-eu-west-1.json
└── apps/
    └── billing-service.json
cluster-config/clusters/prod-us-east-1.json:
{
  "name": "prod-us-east-1",
  "server": "https://1.2.3.4",
  "environment": "prod"
}
cluster-config/apps/billing-service.json:
{
  "appName": "billing-service",
  "project": "billing-team"
}
Now, the ApplicationSet can use a matrix generator to combine these definitions:
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: billing-service-fleet
  namespace: argocd
spec:
  generators:
    - matrix:
        generators:
          - git:
              repoURL: https://github.com/my-org/cluster-config.git
              revision: HEAD
              files:
                - path: "apps/billing-service.json"
          - git:
              repoURL: https://github.com/my-org/cluster-config.git
              revision: HEAD
              files:
                - path: "clusters/*.json"
  template:
    metadata:
      name: '{{name}}-{{appName}}' # e.g., prod-us-east-1-billing-service
    spec:
      project: '{{project}}'
      source:
        repoURL: https://github.com/my-org/apps.git
        targetRevision: HEAD
        path: '{{appName}}/overlays/{{environment}}' # DYNAMIC PATH!
      destination:
        server: '{{server}}'
        namespace: '{{appName}}'
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
          - CreateNamespace=true
Analysis of this Production Pattern:
* Separation of concerns: The application manifests (apps repo) are decoupled from the deployment configuration (cluster-config repo). A platform team can manage clusters, while an application team manages their service's manifests.
* Dynamic paths: path: '{{appName}}/overlays/{{environment}}' is the core of this pattern. The ApplicationSet dynamically constructs the path to the correct Kustomize overlay based on the metadata from the cluster-config repo. Deploying to a new cluster is as simple as adding a new cluster definition (a single .json file) under clusters/.
* Matrix composition: The matrix generator combines two Git generators. One finds all applications to be deployed (billing-service.json), and the other finds all target clusters (clusters/*.json). The matrix generator creates a Cartesian product, ensuring the billing-service is templated for every defined cluster.
* Self-service onboarding: Onboarding a new service (e.g., payment-service) involves creating a new directory in the apps repo and a payment-service.json in the cluster-config repo. No changes are needed to the ApplicationSet itself.
Advanced Edge Case: Secrets Management in GitOps
Storing plain-text secrets in Git is a cardinal sin. The patterns above work for stateless configurations, but real applications need database credentials, API keys, and TLS certificates. The GitOps-native solution is to store encrypted secrets in Git and decrypt them inside the cluster.
Pattern: External Secrets Operator (ESO)
While Sealed Secrets is a viable option, the External Secrets Operator provides a more flexible and cloud-native approach. It works by creating a placeholder ExternalSecret manifest in Git. An in-cluster controller reads this manifest, fetches the actual secret data from an external provider (like AWS Secrets Manager, HashiCorp Vault, or GCP Secret Manager), and creates a native Kubernetes Secret resource from it.
Implementation Steps:
1. Deploy ESO fleet-wide: Install the External Secrets Operator on every cluster (for example, with the same Cluster generator ApplicationSet pattern we created earlier).
2. Configure a SecretStore: In each cluster, create a SecretStore (or ClusterSecretStore) that configures access to your secrets backend (e.g., with IAM Roles for Service Accounts (IRSA) on AWS).

apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore
metadata:
  name: aws-secrets-manager
spec:
  provider:
    aws:
      service: SecretsManager
      region: us-east-1
      auth:
        jwt:
          serviceAccountRef:
            name: external-secrets-sa # Assumes a SA with the correct IAM role
3. Commit ExternalSecret manifests: In your application's Git repository (apps/billing-service/base), commit an ExternalSecret manifest instead of a Secret manifest.

apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: billing-db-credentials
spec:
  secretStoreRef:
    name: aws-secrets-manager
    kind: ClusterSecretStore
  target:
    name: billing-db-credentials # Name of the k8s Secret to be created
    creationPolicy: Owner
  data:
    - secretKey: username
      remoteRef:
        key: prod/billing-db/username # Path in AWS Secrets Manager
    - secretKey: password
      remoteRef:
        key: prod/billing-db/password
Now, your Git repository contains no sensitive data. When ArgoCD syncs this manifest, the External Secrets Operator takes over, populating the billing-db-credentials secret with the live values from AWS Secrets Manager. Your Deployment can then mount this native Kubernetes Secret as usual.
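For completeness, the workload consumes the generated Secret exactly as it would any other; a minimal excerpt of how the base Deployment might reference it (the envFrom approach is just one option):

# Excerpt from apps/billing-service/base/deployment.yaml
spec:
  template:
    spec:
      containers:
        - name: server
          envFrom:
            - secretRef:
                name: billing-db-credentials # Secret created and kept up to date by ESO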
Performance and Reliability at Scale
When an ApplicationSet generates hundreds or thousands of Application resources, the default ArgoCD installation can become a bottleneck. The argocd-application-controller is responsible for reconciling every one of these applications.
Solution: Controller Sharding
ArgoCD supports horizontal scaling of the application controller. With sharding, each controller replica reconciles the applications of a subset of the registered clusters.

Sharding is enabled by scaling the argocd-application-controller StatefulSet and telling the controller how many shards exist via the ARGOCD_CONTROLLER_REPLICAS environment variable (recent releases also expose related settings, such as the sharding algorithm, in the argocd-cmd-params-cm ConfigMap):

# Excerpt from the argocd-application-controller StatefulSet (other fields unchanged)
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: argocd-application-controller
  namespace: argocd
spec:
  replicas: 3
  template:
    spec:
      containers:
        - name: argocd-application-controller
          env:
            - name: ARGOCD_CONTROLLER_REPLICAS
              value: "3" # must match spec.replicas

Each cluster is then assigned to a shard deterministically based on its cluster secret, and you can pin a cluster to a specific shard by adding a shard field to that secret.

For a fleet of 30 clusters, running 3 controller replicas means each pod reconciles applications for roughly 10 clusters, drastically reducing reconciliation latency and improving the overall responsiveness and reliability of your GitOps system.
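If you need deterministic placement, for example to give an unusually large production cluster a dedicated shard, the cluster secret accepts an explicit shard assignment:

# Excerpt from a cluster secret (see the earlier example)
stringData:
  name: prod-us-east-1
  server: https://1.2.3.4
  shard: "1" # pin this cluster to controller shard 1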
Tying it all together with Progressive Delivery
Finally, ApplicationSets manage what gets deployed, but not how. A stock Deployment gives you only a basic rolling update, with no built-in control over traffic shifting or automated verification along the way, which is risky for production environments. To achieve progressive delivery (e.g., canary or blue-green deployments), we integrate Argo Rollouts.
This is done within the application's manifest repository. Instead of a Deployment resource in apps/billing-service/base, you define a Rollout resource.
apps/billing-service/base/rollout.yaml (excerpt):
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: billing-service
spec:
  replicas: 10
  strategy:
    canary:
      steps:
        - setWeight: 10 # Send 10% of traffic to the new version
        - pause: { duration: 5m } # Wait 5 minutes for metrics to stabilize
        - setWeight: 50
        - pause: { duration: 10m }
        # ... and so on
  # ... selector, template, etc. are the same as a Deployment
The ApplicationSet doesn't need to know about the Rollout. It continues to sync the manifests from the Git repository. When a developer updates the image tag in overlays/prod/kustomization.yaml and merges the change, ArgoCD syncs the Rollout resource. The Argo Rollouts controller then detects the spec change and orchestrates the complex, multi-step canary release, providing a much safer production deployment process.
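Concretely, a production release then reduces to a one-line change in the prod overlay; the new tag below is hypothetical:

# overlays/prod/kustomization.yaml (excerpt)
images:
  - name: my-docker-registry/billing-service
    newTag: "1.2.6" # bumping the tag triggers the canary sequence on the next sync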
Conclusion: From Configuration Management to Platform Orchestration
By graduating from individual Application resources to the ApplicationSet controller, you transform ArgoCD from a simple continuous delivery tool into a true platform orchestration engine. The combination of dynamic Application generation, Kustomize overlays for environment-specific configuration, a robust secrets management strategy like ESO, and controller sharding for performance provides a comprehensive solution for managing complex, multi-cluster Kubernetes environments at scale.
This architecture enforces consistency, eliminates manual toil, and empowers application teams with a self-service model for deployments, all while maintaining the core GitOps principles of an auditable, version-controlled source of truth for your entire system.