Migrating four products to ArgoCD without downtime

When I joined NovumDMS the deployment story was four products, four conventions and zero traceability. Here's how we made Git the single source of truth — and what I'd do differently.

Every team I’ve joined has its own version of the same problem: deployments that work, but only the person who built them knows why. At NovumDMS the products included an AgriTech/IoT platform, a multi-tenant SaaS ERP layer and a distribution ERP, each with its own scripts, its own conventions, and no shared notion of “what is actually running in production right now.”

The goal wasn’t ArgoCD for its own sake. It was answering three questions instantly, for any environment: what’s deployed, who changed it, and how do we roll it back? GitOps answers all three from the commit history, as long as you respect the model: every change goes through Git, and nothing is applied to the cluster by hand.

Start with the repository layout, not the tool

The most important decision happens before you install anything. We separated application code from deployment manifests — app repos build and push images; a dedicated config repo describes desired cluster state. ArgoCD watches the config repo and reconciles.

# config repo — one source of truth per env
clusters/
  agritech-prod/   apps/  values/
  erp-saas-prod/   apps/  values/
  distrib-prod/    apps/  values/
  _base/             # shared kustomize bases

The _base directory is where the real win lives. Four products that used to drift apart now share Helm/Kustomize bases — so a change to, say, the standard probe configuration happens once.
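
As a rough sketch of how an environment can consume a shared base, an overlay kustomization.yaml might look like the one below. The workloads/example-service directory, the _base/web-service base, and the image name and tag are all hypothetical; the layout above only shows apps/ and values/.

```yaml
# Hypothetical file: clusters/agritech-prod/workloads/example-service/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

# Pull in the shared base; the standard probe configuration is assumed to live there.
# The base directory needs its own kustomization.yaml.
resources:
  - ../../../_base/web-service

# Only environment-specific details are set in the overlay.
namespace: example-service
images:
  - name: registry.example.com/example-service   # assumed image name
    newTag: "1.4.2"                               # assumed tag, bumped by pull request
```

With this shape, a change to the probes in _base/web-service reaches every environment that lists that base under resources.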

App-of-apps, so onboarding is one PR

Rather than registering each application by hand, we used the app-of-apps pattern: a single root Application points at a directory of child Applications. Adding a service to an environment becomes a one-file pull request, reviewed like any other change.
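
A minimal sketch of the pattern, assuming ArgoCD runs in the argocd namespace and the config repo lives at a placeholder URL. The application names and the workloads/ path are hypothetical.

```yaml
# Root Application: watches the directory that holds the child Applications.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: agritech-prod-root        # hypothetical name
  namespace: argocd               # assumes ArgoCD is installed in this namespace
spec:
  project: default
  source:
    repoURL: https://git.example.com/platform/config.git   # placeholder URL
    targetRevision: main
    path: clusters/agritech-prod/apps
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd             # child Application objects are created here
---
# Child Application: the one file a pull request adds under apps/.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: example-service           # hypothetical service
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.example.com/platform/config.git   # placeholder URL
    targetRevision: main
    path: clusters/agritech-prod/workloads/example-service  # hypothetical path
  destination:
    server: https://kubernetes.default.svc
    namespace: example-service
```

Only the root Application is registered by hand; everything below it arrives through review.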

If adding a service to production isn’t a reviewable pull request, you don’t have GitOps — you have a wiki page that lies.

Migrating without downtime

The trick to a zero-downtime cutover is that ArgoCD can adopt resources it didn’t create. The sequence we ran for each service:

1. Export the live manifests and commit them exactly as they are to the config repo.
2. Create the ArgoCD Application in manual sync mode and let it report Synced / Healthy against the running workload, with no changes applied.
3. Diff obsessively. If ArgoCD wants to change anything, the committed manifests are wrong, not the cluster.
4. Only once the diff is empty do you flip to automated sync with self-heal.
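
The two phases of that sequence can be sketched in a single Application manifest. This is an illustration rather than the exact manifest we ran; the repo URL, names and path are placeholders, and leaving prune off at first is my assumption.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: example-service           # hypothetical service
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.example.com/platform/config.git   # placeholder URL
    targetRevision: main
    path: clusters/agritech-prod/workloads/example-service  # hypothetical path
  destination:
    server: https://kubernetes.default.svc
    namespace: example-service
  # Phase 1 (adoption): no syncPolicy.automated block, so ArgoCD only
  # compares Git with the cluster and reports; nothing is applied until
  # someone triggers a sync by hand.
  #
  # Phase 2 (diff is empty): uncomment to hand control to ArgoCD.
  # syncPolicy:
  #   automated:
  #     selfHeal: true   # revert manual drift back to what Git declares
  #     prune: false     # assumption: keep pruning off until adoption settles
```

Flipping to phase 2 is itself a small pull request, which keeps the cutover reviewable.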

Note: Resist the urge to “clean up” manifests during the migration. Adopt first, refactor later — in a separate PR you can actually reason about.

Secrets without leaking them into Git

GitOps and secrets feel contradictory until you encrypt at rest. We used SOPS + age: secrets live in the repo encrypted and are decrypted in-cluster at apply time. Auditable, reviewable, and no plaintext ever touches a commit.
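
On the encryption side, a .sops.yaml along these lines tells SOPS which files to encrypt and for which recipient. It is a sketch under assumptions: the secrets/ directory convention is hypothetical, and the recipient is a placeholder rather than a real key. The in-cluster decryption step depends on the integration you choose and is not shown.

```yaml
# .sops.yaml at the root of the config repo (location is an assumption)
creation_rules:
  # Match any YAML file under a secrets/ directory; the directory name is hypothetical.
  - path_regex: clusters/.*/secrets/.*\.yaml$
    # Encrypt only values under data/stringData so keys and metadata stay reviewable.
    encrypted_regex: ^(data|stringData)$
    # Placeholder recipient, not a real key; use the public key printed by age-keygen.
    age: age1replacewithyourrecipientpublickey
```

Because only the values are encrypted, a reviewer can still see which keys a pull request adds or removes.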

What I’d do differently

Define sync windows earlier. Self-heal is great until it fights a 2am hotfix. Agree on the windows before you enable self-heal, not after the first incident.
Invest in notifications on day one. A failed sync that nobody sees is worse than the manual process you replaced.
Treat the config repo’s CODEOWNERS as production access control. It is.
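
For the first two points, a sketch of what I would set up from the start. The project name, schedule, and Slack channel are hypothetical, and the annotation assumes Argo CD Notifications is configured with a Slack service and the on-sync-failed trigger from the default catalog.

```yaml
# AppProject with a nightly deny window: automated syncs (including self-heal)
# are blocked, but a human can still sync manually during the window.
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: agritech-prod             # hypothetical project name
  namespace: argocd
spec:
  sourceRepos:
    - https://git.example.com/platform/config.git   # placeholder URL
  destinations:
    - server: https://kubernetes.default.svc
      namespace: '*'
  syncWindows:
    - kind: deny
      schedule: '0 22 * * *'      # assumed start: 22:00 every day (cron syntax)
      duration: 8h                # assumed length of the window
      applications:
        - '*'
      manualSync: true            # allow manual syncs while the window is active
---
# Excerpt of an Application's metadata: send failed syncs to a Slack channel.
metadata:
  annotations:
    notifications.argoproj.io/subscribe.on-sync-failed.slack: deploy-alerts  # hypothetical channel
```

Applications only pick up the window once they reference this project in spec.project.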

Three months in, “what’s running in prod?” is answered by a git log instead of a Slack thread. That’s the whole point.