
GitOps Is Not Just Git

IaC Automation Operations

I've lost count of how many teams have told me they're "doing GitOps" when what they actually mean is "we put our Kubernetes manifests in a Git repository." They have YAML files checked into version control. They have a CI pipeline that runs kubectl apply when someone merges to main. They call this GitOps. It isn't.

This isn't pedantry. The distinction matters because the actual principles of GitOps -- the ones that make it genuinely valuable -- are precisely the parts these teams are missing. They've adopted the easiest 20% of the pattern and skipped the 80% that provides the operational guarantees.

The Four Principles

GitOps, as originally defined by Weaveworks and later formalized by the OpenGitOps project, rests on four principles. Understanding them -- really understanding them -- reveals why most "GitOps" implementations aren't.

Declarative. The entire desired state of the system is described declaratively. Not scripts that produce the state. Not imperative commands that transition to the state. A declaration of what the state should be. Kubernetes manifests, Terraform configurations, and Helm charts all qualify. Shell scripts that run kubectl commands do not.
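To make the distinction concrete, here is a minimal Python sketch. The in-memory `cluster` dictionary and both function names are hypothetical stand-ins, not a real Kubernetes API. The only point is that applying a declaration gives the same result no matter where you start, while an imperative step does not.

```python
# Hypothetical in-memory "cluster": deployment name -> replica count.

def imperative_scale_up(cluster: dict, name: str, delta: int) -> None:
    """Imperative step: the outcome depends on the starting state."""
    cluster[name] = cluster.get(name, 0) + delta


def apply_declaration(cluster: dict, desired: dict) -> None:
    """Declarative apply: make the cluster match the declaration exactly."""
    cluster.clear()
    cluster.update(desired)


if __name__ == "__main__":
    cluster = {"api": 1}
    imperative_scale_up(cluster, "api", 2)
    imperative_scale_up(cluster, "api", 2)  # running it again changes the result
    print(cluster)

    apply_declaration(cluster, {"api": 3})
    apply_declaration(cluster, {"api": 3})  # running it again changes nothing
    print(cluster)
```

Run the imperative step twice and you get a different cluster each time. Apply the declaration twice and nothing moves, which is the property a reconciling agent depends on.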

Versioned and immutable. The desired state is stored in a way that enforces immutability and versioning, with a complete audit trail. Git provides this naturally -- every commit is immutable, every change is tracked, every state is recoverable. This is the easy part, and it's the part most teams get right.

Pulled automatically. Software agents automatically pull the desired state declarations from the source. This is where most implementations diverge. In true GitOps, the cluster pulls its desired state from Git. In most implementations, a CI pipeline pushes changes to the cluster. The direction of the arrow matters enormously.

Continuously reconciled. Software agents continuously observe actual system state and attempt to apply the desired state. If the actual state diverges from the desired state -- for any reason -- the agent corrects it. This is the principle that transforms GitOps from "version-controlled deployment" into an actual operational model.

Push vs. Pull: Why the Arrow Matters

Most CI/CD pipelines work on a push model. A developer merges code. The pipeline triggers. It builds artifacts. It runs kubectl apply or helm upgrade or terraform apply. The pipeline pushes changes to the target environment. This works. It's fine. But it's not GitOps.

The push model has a fundamental limitation: it only acts on events. When the pipeline runs, it applies the current state from Git. Between pipeline runs, nothing is watching. If someone runs a manual kubectl edit in production -- and someone always does, eventually -- the system state drifts from the Git state. Nobody knows until the next pipeline run, if then. The pipeline applies what's in Git, but it doesn't verify that what's running matches what's in Git. It assumes the delta is just the latest commit.

The pull model inverts this. An agent running inside the cluster -- ArgoCD, Flux, or similar -- continuously watches the Git repository. When the repository changes, the agent pulls the new desired state and applies it. But here's the critical difference: the agent also continuously watches the cluster. If the actual state diverges from the desired state, the agent corrects the divergence regardless of whether anyone committed anything to Git.

This means manual changes get reverted. If someone runs kubectl scale deployment/api --replicas=5 directly against the cluster, and the Git manifest says replicas should be 3, the agent scales it back to 3. The repository is the source of truth. Not the cluster. Not the last pipeline run. The repository. Always.
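The scenario above can be sketched in a few lines of Python. This is a toy simulation under stated assumptions: `Cluster`, `push_on_merge`, and `pull_reconcile` are names I made up for illustration, and a real agent would talk to the Kubernetes API rather than a dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical stand-in for live cluster state: deployment -> replicas."""
    replicas: dict = field(default_factory=dict)


def push_on_merge(cluster: Cluster, git_state: dict) -> None:
    """Push model: runs only when a merge triggers the pipeline."""
    cluster.replicas.update(git_state)


def pull_reconcile(cluster: Cluster, git_state: dict) -> list:
    """Pull model: runs on every tick and corrects whatever has diverged."""
    corrected = []
    for name, want in git_state.items():
        if cluster.replicas.get(name) != want:
            cluster.replicas[name] = want
            corrected.append(name)
    return corrected


if __name__ == "__main__":
    git_state = {"api": 3}

    pushed = Cluster()
    push_on_merge(pushed, git_state)       # merge event: pipeline applies Git
    pushed.replicas["api"] = 5             # manual scale, no Git event follows
    print("push model:", pushed.replicas)  # drift stays until the next merge

    pulled = Cluster()
    pull_reconcile(pulled, git_state)      # agent applies Git on its first tick
    pulled.replicas["api"] = 5             # the same manual change
    pull_reconcile(pulled, git_state)      # the next tick reverts it
    print("pull model:", pulled.replicas)
```

Nothing in the push path ever looks at the cluster again after the merge, so the manual change survives. The pull path has no concept of "after the merge"; it just compares and corrects.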

Reconciliation Loops

The reconciliation loop is the heart of GitOps and the part most implementations skip. A reconciliation loop continuously compares the desired state (what's in Git) with the actual state (what's running in the cluster) and takes corrective action when they diverge.

Think of it like a thermostat. You set the temperature to 72 degrees. The thermostat doesn't just turn on the furnace once and walk away. It continuously measures the actual temperature and takes action when it deviates from the target. It doesn't matter why the temperature changed -- someone opened a window, the sun went down, the insulation is poor. The thermostat doesn't care about the cause. It cares about the delta.

GitOps reconciliation works the same way. The agent doesn't care why the cluster state diverged from the Git state. Maybe someone ran a manual command. Maybe a controller modified a resource. Maybe a resource was deleted and recreated with different settings while someone was recovering from a node failure. The agent sees the delta, consults the source of truth (Git), and corrects it.

This is fundamentally different from a CI pipeline that runs kubectl apply on merge. The pipeline is event-driven. The reconciliation loop is continuous. The pipeline handles planned changes. The reconciliation loop handles all changes -- planned, accidental, and malicious. The pipeline runs when you push code. The reconciliation loop runs all the time.
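Stripped to its skeleton, a reconciliation loop is short. The sketch below is an illustration, not how any particular agent is implemented: `reconcile_loop` and its three callables are hypothetical, and `max_ticks` exists only so the example terminates.

```python
import copy
import time
from typing import Callable, Optional


def reconcile_loop(
    read_desired: Callable[[], dict],
    read_actual: Callable[[], dict],
    apply: Callable[[dict], None],
    interval_seconds: float,
    max_ticks: Optional[int] = None,
) -> int:
    """Compare desired and actual state on every tick; apply only on a delta.

    Returns the number of corrections made. A real agent would run until
    stopped; max_ticks lets this sketch finish.
    """
    corrections = 0
    ticks = 0
    while max_ticks is None or ticks < max_ticks:
        desired = read_desired()
        if read_actual() != desired:  # the cause of the delta is irrelevant
            apply(desired)
            corrections += 1
        ticks += 1
        time.sleep(interval_seconds)
    return corrections


if __name__ == "__main__":
    git = {"api": {"replicas": 3}}   # stand-in for the repository
    live = {"api": {"replicas": 5}}  # stand-in for a cluster that has drifted

    def apply(desired: dict) -> None:
        live.clear()
        live.update(copy.deepcopy(desired))

    fixed = reconcile_loop(lambda: git, lambda: live, apply,
                           interval_seconds=0.1, max_ticks=3)
    print(fixed, live)
```

Notice what is absent: there is no commit hook, no webhook, no "on merge" anywhere in the loop. A Git change and a manual edit both show up as the same thing, a difference between two states.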

Drift Detection

Drift detection is the observability layer of GitOps. Even if you choose not to auto-remediate every drift -- and there are valid reasons not to -- you should know when your running system doesn't match your declared state.

In a push-only model, drift is invisible. You push a deployment on Monday. On Wednesday, someone manually patches a configmap in production to fix an urgent issue. On Friday, your monitoring shows everything is healthy. Nobody knows the running configuration doesn't match Git. The next deployment might overwrite the manual fix, reintroducing the issue. Or the manual fix might persist indefinitely, creating a divergence that surprises someone months later during a disaster recovery exercise.

Proper drift detection continuously compares the running state against the declared state and surfaces differences. ArgoCD's UI does this visually -- resources that have drifted from the Git definition are flagged as OutOfSync. Flux generates events and alerts. Either approach gives you something the push model can't: confidence that what's running is what you think is running.

The value here extends beyond just catching manual changes. Drift detection also catches cases where controllers or operators modify resources in unexpected ways, where admission webhooks inject sidecars that don't match your manifests, or where resource limits are modified by vertical pod autoscalers. It's a continuous audit of your entire desired-state-to-actual-state relationship.
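Detection without remediation is just a structured diff. Here is a minimal sketch, assuming desired and actual state are available as nested dictionaries. `find_drift` and the `ignore` parameter are hypothetical names; the ignore list stands in for fields you have deliberately handed to another controller, such as limits managed by an autoscaler.

```python
from typing import Any, Tuple


def find_drift(desired: Any, actual: Any, path: str = "",
               ignore: Tuple[str, ...] = ()) -> list:
    """Return (path, desired_value, actual_value) for every field that differs.

    Paths listed in `ignore` are skipped along with everything beneath them.
    A key missing on one side is treated as None on that side.
    """
    if path in ignore:
        return []
    if isinstance(desired, dict) and isinstance(actual, dict):
        drift = []
        for key in sorted(set(desired) | set(actual)):
            child = f"{path}.{key}" if path else key
            drift += find_drift(desired.get(key), actual.get(key), child, ignore)
        return drift
    return [] if desired == actual else [(path, desired, actual)]


if __name__ == "__main__":
    desired = {
        "replicas": 3,
        "config": {"LOG_LEVEL": "info"},
        "resources": {"limits": {"memory": "256Mi"}},
    }
    actual = {
        "replicas": 3,
        "config": {"LOG_LEVEL": "debug"},              # Wednesday's manual patch
        "resources": {"limits": {"memory": "512Mi"}},  # changed by an autoscaler
    }
    for entry in find_drift(desired, actual):
        print(entry)
    # Same comparison, but with autoscaler-owned limits excluded.
    print(find_drift(desired, actual, ignore=("resources.limits",)))
```

The ignore list is the design decision worth arguing about. Every path you add to it is a field where Git is no longer the source of truth, so it should be short and deliberate.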

Why Most Implementations Miss the Point

The typical "GitOps" implementation looks like this: Kubernetes manifests live in a Git repository. A CI pipeline watches the repository. When changes are merged, the pipeline runs kubectl apply against the cluster. Maybe there's a staging environment that gets deployed first. Maybe there's an approval gate. The team calls this GitOps because the config is in Git and deployments are triggered by Git events.

What's missing? Everything that makes GitOps actually useful as an operational model: an agent that pulls, a reconciliation loop, and drift detection.

This isn't bad infrastructure. CI/CD pipelines that deploy from Git are perfectly functional. They're better than manual deployments. They're version-controlled. They're auditable. They're just not GitOps. And that distinction matters because the benefits people attribute to GitOps -- drift detection, self-healing infrastructure, guaranteed state consistency -- only come from the parts they skipped.

The Role of Operators

Tools like ArgoCD and Flux exist specifically to implement the reconciliation loop. They're not deployment tools. They're state reconciliation engines that happen to deploy things as a side effect of reconciling desired state with actual state.

ArgoCD watches one or more Git repositories, compares the manifests in those repositories against what's running in the cluster, and surfaces any differences. It can auto-sync -- automatically applying changes when Git is updated. It can auto-prune -- removing resources that exist in the cluster but not in Git. It can self-heal -- reverting manual changes that cause drift. Each of these capabilities maps to a core GitOps principle.
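The three behaviours above (syncing on a Git change, pruning, and reverting drift) are easier to tell apart once you track what was last applied. The following is a loose model of that idea, not ArgoCD's actual implementation or configuration schema. `plan_sync`, its flags, and the action labels are made up for illustration, and `live` is assumed to contain only resources the agent manages.

```python
def plan_sync(desired: dict, last_synced: dict, live: dict, *,
              prune: bool, self_heal: bool) -> list:
    """Return (action, name) pairs a sync would perform under a given policy.

    desired:     resource name -> spec currently in Git
    last_synced: resource name -> spec the agent applied last time
    live:        resource name -> spec currently running
    """
    actions = []
    for name, spec in desired.items():
        if spec != last_synced.get(name):
            actions.append(("sync", name))   # Git changed since the last sync
        elif live.get(name) != spec and self_heal:
            actions.append(("heal", name))   # Git unchanged, the cluster drifted
    for name in live:
        if name not in desired and prune:
            actions.append(("prune", name))  # running, but no longer in Git
    return actions


if __name__ == "__main__":
    desired = {"api": {"replicas": 3}, "worker": {"replicas": 2}}
    last_synced = {"api": {"replicas": 3}, "worker": {"replicas": 1},
                   "old-job": {"replicas": 1}}
    live = {"api": {"replicas": 5}, "worker": {"replicas": 1},
            "old-job": {"replicas": 1}}

    print(plan_sync(desired, last_synced, live, prune=True, self_heal=True))
    print(plan_sync(desired, last_synced, live, prune=False, self_heal=False))
```

With both flags off, only the change that came through Git is applied, which is roughly what a push pipeline gives you. The drifted `api` deployment and the orphaned `old-job` are exactly the cases that need the extra two behaviours.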

Flux takes a similar approach with a different architecture -- it runs as a set of specialized controllers inside the cluster, configured through Kubernetes custom resources, rather than as an application with its own API server and web UI. It watches Git repositories, Helm repositories, and OCI registries. It reconciles on a configurable interval. It generates Kubernetes events when drift is detected. It integrates with notification systems to alert on reconciliation failures.

The key insight about both tools is that they run continuously. They don't fire and forget. They don't apply once and assume success. They continuously compare, detect, and correct. This is the operational model that makes GitOps valuable. Without it, you're just doing CI/CD with extra steps.

When GitOps Makes Sense

GitOps isn't universally appropriate. It works exceptionally well for Kubernetes-native workloads where the entire desired state can be expressed declaratively. It works well for infrastructure that's managed by controllers and operators. It works well in environments where multiple teams need to deploy to shared clusters with clear audit trails.

It works less well when your infrastructure includes significant imperative components -- database migrations, data backups, one-off administrative tasks. These don't fit neatly into a declarative model. You can work around it with Kubernetes Jobs and CronJobs, but you're fighting the abstraction.

It also works less well when your deployment requires complex orchestration -- deploying service A, waiting for it to be healthy, then deploying service B with a configuration that references service A. Traditional CI/CD pipelines handle this kind of sequential, conditional logic naturally. GitOps reconciliation is fundamentally about converging to a declared state, not orchestrating a sequence of steps.

The honest answer is that most organizations need both. GitOps for the steady-state management of infrastructure and applications. CI/CD for the build, test, and artifact creation pipeline that produces the images and manifests that GitOps deploys. They're complementary, not competing.

Getting It Right

If you want to actually implement GitOps -- not just version-controlled CI/CD with a fashionable label -- here's what it requires:

  1. Deploy a reconciliation agent. Install ArgoCD, Flux, or an equivalent in your cluster. Configure it to watch your Git repository. This is the non-negotiable foundation.
  2. Enable drift detection. Configure the agent to continuously compare desired state against actual state. Set up alerts for when drift is detected. Make drift visible to the team.
  3. Remove push-based deployment. Your CI pipeline should build images and update manifests in Git. It should not run kubectl apply. The agent handles deployment. The pipeline handles artifact creation.
  4. Lock down cluster access. If you have a reconciliation agent enforcing desired state, direct kubectl access to production becomes unnecessary for routine operations. Restrict it. Every manual change that bypasses Git undermines the model.
  5. Treat Git as the single source of truth. If it's not in Git, it shouldn't be in the cluster. If it's in the cluster but not in Git, the agent should flag or revert it. No exceptions.
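Step 3 is the one teams find hardest to picture, so here is a sketch of what the pipeline's last step might look like once `kubectl apply` is gone. It assumes manifests that reference images as `image: <name>:<tag>`; the function name `set_image_tag` and the registry path in the example are hypothetical.

```python
import re


def set_image_tag(manifest_text: str, image: str, new_tag: str) -> str:
    """Return manifest text with the tag of `image` replaced by `new_tag`.

    Only lines of the form `image: <image>:<tag>` are touched. Comments,
    ordering, and every other line are left as they were.
    """
    pattern = re.compile(
        rf"^([ \t]*(?:-[ \t]+)?image:[ \t]*{re.escape(image)}):[^\s#]+",
        re.MULTILINE,
    )
    updated, count = pattern.subn(
        lambda match: f"{match.group(1)}:{new_tag}", manifest_text
    )
    if count == 0:
        raise ValueError(f"image {image!r} not found in manifest")
    return updated


if __name__ == "__main__":
    manifest = (
        "spec:\n"
        "  containers:\n"
        "    - name: api\n"
        "      image: registry.example.com/api:1.4.1\n"
    )
    print(set_image_tag(manifest, "registry.example.com/api", "1.4.2"))
```

The pipeline would write the result back to the file, commit it, and push. It never touches the cluster and needs no cluster credentials; the agent notices the new commit and does the rest.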

GitOps is not a deployment strategy. It's an operational model where the Git repository is the single source of truth and software agents continuously reconcile actual state to match desired state. Everything else is just CI/CD with a Git trigger.

The teams getting real value from GitOps aren't the ones who moved their YAML into a repository. They're the ones who deployed reconciliation agents, enabled drift detection, and committed to the principle that the repository -- not the cluster, not the pipeline, not the last person who ran kubectl -- is the authoritative source of what should be running. That commitment is what separates a buzzword from an operational model.
