AI Coding Agents in Infrastructure Automation
Last reviewed: June 2026
Part 7 of 8 · Using AI in infrastructure work — the full series is listed at the end.
Part 6 was about agentic workflows in general. This part is about the agentic workflow most engineers meet first: coding agents — the tools that write and change the automation itself. AI coding agents can be genuinely useful in infrastructure work, but only inside a controlled engineering process. The agent can draft, refactor, test, and explain. The team still owns architecture, security, production risk, review, and the decision to release. Everything below is about holding that line while still getting the speed.
I'll keep this one more general than the rest of the series, because the shift it describes — from suggestion to delegated execution — is the same whether you're writing a web app or a Terraform module. Where it touches my own work, it's through infrastructure-as-code and automation: the parts of the job that are now genuinely faster, and genuinely more dangerous if you skip the review.
The five layers, and why the progression matters
The history of AI-assisted development isn't a list of model releases. It's a steady change in where the AI sits inside the delivery lifecycle, and each step adds capability at the cost of added risk:
| Layer | Main capability | Workflow impact | Main risk |
|---|---|---|---|
| Autocomplete | Token and line completion | Less typing, faster boilerplate | Local suggestions look right but ignore broader behaviour |
| Chat assistants | Explanation, snippets, design discussion | Ask questions, learn unfamiliar code | Confident answers not grounded in the actual repo |
| Repo-aware assistants | Codebase search, multi-file edits | Helps with real project work, not toy examples | Architecture drift if context is incomplete |
| Coding agents | Plan, edit, run commands, read failures, iterate | An active collaborator inside the loop | Over-broad tool permissions, unsafe commands, weak verification |
| Workflow agents | Background tasks, branches, pull requests, review loops | Delegate bounded work asynchronously | Non-reproducible environments, exposed secrets, audit gaps, low-quality PRs |
This progression matters because it shows why older advice is incomplete. Prompting skill still helps, but it's no longer enough. A team using coding agents needs repository design, a testing strategy, security controls, a review workflow, and a clear definition of what the agent is allowed to do. The more autonomy the tool has, the more those things, rather than the prompt, decide whether you get useful work or a fast-moving mess.
Choosing a tool by operating model, not brand
The product names change every few months. The durable distinction is the operating model — because choosing a tool is really choosing a feedback loop, a permission model, and a review surface:
| Operating model | Typical tools | Best fit | Review surface |
|---|---|---|---|
| Inline assistant | Editor autocomplete | Boilerplate, small transforms, API recall | Accept/reject while coding |
| Chat assistant | IDE chat, web chat | Explaining, planning, debugging guidance | Human verifies before applying |
| IDE agent | Cursor, Windsurf, Junie, Gemini Code Assist | Interactive multi-file changes, fast local feedback | Diffs, test output, approvals |
| Terminal agent | Codex CLI, Claude Code, Aider, Q Developer CLI | Repo-native work with shell, tests, git | Command output, git diff, commits |
| Cloud coding agent | Copilot coding agent, Codex cloud, Jules, Devin | Background issues, small features, PRs | Branch, pull request, CI checks |
| Spec-driven agentic IDE | Kiro and similar | Turning intent into specs, code, tests, docs | Specs, plan, generated changes |
| App-building agent | Replit Agent and similar | Prototypes, small internal tools, demos | Running app, deployment state |
The practical mistake this table prevents is using the wrong class of tool for the job. A terminal agent shines when the repository and commands matter; a cloud agent suits work that can wait and produce a pull request; an app builder is for a prototype, not a regulated production system. Fit matters more than hype — the best tool for a startup prototype is often wrong for a team with complex local infrastructure, and vice versa.
How a coding agent actually works
A coding agent is not a bigger chat window. In practice it's a loop around a model:
- Interpret the task — turn a prompt, issue, or instruction file into a working objective.
- Build context — search files, read docs, inspect dependency manifests, find the tests.
- Plan — decide the likely files, the order of changes, and the verification commands.
- Act — edit files, run commands, create branches, call tools.
- Observe — read compiler errors, test failures, logs, reviewer feedback.
- Iterate — patch until the task is done or genuinely blocked.
- Report — summarise changed files, decisions, risks, and verification evidence.
This is why the environment matters more than the model. An agent with no tests, no setup script, stale dependency instructions, and unclear conventions has to guess — and it guesses confidently. An agent with a reproducible environment, useful tests, and clear repository guidance can close real feedback loops and make measurable progress. The value isn't code generation; it's how much of the loop the agent can close before a human steps in. When it can't show what it inspected, what it changed, and how it verified the change, it isn't finished — regardless of how done it claims to be.
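The act, observe, iterate, and report steps above can be sketched as a small loop. This is a minimal illustration, not how any particular product is built: `run_agent_loop`, `Report`, and the toy `fake_act` / `fake_verify` stand-ins are all hypothetical names, and a real agent would call a model and real tools where the stand-ins are.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Report:
    """What the agent hands back: evidence, not just a claim of success."""
    task: str
    done: bool
    attempts: int
    evidence: List[str] = field(default_factory=list)


def run_agent_loop(
    task: str,
    act: Callable[[str, List[str]], None],
    verify: Callable[[], Tuple[bool, str]],
    max_attempts: int = 3,
) -> Report:
    """Act, observe, iterate, then report (hypothetical helper)."""
    feedback: List[str] = []
    evidence: List[str] = []
    for attempt in range(1, max_attempts + 1):
        act(task, feedback)        # Act: edit files, run tools
        ok, output = verify()      # Observe: run the verification command
        evidence.append(f"attempt {attempt}: {output}")
        if ok:
            return Report(task, True, attempt, evidence)
        feedback.append(output)    # Iterate: feed the failure back in
    # Still failing after max_attempts: report as blocked so a human steps in.
    return Report(task, False, max_attempts, evidence)


# Toy stand-ins: the "fix" only lands once there is failure feedback to act on.
state = {"fixed": False}


def fake_act(task: str, feedback: List[str]) -> None:
    if feedback:
        state["fixed"] = True


def fake_verify() -> Tuple[bool, str]:
    if state["fixed"]:
        return True, "2 passed"
    return False, "1 failed: test_retry"


report = run_agent_loop("fix duplicate notification", fake_act, fake_verify)
print(report.done, report.attempts)  # True 2
```

The design point is that the loop only ends in one of two ways: verification passed, or the attempt budget ran out. Either way the report carries the verification output, which is what makes the result reviewable.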
What context an agent actually needs
Teams often try to fix poor agent output by writing longer prompts. That only helps up to a point. The better question is which context is missing:
| Context type | Examples | Why it matters |
|---|---|---|
| Repository structure | README, manifests, module boundaries | Stops the agent inventing a separate architecture |
| Local conventions | Style, test patterns, error handling, logging | Keeps generated code consistent with the system |
| Runtime evidence | Stack traces, logs, failing tests, curl output | Grounds the change in observed behaviour |
| Domain constraints | Security policy, compliance, product decisions | Prevents technically valid but business-wrong changes |
| Verification commands | Unit, integration, type, lint, smoke tests | Defines what "done" means |
| Forbidden changes | Generated files, vendor dirs, migrations, secrets | Controls blast radius |
Modern agents increasingly retrieve some of this themselves, but they still need a map. Repository instructions (an AGENTS.md or equivalent), task templates, and reliable scripts are how a team makes that map explicit, and the act of writing them down tends to improve the repo for the humans too. A weak instruction says "write clean code." A useful one says which command verifies the backend, which folder is generated, and which migration pattern must be followed.
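The "forbidden changes" row is one piece of context that can also be enforced mechanically. The sketch below assumes a hypothetical pattern list and a hypothetical helper, `forbidden_changes`, that takes the changed paths from an agent's diff and returns the ones it should not have touched. The patterns are illustrative only; a real list would come from the team's own repository instructions.

```python
from fnmatch import fnmatch
from typing import Iterable, List

# Hypothetical patterns, for illustration only.
FORBIDDEN_PATTERNS = [
    "vendor/*",      # third-party code
    "generated/*",   # build output
    "migrations/*",  # schema changes need a human
    ".env",          # secrets at the repo root
    "*.pem",         # key material anywhere
]


def forbidden_changes(
    changed_paths: Iterable[str],
    patterns: Iterable[str] = FORBIDDEN_PATTERNS,
) -> List[str]:
    """Return the changed paths that match any forbidden pattern."""
    patterns = list(patterns)
    # fnmatch's "*" also matches "/", so "vendor/*" covers nested files.
    return [p for p in changed_paths if any(fnmatch(p, pat) for pat in patterns)]


changed = [
    "src/payments/retry.py",
    "vendor/lib/util.py",
    ".env",
    "tests/test_retry.py",
]
violations = forbidden_changes(changed)
print(violations)  # ['vendor/lib/util.py', '.env']
```

A check like this would typically run in CI or a pre-merge hook, so a violation blocks the pull request rather than relying on the reviewer to notice it.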
Where agents fit across the lifecycle
Coding agents are usually framed as implementation tools, but their useful surface is wider — if you assign the right level of authority at each stage:
| SDLC area | Good agent use | Human responsibility |
|---|---|---|
| Discovery | Summarise issues, inspect logs, map affected modules | Decide priority and direction |
| Requirements | Draft acceptance criteria, find edge cases | Approve scope, resolve ambiguity |
| Design | Propose options, compare trade-offs | Choose architecture, own consequences |
| Implementation | Bounded changes following local patterns | Review behaviour, security, maintainability |
| Testing | Add regression tests, fixtures, explain failures | Decide whether tests prove the right thing |
| Code review | Flag risky diffs, summarise behaviour changes | Approve or reject |
| Release | Draft release and rollback notes | Authorise deployment, monitor production |
The pattern is consistent: agents prepare work and reduce friction; humans own the decisions with product, security, financial, legal, or operational weight.
Delegating safely: the ticket is the interface
A good coding-agent task reads like an engineering ticket, not a prompt. It names the expected behaviour, the constraints, the likely files, the forbidden changes, the test commands, and the completion evidence:
Task:
Fix the duplicate notification bug when a user retries payment.
Scope:
- Payment retry flow only
- Do not change subscription state transitions
- Do not modify billing provider webhooks
Expected behaviour:
- A retry creates at most one notification
- Existing successful-payment behaviour is unchanged
- Failed retries remain visible in audit logs
Verification:
- Add or update a regression test
- Run the payment test subset
- Report changed files, test output, and any remaining risk
That format gives the agent room to work while keeping the blast radius visible, and it gives the reviewer a clear standard for accepting or rejecting the result. The most important distinction is investigation versus implementation: if the problem is unclear, start with read-only analysis ("inspect the failing flow and report likely causes; do not edit files"); if the behaviour is clear and testable, implementation delegation is far safer. Mechanical refactors, test-first bug fixes, feature slices in an established architecture, and documentation sync are the sweet spot. "Modernise the app," "improve security," and "deploy this automatically" are not — those need human architecture and risk control before an agent touches anything.
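One way to make the ticket format harder to skip is to treat it as a data structure and refuse to delegate when a section is empty. The sketch below is an assumption about how a team might do that; `AgentTask` and `missing_fields` are hypothetical names, and the `read_only` flag models the investigation-versus-implementation distinction described above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AgentTask:
    """Hypothetical ticket structure mirroring the sections of the example ticket."""
    summary: str
    scope: List[str] = field(default_factory=list)
    expected_behaviour: List[str] = field(default_factory=list)
    verification: List[str] = field(default_factory=list)
    read_only: bool = False  # True for investigation-only tasks

    def missing_fields(self) -> List[str]:
        """Return the sections that are still empty and block delegation."""
        missing = []
        if not self.summary.strip():
            missing.append("summary")
        if not self.scope:
            missing.append("scope")
        # Investigation tasks report findings, so they are not required to
        # state expected behaviour or verification commands.
        if not self.read_only:
            if not self.expected_behaviour:
                missing.append("expected_behaviour")
            if not self.verification:
                missing.append("verification")
        return missing


vague = AgentTask(summary="Improve security")
print(vague.missing_fields())  # ['scope', 'expected_behaviour', 'verification']

ticket = AgentTask(
    summary="Fix the duplicate notification bug when a user retries payment.",
    scope=["Payment retry flow only", "Do not modify billing provider webhooks"],
    expected_behaviour=["A retry creates at most one notification"],
    verification=["Add or update a regression test", "Run the payment test subset"],
)
print(ticket.missing_fields())  # []
```

A structural check like this cannot tell whether the scope is sensible, only that someone wrote one. It catches the "improve security" class of request before an agent starts guessing.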
Review is the control point now
When AI produces more code, review becomes more important, not less. AI-generated pull requests deserve a heavier review, not a lighter one, precisely because the author may not understand the system's intent. Generated code often looks clean, but clean is not the same as correct for this system, under its constraints and with its production risk. My checklist for an AI-produced PR:
- Requirement fit: does the diff solve the real problem, or just the visible symptom?
- Scope control: any unrelated files, formatting churn, or sneaky broad refactors?
- Architecture fit: does it follow existing boundaries and helper APIs?
- Error behaviour: are failures explicit, logged, and recoverable?
- Data behaviour: migrations, defaults, nulls, retention, backward compatibility handled?
- Security: auth, authorization, input validation, secrets, dependency changes reviewed?
- Tests: do they assert behaviour, or were assertions weakened to pass?
- Operational impact: can it be deployed, observed, and rolled back?
- Evidence: did the agent report exact commands and results, not just "tests passed"?
A simple rule covers most of it: don't merge an agent-produced change you wouldn't accept from a human developer. And don't let the agent edit the tests freely — a model allowed to weaken assertions will "fix" the contract instead of the code.
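The test-weakening check can be partly automated. The sketch below is a crude heuristic, not a complete diff parser: it scans a unified diff and reports assertion lines that were removed from test files and not re-added verbatim. The function name `removed_assertions` and the rule that any path containing "test" counts as a test file are assumptions made for illustration.

```python
from typing import Dict, List, Optional


def removed_assertions(unified_diff: str) -> Dict[str, List[str]]:
    """Map each test file to assertion lines the diff removes without re-adding."""
    removed: Dict[str, List[str]] = {}
    added: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in unified_diff.splitlines():
        if line.startswith("+++ "):
            # New-file header: track the file only if it looks like a test file.
            path = line[4:].strip()
            if path.startswith("b/"):
                path = path[2:]
            current = path if "test" in path.lower() else None
        elif line.startswith("--- "):
            continue  # old-file header, not a removed line
        elif current and line.startswith("-") and "assert" in line:
            removed.setdefault(current, []).append(line[1:].strip())
        elif current and line.startswith("+") and "assert" in line:
            added.setdefault(current, []).append(line[1:].strip())

    weakened: Dict[str, List[str]] = {}
    for path, lines in removed.items():
        gone = [a for a in lines if a not in added.get(path, [])]
        if gone:
            weakened[path] = gone
    return weakened


sample_diff = "\n".join([
    "--- a/tests/test_retry.py",
    "+++ b/tests/test_retry.py",
    "@@ -10,3 +10,3 @@",
    "     result = retry_payment(order)",
    "-    assert len(result.notifications) == 1",
    "+    assert result is not None",
    '     assert result.status == "ok"',
])
flagged = removed_assertions(sample_diff)
print(flagged)  # {'tests/test_retry.py': ['assert len(result.notifications) == 1']}
```

A flag here is a prompt for the reviewer, not a verdict: some assertion changes are legitimate. The point is that a swapped-out assertion should never pass unnoticed inside a large diff.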
Security and governance move earlier
A coding agent changes the threat model because it combines reading, reasoning, and acting. The review has to include the agent workflow itself, not only the code it produced:
| Risk | Example | Mitigation |
|---|---|---|
| Secret exposure | Agent reads env files or logs and sends them out | Keep secrets out of workspaces, scope credentials, mask logs |
| Prompt injection | Issue text or a web page tells the agent to ignore its rules | Treat external text as untrusted, keep system rules separate, review tool calls |
| Unsafe commands | Agent runs destructive shell commands | Command approval, sandboxing, allowlists, read-only defaults |
| Dependency risk | Agent adds packages without supply-chain review | Require dependency review and approved-package policy |
| Authorization regression | Agent adds an endpoint but misses role checks | Require security-path review and access-control tests |
| Data leak | Private data in logs, prompts, fixtures | Synthetic data, scrubbed logs, reviewed fixtures |
| Audit gap | Code or settings changed with no traceable evidence | Keep commits, PRs, command logs, verification notes |
None of this is paranoia. It's the same least-privilege engineering already applied to CI/CD, service accounts, and deployment automation — pointed at a new kind of actor that can read, reason, and execute in one motion.
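The "unsafe commands" mitigation can be sketched as an allowlist gate that sits between the agent and the shell. Everything here is an assumption for illustration: the `ALLOWED` prefixes, the `check_command` helper, and the policy that anything containing shell metacharacters goes to a human. Prefix matching is deliberately coarse, and a real policy would also need to look at flags.

```python
import shlex
from typing import Tuple

# Hypothetical policy: these command prefixes run without approval.
ALLOWED = {
    ("git", "status"),
    ("git", "diff"),
    ("terraform", "validate"),
    ("pytest",),
}

# Characters that let one command string turn into several, or expand variables.
SHELL_META = set(";|&<>`$")


def check_command(command: str) -> Tuple[bool, str]:
    """Return (allowed, reason) for a command the agent wants to run."""
    try:
        argv = shlex.split(command)
    except ValueError:
        return False, "unparseable command"
    if not argv:
        return False, "empty command"
    # Reject shell metacharacters rather than trying to interpret them.
    if any(SHELL_META & set(token) for token in argv):
        return False, "shell operators need human approval"
    for prefix in ALLOWED:
        if tuple(argv[: len(prefix)]) == prefix:
            return True, "allowlisted"
    return False, "not on allowlist: needs human approval"


for cmd in [
    "git status",
    "pytest tests/payments",
    "terraform apply -auto-approve",
    "git status; rm -rf /",
]:
    print(cmd, "->", check_command(cmd))
```

The default is the important part: anything the policy does not recognise is denied and escalated, which is the read-only-by-default posture from the table applied to command execution.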
Adopting agents without the mess
Treat adoption as an engineering change, not a tool rollout. The path that works:
- Inventory which repositories actually have reliable setup, tests, and docs.
- Start with low-risk task classes (docs, tests, small bug fixes).
- Write repository instructions.
- Define permission levels (read-only analysis, local edits, command execution, cloud PRs, external tools).
- Require evidence with every result.
- Measure how much rework agent output needs.
- Expand only after repeatable success.
The common anti-patterns are predictable: vague delegation, no verification command, review-by-vibe, test weakening, architecture drift, hidden setup, unlimited tool access, one giant unsupervised task, accepting "done" with no proof. And measure outcomes, not activity: "AI wrote 40% of the code" proves nothing; time-to-reviewable-PR, percentage merged without major rework, review comments per PR, and post-merge defects do.
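The outcome measures named above are simple to compute once each agent-produced pull request is recorded with a few fields. The sketch below assumes a hypothetical `AgentPR` record and made-up sample data; how "major rework" is judged is left to the team.

```python
from dataclasses import dataclass
from statistics import median
from typing import Dict, List


@dataclass
class AgentPR:
    """Hypothetical record of one agent-produced pull request."""
    hours_to_reviewable: float
    review_comments: int
    major_rework: bool
    merged: bool
    post_merge_defects: int


def outcome_metrics(prs: List[AgentPR]) -> Dict[str, float]:
    """Summarise outcomes (not activity) across agent-produced PRs."""
    if not prs:
        raise ValueError("need at least one PR to compute metrics")
    merged = [p for p in prs if p.merged]
    clean = [p for p in merged if not p.major_rework]
    return {
        "median_hours_to_reviewable": median(p.hours_to_reviewable for p in prs),
        "pct_merged_without_major_rework": round(100 * len(clean) / len(prs), 1),
        "review_comments_per_pr": round(sum(p.review_comments for p in prs) / len(prs), 2),
        "post_merge_defects": sum(p.post_merge_defects for p in merged),
    }


# Made-up sample data for illustration.
prs = [
    AgentPR(2.0, 3, False, True, 0),
    AgentPR(5.0, 9, True, True, 1),
    AgentPR(1.0, 1, False, True, 0),
    AgentPR(8.0, 12, True, False, 0),
]
metrics = outcome_metrics(prs)
print(metrics)
```

With this sample, half of the PRs merged without major rework and the median time to a reviewable PR is 3.5 hours. Tracked over time, a falling rework rate is the signal that it is safe to expand the task classes.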
This favours teams with good hygiene. A repo with outdated docs, flaky tests, and unclear ownership will get fast-looking output of uncertain quality, because the agent exposes those weaknesses rather than fixing them. A repo with reliable setup, useful tests, and clear conventions gets a real multiplier.
The point
The evolution from autocomplete to coding agents is best understood as a move from assistance to delegated execution. The durable advantage isn't using AI everywhere — it's knowing where delegation is appropriate, how to constrain it, and how to verify the result. Software still ships under human responsibility; AI just changes how much work can be prepared before that responsibility is exercised. Which raises the obvious question: how do you verify delegated work rigorously? Part 8 answers it with the two practices that make AI-assisted automation trustworthy — spec-driven and test-driven development.
References
- GitHub Docs: About GitHub Copilot coding agent
- OpenAI Developers: Codex
- Anthropic Docs: Claude Code overview
- Google Jules Docs
- Cursor Docs: Agent overview
- Windsurf Docs: Cascade overview
- JetBrains Docs: Junie
- Kiro Docs
- Aider Documentation
- Devin Docs
- Replit Docs: Agent
- AWS Docs: Amazon Q Developer
- Google Developers: Gemini Code Assist agentic chat
- Model Context Protocol documentation
- OpenAI Codex: AGENTS.md guide
The full series
- How I Use AI Safely in Infrastructure Workflows
- Prompt Engineering for IT Infrastructure Consultants
- Advanced Prompting Patterns for Infrastructure Planning
- RAG for Enterprise Infrastructure Knowledge
- Context Engineering for Enterprise AI Systems
- Workflow Engineering for Agentic AI in Infrastructure Operations
- AI Coding Agents in Infrastructure Automation — (you're reading this)
- TDD + SDD for AI-Assisted Infrastructure Automation