AI Coding Agents in Infrastructure Automation

Last reviewed: June 2026
Part 7 of 8 · Using AI in infrastructure work — the full series is listed at the end.

Part 6 was about agentic workflows in general. This part is about the agentic workflow most engineers meet first: coding agents — the tools that write and change the automation itself. AI coding agents can be genuinely useful in infrastructure work, but only inside a controlled engineering process. The agent can draft, refactor, test, and explain. The team still owns architecture, security, production risk, review, and the decision to release. Everything below is about holding that line while still getting the speed.

I'll keep this one more general than the rest of the series, because the shift it describes — from suggestion to delegated execution — is the same whether you're writing a web app or a Terraform module. Where it touches my own work, it's through infrastructure-as-code and automation: the parts of the job that are now genuinely faster, and genuinely more dangerous if you skip the review.

The five layers, and why the progression matters

The history of AI-assisted development isn't a list of model releases. It's a steady change in where the AI sits inside the delivery lifecycle, and each step buys more capability at the price of more risk:

Layer | Main capability | Workflow impact | Main risk
Autocomplete | Token and line completion | Less typing, faster boilerplate | Local suggestions look right but ignore broader behaviour
Chat assistants | Explanation, snippets, design discussion | Ask questions, learn unfamiliar code | Confident answers not grounded in the actual repo
Repo-aware assistants | Codebase search, multi-file edits | Helps with real project work, not toy examples | Architecture drift if context is incomplete
Coding agents | Plan, edit, run commands, read failures, iterate | An active collaborator inside the loop | Tool permissions, command safety, weak verification
Workflow agents | Background tasks, branches, pull requests, review loops | Delegate bounded work asynchronously | Reproducible environments, secrets, auditability, PR quality

The progression matters because it shows why older advice is incomplete. Prompting skill still helps, but it's no longer enough. A team using coding agents needs repository design, a testing strategy, security controls, a review workflow, and a clear definition of what the agent is allowed to do. The more autonomy the tool has, the more those things, rather than the prompt, decide whether you get useful work or a fast-moving mess.

Choosing a tool by operating model, not brand

The product names change every few months. The durable distinction is the operating model — because choosing a tool is really choosing a feedback loop, a permission model, and a review surface:

Operating model | Typical tools | Best fit | Review surface
Inline assistant | Editor autocomplete | Boilerplate, small transforms, API recall | Accept/reject while coding
Chat assistant | IDE chat, web chat | Explaining, planning, debugging guidance | Human verifies before applying
IDE agent | Cursor, Windsurf, Junie, Gemini Code Assist | Interactive multi-file changes, fast local feedback | Diffs, test output, approvals
Terminal agent | Codex CLI, Claude Code, Aider, Q Developer CLI | Repo-native work with shell, tests, git | Command output, git diff, commits
Cloud coding agent | Copilot coding agent, Codex cloud, Jules, Devin | Background issues, small features, PRs | Branch, pull request, CI checks
Spec-driven agentic IDE | Kiro and similar | Turning intent into specs, code, tests, docs | Specs, plan, generated changes
App-building agent | Replit Agent and similar | Prototypes, small internal tools, demos | Running app, deployment state

The practical mistake this table prevents is using the wrong class of tool for the job. A terminal agent shines when the repository and commands matter; a cloud agent suits work that can wait and produce a pull request; an app builder is for a prototype, not a regulated production system. Fit matters more than hype — the best tool for a startup prototype is often wrong for a team with complex local infrastructure, and vice versa.

How a coding agent actually works

A coding agent is not a bigger chat window. In practice it's a loop around a model:

  1. Interpret the task — turn a prompt, issue, or instruction file into a working objective.
  2. Build context — search files, read docs, inspect dependency manifests, find the tests.
  3. Plan — decide the likely files, the order of changes, and the verification commands.
  4. Act — edit files, run commands, create branches, call tools.
  5. Observe — read compiler errors, test failures, logs, reviewer feedback.
  6. Iterate — patch until the task is done or genuinely blocked.
  7. Report — summarise changed files, decisions, risks, and verification evidence.
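To make the loop concrete, here is a minimal Python sketch of steps 4 to 7. It is an illustration under assumptions, not how any particular product is built: `run_agent_loop`, `LoopReport`, and the `act` and `verify` callables are hypothetical names, and the model and the test run are replaced by stubs so the control flow can be run on its own.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class LoopReport:
    """What the agent hands back in step 7 (hypothetical structure)."""
    objective: str
    attempts: int = 0
    evidence: List[str] = field(default_factory=list)
    done: bool = False


def run_agent_loop(
    objective: str,
    act: Callable[[int], None],
    verify: Callable[[], Tuple[bool, str]],
    max_iterations: int = 3,
) -> LoopReport:
    """Act, observe, and iterate until verification passes or the budget runs out."""
    report = LoopReport(objective=objective)
    for attempt in range(1, max_iterations + 1):
        report.attempts = attempt
        act(attempt)                    # step 4: edit files, run commands
        ok, output = verify()           # step 5: read test or compiler output
        report.evidence.append(output)  # keep the evidence for the report
        if ok:                          # step 6: stop once the change is verified
            report.done = True
            break
    return report                       # step 7: report, evidence attached


# Stubs standing in for a real model and a real test run.
state = {"patches": 0}


def fake_act(attempt: int) -> None:
    state["patches"] += 1


def fake_verify() -> Tuple[bool, str]:
    ok = state["patches"] >= 2
    return ok, f"attempt {state['patches']}: {'passed' if ok else 'failed'}"


report = run_agent_loop("fix duplicate notification", fake_act, fake_verify)
print(report)
```

The design point is the iteration budget and the evidence list: a loop that ends with `done` still false is a blocked task to hand back to a human, not a finished one.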

This is why the environment matters more than the model. An agent with no tests, no setup script, stale dependency instructions, and unclear conventions has to guess — and it guesses confidently. An agent with a reproducible environment, useful tests, and clear repository guidance can close real feedback loops and make measurable progress. The value isn't code generation; it's how much of the loop the agent can close before a human steps in. When it can't show what it inspected, what it changed, and how it verified the change, it isn't finished — regardless of how done it claims to be.

What context an agent actually needs

Teams often try to fix poor agent output by writing longer prompts. That only goes so far. The better question is which context is missing:

Context type | Examples | Why it matters
Repository structure | README, manifests, module boundaries | Stops the agent inventing a separate architecture
Local conventions | Style, test patterns, error handling, logging | Keeps generated code consistent with the system
Runtime evidence | Stack traces, logs, failing tests, curl output | Grounds the change in observed behaviour
Domain constraints | Security policy, compliance, product decisions | Prevents technically valid but business-wrong changes
Verification commands | Unit, integration, type, lint, smoke tests | Defines what "done" means
Forbidden changes | Generated files, vendor dirs, migrations, secrets | Controls blast radius

Modern agents increasingly retrieve some of this themselves, but they still need a map. Repository instructions (an AGENTS.md or equivalent), task templates, and reliable scripts are how a team makes that map explicit — and the act of writing them down tends to improve the repo for the humans too. A weak instruction says "write clean code." A useful one says which command proves the backend, which folder is generated, and which migration pattern must be followed.
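One piece of that map, the forbidden-changes list, is easy to enforce mechanically once it is written down. The sketch below is a minimal example under assumptions: the patterns and file paths are hypothetical, and a real list would come from the repository's own instructions. It uses the standard library's `fnmatchcase`, whose `*` also matches across `/`.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List

# Hypothetical patterns; a real list would come from the repo's own instructions.
FORBIDDEN_PATTERNS = ["vendor/*", "generated/*", "migrations/*", "*.env"]


def forbidden_changes(
    changed_files: Iterable[str],
    patterns: Iterable[str] = FORBIDDEN_PATTERNS,
) -> List[str]:
    """Return the changed paths that match any forbidden pattern."""
    patterns = list(patterns)
    return [
        path
        for path in changed_files
        if any(fnmatchcase(path, pattern) for pattern in patterns)
    ]


# Made-up list of files touched by an agent's change.
changed = ["src/payments/retry.py", "vendor/lib/client.py", "deploy/.env"]
violations = forbidden_changes(changed)
print(violations)
```

A check like this could run in CI against the file list of an agent's branch, so a violation blocks the pull request instead of relying on a reviewer to spot it.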

Where agents fit across the lifecycle

Coding agents are usually framed as implementation tools, but their useful surface is wider — if you assign the right level of authority at each stage:

SDLC area | Good agent use | Human responsibility
Discovery | Summarise issues, inspect logs, map affected modules | Decide priority and direction
Requirements | Draft acceptance criteria, find edge cases | Approve scope, resolve ambiguity
Design | Propose options, compare trade-offs | Choose architecture, own consequences
Implementation | Bounded changes following local patterns | Review behaviour, security, maintainability
Testing | Add regression tests, fixtures, explain failures | Decide whether tests prove the right thing
Code review | Flag risky diffs, summarise behaviour changes | Approve or reject
Release | Draft release and rollback notes | Authorise deployment, monitor production

The pattern is consistent: agents prepare work and reduce friction; humans own the decisions with product, security, financial, legal, or operational weight.

Delegating safely: the ticket is the interface

A good coding-agent task reads like an engineering ticket, not a prompt. It names the expected behaviour, the constraints, the likely files, the forbidden changes, the test commands, and the completion evidence:

Task:
Fix the duplicate notification bug when a user retries payment.

Scope:
- Payment retry flow only
- Do not change subscription state transitions
- Do not modify billing provider webhooks

Expected behavior:
- A retry creates at most one notification
- Existing successful-payment behavior is unchanged
- Failed retries remain visible in audit logs

Verification:
- Add or update a regression test
- Run the payment test subset
- Report changed files, test output, and any remaining risk

That format gives the agent room to work while keeping the blast radius visible, and it gives the reviewer a clear standard for accepting or rejecting the result. The most important distinction is investigation versus implementation: if the problem is unclear, start with read-only analysis ("inspect the failing flow and report likely causes; do not edit files"); if the behaviour is clear and testable, implementation delegation is far safer. Mechanical refactors, test-first bug fixes, feature slices in an established architecture, and documentation sync are the sweet spot. "Modernise the app," "improve security," and "deploy this automatically" are not — those need human architecture and risk control before an agent touches anything.
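If tickets are created by a script or a form, the same structure can be checked before anything is delegated. This is a rough sketch, assuming a hypothetical `AgentTask` type whose fields mirror the sections of the ticket above; it only tests for completeness and says nothing about whether the content is good.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AgentTask:
    """A delegation ticket with the sections used in the ticket above."""
    task: str
    scope: List[str] = field(default_factory=list)
    expected_behavior: List[str] = field(default_factory=list)
    verification: List[str] = field(default_factory=list)

    def missing_sections(self) -> List[str]:
        """Names of the sections that are still empty, in ticket order."""
        sections = {
            "scope": self.scope,
            "expected_behavior": self.expected_behavior,
            "verification": self.verification,
        }
        return [name for name, items in sections.items() if not items]

    def ready_for_implementation(self) -> bool:
        """Only a fully specified ticket should be delegated for edits."""
        return bool(self.task.strip()) and not self.missing_sections()


good = AgentTask(
    task="Fix the duplicate notification bug when a user retries payment.",
    scope=["Payment retry flow only"],
    expected_behavior=["A retry creates at most one notification"],
    verification=["Run the payment test subset"],
)
vague = AgentTask(task="Modernise the app")

print(good.ready_for_implementation())
print(vague.missing_sections())
```

A ticket that fails the check is a candidate for read-only investigation first, which matches the investigation-versus-implementation split described above.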

Review is the control point now

When AI produces more code, review becomes more important, not less — and AI-generated pull requests deserve a heavier review, not a lighter one, precisely because the author may not understand the system's intent. Generated code often looks clean; clean is not the same as correct under these constraints, with this production risk. My checklist for an AI-produced PR:

  • Requirement fit: does the diff solve the real problem, or just the visible symptom?
  • Scope control: any unrelated files, formatting churn, or sneaky broad refactors?
  • Architecture fit: does it follow existing boundaries and helper APIs?
  • Error behaviour: are failures explicit, logged, and recoverable?
  • Data behaviour: migrations, defaults, nulls, retention, backward compatibility handled?
  • Security: auth, authorization, input validation, secrets, dependency changes reviewed?
  • Tests: do they assert behaviour, or were assertions weakened to pass?
  • Operational impact: can it be deployed, observed, and rolled back?
  • Evidence: did the agent report exact commands and results, not just "tests passed"?

A simple rule covers most of it: don't merge an agent-produced change you wouldn't accept from a human developer. And don't let the agent edit the tests freely — a model allowed to weaken assertions will "fix" the contract instead of the code.
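The test-weakening check can be roughed out mechanically as a first-pass signal. The sketch below is a crude heuristic of my own, not an established tool: it counts lines containing `assert` that were removed versus added in a unified diff, and it is meant to be fed the diff of test files only. The sample diff and the `retry_payment` name inside it are invented for illustration.

```python
from typing import Tuple


def assertion_delta(diff_text: str) -> Tuple[int, int]:
    """Count (removed, added) lines containing 'assert' in a unified diff."""
    removed = added = 0
    for line in diff_text.splitlines():
        if line.startswith(("--- ", "+++ ")):
            continue  # file headers, not changed content
        if line.startswith("-") and "assert" in line:
            removed += 1
        elif line.startswith("+") and "assert" in line:
            added += 1
    return removed, added


def looks_weakened(diff_text: str) -> bool:
    """Flag a diff that removes more assertion lines than it adds."""
    removed, added = assertion_delta(diff_text)
    return removed > added


# Invented example: two specific assertions replaced by one vague one.
sample_diff = "\n".join([
    "--- a/tests/test_retry.py",
    "+++ b/tests/test_retry.py",
    "@@ -10,4 +10,3 @@",
    " def test_retry_creates_one_notification():",
    "     result = retry_payment(order)",
    "-    assert len(result.notifications) == 1",
    "-    assert result.audit_logged",
    "+    assert result is not None",
])

print(assertion_delta(sample_diff))
print(looks_weakened(sample_diff))
```

A flag here only means a human should read the test diff closely. An unflagged diff proves nothing, since one strong assertion can be swapped for one weak assertion without changing the count.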

Security and governance move earlier

A coding agent changes the threat model because it combines reading, reasoning, and acting. The review has to include the agent workflow itself, not only the code it produced:

Risk | Example | Mitigation
Secret exposure | Agent reads env files or logs and sends them out | Keep secrets out of workspaces, scope credentials, mask logs
Prompt injection | Issue text or a web page tells the agent to ignore its rules | Treat external text as untrusted, keep system rules separate, review tool calls
Unsafe commands | Agent runs destructive shell commands | Command approval, sandboxing, allowlists, read-only defaults
Dependency risk | Agent adds packages without supply-chain review | Require dependency review and approved-package policy
Authorization regression | Agent adds an endpoint but misses role checks | Require security-path review and access-control tests
Data leak | Private data in logs, prompts, fixtures | Synthetic data, scrubbed logs, reviewed fixtures
Audit gap | Code or settings changed with no traceable evidence | Keep commits, PRs, command logs, verification notes

None of this is paranoia. It's the same least-privilege engineering already applied to CI/CD, service accounts, and deployment automation — pointed at a new kind of actor that can read, reason, and execute in one motion.
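As an illustration of the allowlist and read-only-default mitigations, here is a minimal default-deny command check. The policy table is hypothetical and deliberately small, and the function is a sketch of the idea, not a sandbox: it assumes the approved command is later executed without a shell, and it does not inspect flags beyond the first subcommand.

```python
import shlex
from typing import Dict, Optional, Set

# Hypothetical policy: program -> allowed subcommands (None means any arguments).
ALLOWED: Dict[str, Optional[Set[str]]] = {
    "git": {"status", "diff", "log"},
    "terraform": {"validate", "plan"},
    "pytest": None,
}
SHELL_METACHARACTERS = set(";|&<>$`\n")


def is_allowed(command: str) -> bool:
    """Default-deny check for a command line the agent proposes to run."""
    if SHELL_METACHARACTERS & set(command):
        return False  # no chaining, redirection, or expansion
    try:
        tokens = shlex.split(command)
    except ValueError:
        return False  # unbalanced quotes and similar parse failures
    if not tokens or tokens[0] not in ALLOWED:
        return False  # unknown program: deny
    subcommands = ALLOWED[tokens[0]]
    if subcommands is None:
        return True
    return len(tokens) > 1 and tokens[1] in subcommands


for candidate in ["git status", "terraform apply -auto-approve", "pytest tests; rm -rf /"]:
    print(candidate, "->", is_allowed(candidate))
```

Anything the check rejects would go to a human for explicit approval, which keeps the default read-only and makes every exception visible in the audit trail.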

Adopting agents without the mess

Treat adoption as an engineering change, not a tool rollout. The path that works:

  1. Inventory which repositories actually have reliable setup, tests, and docs.
  2. Start with low-risk task classes (docs, tests, small bug fixes).
  3. Write repository instructions.
  4. Define permission levels (read-only analysis, local edits, command execution, cloud PRs, external tools).
  5. Require evidence with every result.
  6. Measure how much rework agent output needs.
  7. Expand only after repeatable success.

The common anti-patterns are predictable: vague delegation, no verification command, review-by-vibe, test weakening, architecture drift, hidden setup, unlimited tool access, one giant unsupervised task, and accepting "done" with no proof.

And measure outcomes, not activity: "AI wrote 40% of the code" proves nothing; time-to-reviewable-PR, percentage merged without major rework, review comments per PR, and post-merge defects do.
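The outcome metrics are simple to compute once each agent-produced PR is recorded. The sketch below assumes a hypothetical `AgentPR` record with one field per metric; the four sample records are made up purely to show the arithmetic and are not measurements from any real team.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class AgentPR:
    """One agent-produced pull request (hypothetical record layout)."""
    hours_to_reviewable: float
    review_comments: int
    merged: bool
    major_rework: bool
    post_merge_defects: int


def summarise(prs: List[AgentPR]) -> Dict[str, float]:
    """Outcome metrics over a batch of agent-produced PRs."""
    if not prs:
        raise ValueError("need at least one PR to summarise")
    clean_merges = sum(1 for pr in prs if pr.merged and not pr.major_rework)
    return {
        "mean_hours_to_reviewable_pr": mean(pr.hours_to_reviewable for pr in prs),
        "pct_merged_without_major_rework": 100 * clean_merges / len(prs),
        "mean_review_comments_per_pr": mean(pr.review_comments for pr in prs),
        "post_merge_defects": sum(pr.post_merge_defects for pr in prs),
    }


# Made-up sample data.
sample = [
    AgentPR(2.0, 3, True, False, 0),
    AgentPR(5.0, 8, True, True, 1),
    AgentPR(1.0, 1, False, False, 0),
    AgentPR(4.0, 4, True, False, 0),
]
summary = summarise(sample)
print(summary)
```

Tracked per repository and per task class, numbers like these show whether expanding the agent's scope is justified by repeatable success.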

This favours teams with good hygiene. A repo with outdated docs, flaky tests, and unclear ownership will get fast-looking output of uncertain quality, because the agent exposes those weaknesses rather than fixing them. A repo with reliable setup, useful tests, and clear conventions gets a real multiplier.

The point

The evolution from autocomplete to coding agents is best understood as a move from assistance to delegated execution. The durable advantage isn't using AI everywhere — it's knowing where delegation is appropriate, how to constrain it, and how to verify the result. Software still ships under human responsibility; AI just changes how much work can be prepared before that responsibility is exercised. Which raises the obvious question: how do you verify delegated work rigorously? Part 8 answers it with the two practices that make AI-assisted automation trustworthy — spec-driven and test-driven development.

The full series

  1. How I Use AI Safely in Infrastructure Workflows
  2. Prompt Engineering for IT Infrastructure Consultants
  3. Advanced Prompting Patterns for Infrastructure Planning
  4. RAG for Enterprise Infrastructure Knowledge
  5. Context Engineering for Enterprise AI Systems
  6. Workflow Engineering for Agentic AI in Infrastructure Operations
  7. AI Coding Agents in Infrastructure Automation (you're reading this)
  8. TDD + SDD for AI-Assisted Infrastructure Automation