How I Use AI Safely in Infrastructure Workflows
Last reviewed: June 2026
Part 1 of 8 · Using AI in infrastructure work — the full series is listed at the end.
I sit on both sides of this problem. Most weeks I use AI assistants on client infrastructure — drafting migration checklists, reviewing PowerShell before it runs against Active Directory, turning the noise of a discovery workshop into something a project team can act on. I also build AI systems: a retrieval platform I forked from RAGFlow and rebuilt around vision models, a RAG assistant for a narrow domain, document-extraction pipelines that have to be right often enough to be useful. Doing both is exactly why I don't let a model near a production decision without verification. I have watched these systems, from the inside, produce a fluent and completely unsupported answer — and I know how convincing it looks on the way out.
This post is the foundation for the rest of the series. Before we get to prompts, retrieval, agents, and AI-assisted coding, it's worth being precise about where AI earns its place in infrastructure work, where it has to stop, and the verification habits that decide which of those two you get. Twenty years of infrastructure delivery taught me the controls. The last couple of years building AI taught me exactly where those controls have to bite.
Why a model produces confident, wrong answers
A language model predicts plausible text. When it has the right context in front of it, plausible and correct line up. When the context is missing, stale, incomplete, or ambiguous, they come apart — and the model gives you the plausible version in the same calm, authoritative tone it uses when it's right. That uniform confidence is the actual hazard. A junior engineer who is unsure usually sounds unsure. A model rarely does.
In infrastructure work, the gaps sit exactly where a model is weakest. Ask whether a specific Citrix feature has been deprecated, and a general-purpose model answers from training data that may predate the last few product releases. Ask how a particular Microsoft 365 license entitlement behaves, and you'll get a confident paragraph that was true for some tenant, on some date, under some licensing SKU, just maybe not yours. Ask it to reason about your Active Directory and it will happily invent a plausible OU structure it has never seen. None of this is the model "lying." It is pattern-matching without the current, specific source it would need to be right.
The blast radius is what makes this matter. A wrong answer about licensing, Citrix feature status, firewall design, AD cleanup, or PowerShell behaviour doesn't stay on the screen. It becomes a change request, a script, a cutover step — something with real consequences for real users.
What that looks like in practice: verifying one claim
Here is the discipline made concrete. Suppose I'm planning a Citrix migration and I ask a model whether a particular policy or feature is still supported in the current release. It tells me, confidently, that it was removed two versions ago. Before that sentence influences a single line of my plan, I do four things:
- Make it show its work. I ask the model to state the source and the version its answer applies to. "Removed two versions ago" with no version number is not an answer I can use.
- Open the vendor documentation myself. Citrix, Microsoft, and VMware change behaviour and naming constantly — Azure AD became Microsoft Entra ID, tools get renamed or pulled from installation media. The current product page is the authority, not the model's memory.
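- Check the date. I compare when the vendor page was last updated against how recent the model's knowledge can plausibly be. A correct-sounding answer about a product that shipped a change last quarter is exactly the case the model gets wrong.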
- Decide, and own the decision. Once I've confirmed it against the source, the call is mine. If the source is ambiguous, that ambiguity goes into the plan as an open question with an owner — it doesn't get smoothed over.
That whole loop takes a few minutes, and it is the difference between AI accelerating my work and AI quietly injecting a false premise into a production migration.
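The four checks can be captured as a small record, so a claim cannot reach a plan without its evidence attached. What follows is a minimal sketch under stated assumptions, not a prescribed tool: `VendorClaim`, `claim_status`, the status names, and the 90-day freshness window are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class VendorClaim:
    """One model-produced claim plus the evidence gathered for it (hypothetical structure)."""
    statement: str
    product_version: Optional[str] = None     # version the model says the claim applies to
    source_url: Optional[str] = None          # vendor page a human opened personally
    source_checked_on: Optional[date] = None  # when that page was last read
    confirmed: Optional[bool] = None          # None means the source was ambiguous
    owner: Optional[str] = None               # who owns the decision or the open question


def claim_status(claim: VendorClaim, today: date, max_age_days: int = 90) -> str:
    """Classify a claim; max_age_days is an assumed freshness window, not a standard."""
    # Check 1: no version or no source means the model has not shown its work.
    if not claim.product_version or not claim.source_url:
        return "unusable"
    # Checks 2 and 3: the source must have been opened, and recently enough.
    if claim.source_checked_on is None:
        return "recheck"
    if (today - claim.source_checked_on).days > max_age_days:
        return "recheck"
    # Check 4: the outcome needs a named owner, confirmed or not.
    if claim.owner is None:
        return "needs-owner"
    if claim.confirmed is None:
        return "open-question"
    return "usable" if claim.confirmed else "rejected"
```

An ambiguous source comes back as `open-question` rather than being forced to `usable` or `rejected`, which keeps the uncertainty visible in the plan.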
What I let AI do, and what I own
I treat AI as a drafting, review, and analysis assistant — never as an authority for a production decision. The model can summarise a discovery workshop, draft a CAB request, review a script for missing safety controls, or compare two architecture options against criteria I set. The engineer or technical lead still owns the decision, the verification, and the production risk.
The boundary is clearest in a migration. On a large multi-domain consolidation, a model is genuinely useful for structuring the dependency analysis — clustering applications, drafting the wave plan, spotting a missing rollback step in my own notes. What it cannot do is tell me which service account will break authentication at cutover, because that fact lives in the environment, not in any training set. That is discovery work, and discovery work has an owner with a name. AI changes how fast I can prepare the analysis; it does not change who is accountable for it.
Where I do not trust AI without checking
Some categories are simply too expensive to get wrong, so they always get verified against a real source before they go anywhere:
- Microsoft licensing and entitlement details — they change often and cost real money.
- Current Citrix feature status or deprecations — release-dependent, frequently wrong.
- Firewall and network security designs — a plausible-looking rule can open a real hole.
- Active Directory group and account cleanup — the blast radius is identity-wide.
- Production PowerShell changes — confident code is still unreviewed code.
- Incident root-cause conclusions — a tidy narrative is not the same as the cause.
- Security recommendations — generic advice can be actively wrong for your context.
- Legal or compliance statements — out of scope for a model, full stop.
Where it genuinely helps
The same tool, pointed at lower-risk work, saves real time every day:
- Drafting migration and discovery checklists.
- Summarising discovery workshops into RAID logs.
- Reviewing scripts for safety controls (-WhatIf, validation, logging, rollback).
- Preparing CAB drafts from rough engineering notes.
- Comparing architecture options against explicit criteria.
- Producing a first-pass document I then correct.
- Surfacing the rollback or validation step I forgot.
The pattern across that list: AI is at its best when the output is a draft that a human will review anyway, and at its worst when the output is treated as a final answer no one checks.
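The script-review item in that list can be given a cheap mechanical first pass before a model or a human reads the script. This is a rough sketch, assuming plain text matching is good enough as a prompt for review: `SAFETY_PATTERNS` and `missing_safety_controls` are hypothetical names, and the patterns only look for well-known PowerShell constructs (`SupportsShouldProcess`, `-WhatIf`, `[Validate...()]` attributes, `Start-Transcript`, `try {`). A match shows the text is present, not that it is used correctly, and rollback cannot be detected this way at all.

```python
import re

# Heuristic patterns only: presence of the text is not proof of correct use.
SAFETY_PATTERNS = {
    "whatif": re.compile(r"SupportsShouldProcess|-WhatIf\b", re.IGNORECASE),
    "validation": re.compile(r"\[Validate\w+\(", re.IGNORECASE),
    "logging": re.compile(r"Start-Transcript|Write-Verbose|Out-File", re.IGNORECASE),
    "error_handling": re.compile(r"\btry\s*\{", re.IGNORECASE),
}


def missing_safety_controls(script_text: str) -> list[str]:
    """Return the names of controls with no textual evidence in the script."""
    return [
        name
        for name, pattern in SAFETY_PATTERNS.items()
        if not pattern.search(script_text)
    ]


if __name__ == "__main__":
    sample = """
    [CmdletBinding(SupportsShouldProcess)]
    param([ValidateNotNullOrEmpty()][string]$Name)
    Remove-ADUser -Identity $Name
    """
    # Lists the controls a reviewer should ask about for this sample.
    print(missing_safety_controls(sample))
```

The output is a list of questions for the reviewer, which fits the draft-then-review pattern: the scan never approves a script, it only flags what to look at first.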
Choosing the model for the task
"Which model?" is less important than "which model, for which task, with what grounding," but the choice still matters. The way I think about it:
- Fast, general models for low-stakes drafting and summarising, where speed beats depth and I'm going to review the output regardless.
- Stronger reasoning models for option comparisons, risk analysis, and reviewing a script's logic, where the quality of the reasoning is the whole point.
- Retrieval-grounded setups whenever current product facts matter — the model should be answering from documentation I've given it, not from memory. Most of the second half of this series is about doing that well.
- Local or enterprise-hosted models when the input is sensitive enough that it must not leave a controlled boundary. For some client work, "which model is smartest" is the wrong question; "which model can I legally and safely put this data into" comes first.
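The ordering in that list matters more than the labels, and a small routing function makes it explicit. This is only an illustrative sketch: `Task`, `choose_setup`, and the returned labels are hypothetical, and a real decision would also depend on contracts and tooling that code cannot see.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    sensitive_data: bool       # input must not leave a controlled boundary
    needs_current_facts: bool  # answer depends on current vendor documentation
    reasoning_heavy: bool      # option comparison, risk analysis, script logic


def choose_setup(task: Task) -> list[str]:
    """Return the hosting and model traits a task calls for, most restrictive first."""
    setup = []
    # Data sensitivity is decided first because it constrains every later choice.
    if task.sensitive_data:
        setup.append("local-or-enterprise-hosted")
    else:
        setup.append("any-approved-host")
    # Current product facts call for grounding in supplied documentation.
    if task.needs_current_facts:
        setup.append("retrieval-grounded")
    # Only then does model strength come into it.
    if task.reasoning_heavy:
        setup.append("strong-reasoning-model")
    else:
        setup.append("fast-general-model")
    return setup
```

For example, `choose_setup(Task(sensitive_data=True, needs_current_facts=True, reasoning_heavy=False))` returns hosting first, then grounding, then model class, mirroring the point that "which model is smartest" is the last question, not the first.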
What never goes into a prompt
Working from Germany, under GDPR, this is not optional and it is not abstract. A consumer chatbot is a third party that processes whatever I paste into it. Before anything goes into a prompt I ask whether I'd be comfortable emailing it to an outside vendor, because functionally that's what I'm doing.
So: no secrets, credentials, private keys, or connection strings. No real personal data — names, UPNs, sign-in logs, anything that identifies a person — unless I'm using a tool with the right contractual and technical controls and a lawful basis to do so. No client-confidential architecture where the engagement doesn't allow it. When I need a model to reason over a real artifact, I sanitise it first: placeholder accounts, contoso-style domains, redacted identifiers. The reasoning is the same; the exposure isn't. This single habit prevents the most common and most damaging AI mistake in consulting — solving the technical problem while creating a data-protection one.
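The sanitising step is easy to sketch. The example below is a minimal illustration, assuming that UPN-like strings and GUIDs are the identifiers to remove: `sanitise`, the two regular expressions, and the `contoso.example` placeholder domain are hypothetical choices, and the patterns will not catch display names, hostnames, or free-text personal data. A human still has to read the result before it goes anywhere.

```python
import re

# Deliberately simple patterns: they cover common shapes, not every identifier.
EMAIL_OR_UPN = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")
GUID = re.compile(r"\b[0-9a-fA-F]{8}-(?:[0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12}\b")


def sanitise(text: str) -> tuple[str, dict[str, str]]:
    """Replace UPN-like strings and GUIDs with placeholders.

    Returns the sanitised text and the original-to-placeholder mapping.
    The mapping stays local and is never sent with the prompt.
    """
    mapping: dict[str, str] = {}

    def upn_placeholder(match: re.Match) -> str:
        key = match.group(0).lower()
        # The same person always gets the same placeholder, so the
        # relationships in the log survive the redaction.
        if key not in mapping:
            mapping[key] = f"user{len(mapping) + 1}@contoso.example"
        return mapping[key]

    text = EMAIL_OR_UPN.sub(upn_placeholder, text)
    text = GUID.sub("<redacted-guid>", text)
    return text, mapping
```

Keeping placeholders stable per identity is the design choice that matters: the model can still reason about "the same account failed twice" without ever seeing who the account belongs to.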
AI use by task risk
| Task | AI use | Control |
|---|---|---|
| Workshop summary | Useful | Review against the original notes. |
| CAB draft | Useful | Engineer validates scope, risk, rollback, timing. |
| PowerShell review | Useful | Human validates the code and tests it. |
| Vendor feature decision | Only with retrieval | Check current vendor documentation. |
| Handling personal/identity data | Only with controls | Lawful basis, approved tooling, or sanitise first. |
| Production remediation | Do not delegate | Human approval and change control required. |
The rule is simple: the closer a task gets to production, identity, or security, the less autonomy the model gets and the more the answer has to be grounded in a source I can open.
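The table translates directly into a lookup that a team could keep next to its tooling. This is a sketch only: the dictionary keys and the `control_for` function are hypothetical, and falling back to the strictest row for unknown tasks is an assumption added here, not something the table states.

```python
# Mirrors the table above; the keys are hypothetical identifiers.
POLICY = {
    "workshop_summary": ("useful", "Review against the original notes."),
    "cab_draft": ("useful", "Engineer validates scope, risk, rollback, timing."),
    "powershell_review": ("useful", "Human validates code and tests."),
    "vendor_feature_decision": ("only with retrieval", "Check current vendor documentation."),
    "personal_identity_data": ("only with controls", "Lawful basis, approved tooling, or sanitise first."),
    "production_remediation": ("do not delegate", "Human approval and change control required."),
}


def control_for(task: str) -> tuple[str, str]:
    """Return (AI use, required control) for a task.

    Unknown tasks fall back to the strictest row. That default is an
    assumption of this sketch: anything unclassified is treated as production.
    """
    return POLICY.get(task, POLICY["production_remediation"])
```

Calling `control_for("cab_draft")` returns the CAB row, while an unlisted task such as `control_for("dns_cutover")` comes back as "do not delegate" until someone classifies it.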
A team policy that fits on one page
AI may draft, summarise, review, and compare. AI must not approve production changes, replace vendor documentation, bypass change control, or execute privileged actions without human approval. I also want teams to record when AI was used to produce a production-adjacent artifact — a change request, a script, a risk log, a runbook — so the provenance is visible later. None of that needs a committee. It needs to be written down once, so the team isn't re-deciding it during an incident.
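Recording AI use does not need heavy tooling. The sketch below shows one possible shape for such a record, assuming a JSON line per artifact is enough: `ProvenanceRecord`, `record_ai_use`, and the field names are hypothetical, and the allowed actions are taken from the four verbs in the policy above.

```python
import json
from dataclasses import asdict, dataclass
from datetime import date

# The four verbs the policy permits.
ALLOWED_AI_ACTIONS = {"draft", "summarise", "review", "compare"}


@dataclass
class ProvenanceRecord:
    artifact: str     # for example a change request, script, risk log, or runbook
    ai_action: str    # what the AI did to produce it
    tool: str         # which assistant was used
    reviewed_by: str  # the human who checked and approved the result
    recorded_on: str  # ISO 8601 date


def record_ai_use(artifact: str, ai_action: str, tool: str,
                  reviewed_by: str, recorded_on: date) -> str:
    """Validate against the policy and return one JSON line for the log."""
    if ai_action not in ALLOWED_AI_ACTIONS:
        raise ValueError(f"AI action not permitted by policy: {ai_action!r}")
    if not reviewed_by.strip():
        raise ValueError("a named human reviewer is required")
    record = ProvenanceRecord(artifact, ai_action, tool, reviewed_by,
                              recorded_on.isoformat())
    return json.dumps(asdict(record), sort_keys=True)
```

Rejecting a record with no named reviewer is the point of the exercise: the log documents the human approval, not just the fact that a model was involved.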
Practical checklist
- Use approved, current sources wherever current facts matter.
- Require citations for vendor-specific claims, and open them yourself.
- Preserve uncertainty instead of forcing a confident answer.
- Never paste secrets, credentials, or unsanitised personal data into a prompt.
- Keep AI-generated scripts out of production until a human has reviewed them.
- Match the model and hosting to the sensitivity of the data, not just the difficulty of the task.
- Route production-impacting decisions through change control.
- Document human approval for high-risk work.
The point
Use AI where it improves thinking, review, and documentation speed. Stop where authority, accountability, or production risk begins. After two decades of infrastructure delivery and a couple of years building AI systems myself, my honest conclusion is that safe AI use is less about trusting a model and more about the discipline of verification around it. Everything that follows is that discipline made concrete — starting with the prompt itself, in Part 2.
The full series
- How I Use AI Safely in Infrastructure Workflows — (you're reading this)
- Prompt Engineering for IT Infrastructure Consultants
- Advanced Prompting Patterns for Infrastructure Planning
- RAG for Enterprise Infrastructure Knowledge
- Context Engineering for Enterprise AI Systems
- Workflow Engineering for Agentic AI in Infrastructure Operations
- AI Coding Agents in Infrastructure Automation
- TDD + SDD for AI-Assisted Infrastructure Automation