<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://blog.rhyscazenove.com/feed.xml" rel="self" type="application/atom+xml" /><link href="https://blog.rhyscazenove.com/" rel="alternate" type="text/html" /><updated>2026-04-09T11:44:15+00:00</updated><id>https://blog.rhyscazenove.com/feed.xml</id><title type="html">Rhys Cazenove</title><subtitle>Software engineer based in London. Building digital products and platforms since 2001.</subtitle><author><name>Rhys Cazenove</name></author><entry><title type="html">Agents in the Pipeline: from a skill that works locally to a service you can trust</title><link href="https://blog.rhyscazenove.com/2026/04/09/agents-in-the-pipeline/" rel="alternate" type="text/html" title="Agents in the Pipeline: from a skill that works locally to a service you can trust" /><published>2026-04-09T00:00:00+00:00</published><updated>2026-04-09T00:00:00+00:00</updated><id>https://blog.rhyscazenove.com/2026/04/09/agents-in-the-pipeline</id><content type="html" xml:base="https://blog.rhyscazenove.com/2026/04/09/agents-in-the-pipeline/"><![CDATA[<p>Last night I gave a talk at <a href="https://cccl.dev">CCCL #5</a> in London, hosted by <a href="https://www.linkedin.com/in/vikrammpawar/">Vikram Pawar</a>, Claude Code Community Leader, and <a href="https://www.linkedin.com/in/roberthartuk/">Rob Hart</a>, founder of GitNation. The <a href="https://cccl-ai.github.io/meetups-live/cccl-5b/agenda">full agenda is available here</a>. You can also <a href="/presentations/cccl-april-2026.html">view the slides</a> directly.</p>

<p>The other talks, by Jan Peer, Ruslan Zavacky, Daniel Buchele, Valera Latsho, Aris Mandor and Talha Sheikh, were all fascinating. I love learning about where people are in their AI journey, and there were some impressive and pioneering demos.</p>

<p>I was impressed with the audience and enjoyed meeting some of the community. They were engaged and asked sharp follow-up questions; it was their enthusiasm that prompted me to launch this blog!</p>

<p>This is a write-up of what I covered. The short version: Claude Code skills are easy to get running locally. Making them trustworthy enough to run unattended in production is a different problem, and one worth solving well.</p>

<hr />

<h2 id="the-starting-point">The starting point</h2>

<p>If you’ve been building with Claude Code, you’ve probably reached this point. You’ve written a skill that does something useful: a documentation generator, a log investigator, a code reviewer. It works well. You’ve iterated on it. You trust it.</p>

<p>Now you want to run it as a service.</p>

<p>Running locally and running as a service look similar but are different in the ways that matter:</p>

<table>
  <thead>
    <tr>
      <th>Running locally</th>
      <th>Running as a service</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>You control it.</td>
      <td>Runs unattended.</td>
    </tr>
    <tr>
      <td>Agent can explore.</td>
      <td>No exploration.</td>
    </tr>
    <tr>
      <td>Human always in the loop.</td>
      <td>No human oversight.</td>
    </tr>
    <tr>
      <td>Failures are visible.</td>
      <td>Drift is silent.</td>
    </tr>
  </tbody>
</table>

<p>When an agent runs on a schedule and something goes wrong (a model update changes output format, a dependency drifts), nobody sees it in the moment. By the time you notice, you might have a week of bad output, or worse.</p>

<hr />

<h2 id="three-pillars">Three pillars</h2>

<p>Getting a Claude Code agent safely into production requires three things to be in place simultaneously.</p>

<p>Guardrails: constrain what the agent can do at runtime. Block dangerous commands, confine it to its problem space, and stop it reaching for tooling you haven’t approved. The agent should only be able to do the things you’ve explicitly decided it should be able to do.</p>

<p>Confinement: an isolated, reproducible environment. No side effects, no state leakage, no dependency drift between runs. Every execution should start from a known state.</p>

<p>Observability: visibility into everything the agent does, including what it’s <em>allowed</em> to do, not just what gets blocked. This is how you verify it hasn’t drifted, and how you keep the guardrails current.</p>

<hr />

<h2 id="simon-willisons-lethal-trifecta">Simon Willison’s Lethal Trifecta</h2>

<p>The risk model we’re designing against is Simon Willison’s Lethal Trifecta: three things that are each fine in isolation but dangerous together.</p>

<ol>
  <li>Access to private or sensitive data</li>
  <li>Exposure to untrusted content</li>
  <li>A mechanism to exfiltrate data</li>
</ol>

<p>Any one alone is manageable. All three together is the problem — untrusted content can inject instructions that use the exfiltration mechanism to leak private data. The framework we’ve built at NHM (the Natural History Museum) protects against many vectors of this, but it is not a complete guarantee. The design principle is to break the flow across multiple agents so no single agent ever holds all three.</p>

<hr />

<h2 id="the-framework-four-parts">The framework: four parts</h2>

<p>Our implementation uses GitLab CI, a Docker container, and Claude Code hooks. It’s four separable parts, each with clear ownership.</p>

<h3 id="container-image">Container Image</h3>

<p>The foundation. You decide exactly which libraries, frameworks, and tools are available to the agent. Nothing else. Built from a Dockerfile, cached in GitLab’s container registry for reuse across every pipeline run. No dependency drift, no surprises. The container <em>is</em> the confinement.</p>
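
<p>As a sketch, the image can be as small as the skill allows. Everything below (the base image, the specific tools, the paths) is illustrative rather than our production Dockerfile; the point is that the agent gets exactly this toolset and nothing more:</p>

```dockerfile
# Illustrative Dockerfile: pin versions, install only what the skill needs.
FROM node:22-bookworm-slim

# Claude Code plus the handful of tools the hooks and skills rely on.
RUN npm install -g @anthropic-ai/claude-code \
    && apt-get update \
    && apt-get install -y --no-install-recommends git jq \
    && rm -rf /var/lib/apt/lists/*

# Non-root user so the agent cannot touch the rest of the filesystem.
RUN useradd -m agent
USER agent
WORKDIR /workspace
```

<p>Building this once and pulling the cached image in every pipeline run is what eliminates dependency drift between executions.</p>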

<h3 id="security-hooks">Security Hooks</h3>

<p>Claude Code’s hook system fires before and after every tool call. We use this for two things: security (PreToolUse validation that can block calls before they execute) and observability (PostToolUse metrics capture on every completed call).</p>
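
<p>The hooks are wired up in the project’s Claude Code settings file. At the time of writing the shape is roughly the following; the script paths here are hypothetical, and you should check the current hooks reference before copying, as the schema may evolve:</p>

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [{ "type": "command", "command": "/opt/hooks/security.sh" }]
      }
    ],
    "PostToolUse": [
      {
        "matcher": "",
        "hooks": [{ "type": "command", "command": "/opt/hooks/metrics.sh" }]
      }
    ]
  }
}
```

<p>An empty matcher is intended to apply the hook to every tool call, while a named matcher scopes it to one tool.</p>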

<p>A useful pattern we’ve found: configure your hooks to log every <em>rejected</em> call. Review these regularly to understand what the agent tried to do. Then add or remove tools from the container accordingly. The rejected log is one of the most informative signals in the whole system.</p>
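
<p>In simplified form, the blocklist check at the heart of such a hook looks like this. The patterns and log path are a sketch, not our production configuration; per the Claude Code hooks documentation, a PreToolUse hook that exits with code 2 blocks the call and its stderr is fed back to the model:</p>

```shell
# Simplified sketch of a command-blocklist check for a PreToolUse hook.
# The pattern list is illustrative -- tune it to your environment.
is_blocked() {
  local blocklist='(^|[;&| ])(rm -rf|curl|wget|ssh|sudo)( |$)'
  printf '%s' "$1" | grep -qE "$blocklist"
}

# In the hook itself, the tool call arrives as JSON on stdin, e.g.:
#   cmd="$(jq -r '.tool_input.command // empty')"
#   if is_blocked "$cmd"; then
#     echo "$(date -u +%FT%TZ) REJECTED $cmd" >> /var/log/agent/rejected.log
#     echo "command matches blocklist" >&2
#     exit 2   # exit code 2 tells Claude Code to block the call
#   fi

is_blocked "git log --oneline" || echo "allowed"
is_blocked "sudo rm -rf /" && echo "blocked"
```

<p>Note the rejected-log line in the middle: blocking and recording the rejection are the same code path, which is what makes the review loop above cheap to run.</p>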

<h3 id="pipeline-config">Pipeline Config</h3>

<p>GitLab CI orchestration with manual triggers, timeouts, artifact retention and audit compliance. The pipeline is the runtime harness. It sets the boundaries within which the container and agent operate.</p>
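
<p>A minimal shape for one such job, with the job name, image tag, and prompt path all illustrative rather than our production config:</p>

```yaml
# Illustrative .gitlab-ci.yml fragment for a single agent job.
generate-docs:
  image: $CI_REGISTRY_IMAGE/agent-runtime:pinned   # the cached container
  timeout: 30m            # hard stop if the agent wanders
  rules:
    - when: manual        # a human pulls the trigger
  script:
    - claude -p "$(cat skills/docs-generator/prompt.md)"
  artifacts:
    paths:
      - output/
      - logs/             # audit trail retained alongside the output
    expire_in: 30 days
```

<p>The timeout, manual trigger, and artifact retention are doing the governance work here; the script line is the only part that knows anything about the skill.</p>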

<h3 id="ownership-model">Ownership Model</h3>

<p>This is the one that often gets skipped. Security shouldn’t own the skills; domain experts shouldn’t own the hooks. Each team owns what they understand: Software Engineering builds the skills, Infrastructure manages the container, Security writes the hooks, Domain Experts handle acceptance testing. No single team is a bottleneck, and no single team has to understand the whole system.</p>

<hr />

<h2 id="7-independent-protection-layers">Seven independent protection layers</h2>

<p>Within this framework, we’ve implemented seven layers of defence, each providing independent protection:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>         AGENT
    ─── action hooks ───
    L1 — Command blocklist
   L2 — Path traversal guard
  L3 — Network egress control
 L4 — Credential pattern block
    ── input guard ──
  L5 — Prompt injection guard
      ─ container ─
   L6 — Container isolation
        ─ audit ─
     L7 — Audit logging
</code></pre></div></div>

<p>The architecture mirrors defence in depth from traditional security: an attacker (or misbehaving agent) has to defeat each layer independently. L1 to L4 run as action hooks; L5 as an input guard; L6 is the container itself; L7 is continuous logging of everything.</p>

<p>The layers are designed to be independent so that a failure in one doesn’t cascade. L6 would contain something that escaped L1 through L5. L7 would capture evidence of anything that reached L6.</p>
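
<p>To give a flavour of how small an individual layer can be, here is a sketch in the spirit of L2, the path traversal guard: resolve the requested path and check it stays inside the agent workspace. The paths are hypothetical and a production version would handle more edge cases:</p>

```shell
# Sketch of a path-traversal guard (layer L2): resolve the requested
# path and confirm it falls inside the agent workspace.
in_workspace() {
  local workspace target
  workspace="$(realpath -m "$1")"
  target="$(realpath -m "$2")"    # -m resolves paths that don't exist yet
  case "$target" in
    "$workspace"|"$workspace"/*) return 0 ;;
    *) return 1 ;;
  esac
}

in_workspace /workspace /workspace/src/app.py && echo "ok"
in_workspace /workspace /workspace/../etc/passwd || echo "blocked"
```

<p>Because the path is resolved before the comparison, <code>..</code> tricks and absolute paths fall out of the same check.</p>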

<hr />

<h2 id="what-weve-built-with-it">What we’ve built with it</h2>

<p>Three use cases in production at NHM:</p>

<p>A documentation generator scans git history, groups changes by theme, and generates Architecture Decision Records and Mermaid architecture diagrams. It runs incrementally on merge.</p>

<p>An onboarding generator creates full developer onboarding documentation from a codebase (app overview, architecture guide, getting-started guide, troubleshooting). Triggered on demand.</p>

<p>An incident analysis agent fires via webhook when error rates exceed a threshold. It uses Azure MCP to analyse the issue and prepare evidence, including deep links to KQL queries and charts in Azure Monitor, so the engineer assigned to investigate has a running start before deciding the best course of action.</p>

<p>All three use the same harness. The hooks, container, and pipeline config don’t change between them. Only the skill itself changes.</p>

<hr />

<h2 id="what-we-learned">What we learned</h2>

<p>Get the harness right before you scale use cases. Adding skills to a working harness is far easier than retrofitting governance onto skills that are already running.</p>

<p>Modular ownership matters. When ownership is clear, each team can iterate on their layer without stepping on others.</p>

<p>Hook architecture gives you observability for free. The same pattern that blocks also emits metrics. You don’t need a separate observability pipeline; the hooks are already there.</p>
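
<p>The metrics side can be as small as a PostToolUse script that appends one structured line per completed tool call. A hypothetical shape, with field names ours rather than any Claude Code schema:</p>

```shell
# Hypothetical metrics emitter for a PostToolUse hook: one JSON line per
# completed tool call, ready for whatever log shipper you already run.
log_tool_call() {
  local tool="$1" duration_ms="$2"
  printf '{"ts":"%s","tool":"%s","duration_ms":%s}\n' \
    "$(date -u +%FT%TZ)" "$tool" "$duration_ms"
}

log_tool_call "Bash" 420
log_tool_call "Read" 12
```

<p>Shipping these lines into your existing log platform is usually all the observability pipeline an agent job needs.</p>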

<p>Start boring. Documentation and auditing are perfect low-risk pilots. The agent has read-only access to git history, there’s no sensitive data in play, and the output is easy for humans to verify. Build confidence in the framework before expanding scope.</p>

<p>AI analyses, humans approve and refine. The goal is to give humans better information faster so they can make better decisions.</p>

<hr />

<h2 id="the-practical-architecture">The practical architecture</h2>

<p>For those who want the technical specifics: we’re running on Azure with GitLab CI as the orchestration layer. The container registry is GitLab’s built-in registry. Hooks are bash scripts. Azure MCP is used for the Incident Analysis use case to query Application Insights.</p>

<p>The whole thing is platform-agnostic by design. The hooks are bash or PowerShell, the container runs anywhere, and the pipeline config is YAML.</p>

<hr />

<h2 id="slides">Slides</h2>

<p>The slides from the talk are available <a href="/presentations/cccl-april-2026.html">here</a>.</p>

<p>If you’re building something similar or thinking through the governance model for your own agentic workflows, I’m happy to talk through it — find me on <a href="https://linkedin.com/in/rhyscazenove">LinkedIn</a>.</p>

<hr />

<p><em>Rhys Cazenove is AI Lead at the Natural History Museum, South Kensington</em></p>]]></content><author><name>Rhys Cazenove</name></author><category term="ai" /><category term="engineering" /><category term="claude-code" /><category term="agents" /><category term="gitlab-ci" /><category term="azure" /><category term="governance" /><category term="security" /><summary type="html"><![CDATA[A write-up of my talk at CCCL #5 in London — how we built a production governance harness for agentic AI workflows at the Natural History Museum.]]></summary></entry></feed>