Building Alfred: An AI agent for modern engineering teams
Engineering

June 26, 2026
4 minutes
How Nscale built an AI software engineering agent designed for control, quality, and operating infrastructure at scale.

As AI models become increasingly capable, generating code is no longer the hard part. The challenge is making engineering judgement repeatable. The model is only one part of the equation. The real value lies in the harness around it: the workflows, guardrails, standards, and feedback loops that transform AI from a capable assistant into a reliable engineering system.

As an AI-native company, we believe the builders of AI infrastructure should also be among its largest users. At the scale at which Nscale operates, software cannot become the bottleneck to deployment. The systems responsible for building, validating, and operating infrastructure have to move as quickly as the infrastructure itself. 

That's why we built Alfred, our internal software engineering agent.

Alfred helps our engineers build the tests, tooling, and diagnostic workflows that enable that pace of deployment. But speed alone isn’t enough. The code has to be accurate, with no failure mode treated as too unlikely to consider.

For example, one recent investigation began with a cluster-wide training job running slower than expected. Using observability and diagnostic workflows developed with Alfred, engineers narrowed the issue from more than 1,000 nodes to three anomalous machines. The affected systems were isolated and replaced while the investigation continued. The root cause turned out to be surprisingly simple: loose screws securing GPU modules. After the hardware was retorqued, performance returned to expected levels and the nodes were returned to service.

“At this scale, software engineering isn't just about building software. It's about building systems that can identify, diagnose, and recover from failures anywhere in the stack. A cluster-wide performance issue can ultimately trace back to something as simple as a loose screw.”

Antony Cleave

Senior HPC Developer, Nscale

Your harness is the differentiator

Building Alfred came down to a question every engineering organisation eventually faces: when does owning the tooling matter more than buying it?

Off-the-shelf AI coding tools such as Codex and Claude Code are powerful force multipliers and remain part of our engineers' workflows. The challenge is ensuring project-specific principles are applied consistently and in context without engineers having to restate them on every task. System prompts and configuration files can provide structure, but they aren’t enough on their own.

Rather than relying on a single model, Alfred combines specialist agents with a purpose-built harness that identifies gaps, challenges assumptions, and drives iterative improvement. That harness includes workflow orchestration, specialist reviewers, human feedback loops, and codified engineering standards that embed Nscale's way of working directly into the development process. The result is consistent, high-quality outcomes at scale. 

For us, the build-versus-buy decision was clear. It's the same principle of sovereignty that underpins everything we build at Nscale.

“You don't have to keep reviewing the same architectural mistakes over and over. You teach Alfred the standard once, and he applies it consistently across every change.”

Eyal Lantzman

VP, Head of AI Solutions, Nscale

How Alfred moves engineers up the value chain 

In practice, Alfred's impact shows up in two places: the work that gets done, and the problems that never reach production.

1. Engineers move up the value chain

Well-defined, self-contained work that would otherwise sit in a backlog indefinitely is now delegated to Alfred end-to-end. Engineers spend less time on repetitive implementation work and more time on architecture, business context, and the harder problems that require human judgement.

2. Quality is built in

Operating at Nscale's scale means expecting failure and designing for it. Alfred's role is to identify those failure modes early, before they become production incidents.

In one repository alone, over a ten-week period, Alfred reviewed 365 pull requests and identified 454 Priority 1 issues, of which 94% were confirmed as genuine issues requiring remediation. This reliability comes directly from Alfred's architecture. Because generation and review are performed independently, a blind spot in the generating agent is less likely to pass through review unchecked. Alfred's role is not to rubber-stamp code, but to act as an independent reviewer that continuously challenges assumptions before changes are shipped.
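The review figures above imply a rough flag rate per pull request. A small sketch of that arithmetic follows; the confirmed count is derived from the rounded 94% figure, so it is an approximation rather than a number reported in the text.

```python
# Figures quoted in the text for one repository over ten weeks.
pull_requests = 365
priority_1_issues = 454
confirmed_rate = 0.94  # rounded percentage from the text

# Derived values; approximate because the 94% figure is itself rounded.
issues_per_pr = priority_1_issues / pull_requests
confirmed_issues = round(priority_1_issues * confirmed_rate)
unconfirmed_issues = priority_1_issues - confirmed_issues

print(f"{issues_per_pr:.2f} Priority 1 issues flagged per pull request")
print(f"about {confirmed_issues} confirmed, about {unconfirmed_issues} not confirmed")
```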

At Nscale, speed and quality are both customer outcomes, not competing priorities. Alfred is one of the ways we deliver on both.

Building an architecture of trust

If the harness is the differentiator, then trust becomes the requirement. Rather than treating AI agents as autonomous black boxes, we built Alfred to be inspectable, recoverable, and controllable at every step. Engineers can delegate work confidently because they always retain visibility and control.

When an engineer delegates a task, it enters a defined workflow. The Task Manager picks it up, launches a Temporal workflow, and provisions Alfred in a Kubernetes-sandboxed runner, an isolated environment where Alfred performs its work. 

Figure: Alfred's system architecture, showing the full request lifecycle from human interfaces through Task Manager, Temporal workflows, and Kubernetes-sandboxed runners to GitHub.

Linear serves as the system of record and collaboration surface. It’s already where engineers work, so when engineers delegate tickets to Alfred, ownership and the audit trail remain tied to the engineer who delegated the work, ensuring clear accountability. Every action Alfred takes is automatically surfaced in Linear through real-time updates using Linear Agent Interactions, eliminating the need for additional tooling or context switching. Alfred can also ask questions directly on the ticket when requirements are unclear or underspecified, enabling engineers to provide clarification and preventing Alfred from making incorrect assumptions.

Temporal acts as Alfred's workflow orchestration and durable execution engine. It manages Alfred's long-running tasks and maintains workflow state throughout execution, enabling reliable recovery from failures. When a task is interrupted, execution resumes from the last successful step rather than restarting the workflow. Temporal also provides granular control over failure handling, including safeguards to detect and terminate runaway processes before they consume excessive resources or cause downstream disruption.
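The resume-from-last-step behaviour can be illustrated without Temporal itself. The sketch below is a plain-Python stand-in, not Temporal's API and not Alfred's implementation: it checkpoints each completed step to a file and skips those steps on the next run. The step names, the `run_durably` helper, and the retry cap are all assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable

Step = tuple[str, Callable[[], None]]


def run_durably(steps: list[Step], state_file: Path, max_attempts: int = 3) -> list[str]:
    """Run steps in order, skipping any already recorded as completed."""
    completed: list[str] = json.loads(state_file.read_text()) if state_file.exists() else []
    executed: list[str] = []
    for name, action in steps:
        if name in completed:
            continue  # finished in an earlier run, so do not repeat it
        for attempt in range(1, max_attempts + 1):
            try:
                action()
                break
            except Exception:
                if attempt == max_attempts:
                    raise  # stop retrying: a crude guard against runaway work
        completed.append(name)
        state_file.write_text(json.dumps(completed))  # checkpoint after each step
        executed.append(name)
    return executed


def demo() -> tuple[list[str], list[str]]:
    """Interrupt a run at the second step, then resume it from the checkpoint."""
    calls = {"run_tests": 0}

    def clone() -> None:
        pass

    def run_tests() -> None:
        calls["run_tests"] += 1
        if calls["run_tests"] == 1:
            raise RuntimeError("runner interrupted")  # simulated interruption

    def open_pr() -> None:
        pass

    steps: list[Step] = [("clone", clone), ("run_tests", run_tests), ("open_pr", open_pr)]
    with tempfile.TemporaryDirectory() as tmp:
        state_file = Path(tmp) / "workflow.json"
        try:
            run_durably(steps, state_file, max_attempts=1)
        except RuntimeError:
            pass
        checkpoint = json.loads(state_file.read_text())
        resumed = run_durably(steps, state_file, max_attempts=1)
    return checkpoint, resumed
```

In `demo()`, the first run checkpoints only `clone` before the simulated interruption, and the second run executes `run_tests` and `open_pr` without repeating `clone`. A durable execution engine provides the same guarantee with persisted workflow history rather than a local file.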

The Kubernetes sandbox provides Alfred's secure execution environment. Alfred always runs as a non-root user, and outbound network traffic is strictly limited to GitHub and Nscale's inference endpoints. Inside the sandbox, Alfred operates through a sub-agent model. Specialist agents, including bug hunters and architecture reviewers, work independently on narrow responsibilities. 
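As a rough sketch of those two sandbox constraints: the snippet below builds a container `securityContext` using standard Kubernetes field names and checks outbound URLs against an allowlist. The hostnames and the UID are placeholders, not Nscale's real values, and in a real cluster egress would be enforced at the network layer rather than by an in-process check like this one.

```python
from urllib.parse import urlparse

# Hypothetical allowlist: the text names GitHub and Nscale's inference
# endpoints, but these exact hostnames are placeholders.
ALLOWED_EGRESS_HOSTS = frozenset(
    {"github.com", "api.github.com", "inference.example.internal"}
)


def is_egress_allowed(url: str) -> bool:
    """Return True only when the URL's host is on the allowlist."""
    host = urlparse(url).hostname
    return host is not None and host in ALLOWED_EGRESS_HOSTS


def runner_security_context(uid: int = 1000) -> dict:
    """Container securityContext fields that keep the runner non-root."""
    if uid == 0:
        raise ValueError("the runner must not run as root")
    return {
        "runAsNonRoot": True,
        "runAsUser": uid,
        "allowPrivilegeEscalation": False,
        "capabilities": {"drop": ["ALL"]},
    }
```

Matching on the parsed hostname, rather than searching the URL string, means a path such as `https://evil.example.com/github.com` is rejected.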

Alfred also supports project-specific customisation. Each repository can define its own operating context through AGENTS.md, custom skills, enforcement rules, and auto-approval policies based on change size and risk. Alfred picks up these settings automatically, allowing teams to tailor behaviour on a per-project basis without requiring any Alfred-specific configuration.
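An auto-approval policy based on change size and risk could look something like the sketch below. The `ApprovalPolicy` fields, the line threshold, and the use of path prefixes as a proxy for risk are assumptions for illustration; the text does not describe how Alfred's policies are actually expressed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApprovalPolicy:
    """Hypothetical per-repository policy."""
    max_changed_lines: int                  # size limit for auto-approval
    blocked_path_prefixes: tuple[str, ...]  # paths treated as high risk


@dataclass(frozen=True)
class Change:
    changed_lines: int
    paths: tuple[str, ...]


def can_auto_approve(change: Change, policy: ApprovalPolicy) -> bool:
    """Approve only small changes that touch no high-risk path."""
    if change.changed_lines > policy.max_changed_lines:
        return False
    # str.startswith accepts a tuple of prefixes and matches any of them.
    return not any(path.startswith(policy.blocked_path_prefixes) for path in change.paths)


policy = ApprovalPolicy(max_changed_lines=50, blocked_path_prefixes=("infra/", "auth/"))
docs_fix = Change(changed_lines=12, paths=("docs/setup.md",))
infra_fix = Change(changed_lines=12, paths=("infra/network.tf",))
large_refactor = Change(changed_lines=400, paths=("src/app.py",))
```

With this policy, `docs_fix` qualifies for auto-approval, while `infra_fix` and `large_refactor` would be routed to a human reviewer.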

Model Context Protocol (MCP) servers manage the communication layer between Alfred and the engineers working alongside it. They allow Alfred to report progress in real time, pause workflows pending human input, and route information between systems and people. This human-in-the-loop capability is one of Alfred's core design principles: when critical context is missing, Alfred surfaces uncertainty and seeks guidance.

Every action is logged, and only the Task Manager can write back to Linear. This creates a trustworthy audit trail where prompts, agent sessions, and human approvals all pass through a single controlled entry point.
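The single-writer pattern can be sketched as an append-only log with one entry point. This is a minimal illustration, not Alfred's Task Manager: the class shape, field names, and ticket identifiers are hypothetical, and a real system would write to Linear rather than to an in-memory list.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class TaskManager:
    """Hypothetical single writer: every ticket update passes through here."""

    _entries: list[dict] = field(default_factory=list)

    def record(self, ticket_id: str, actor: str, kind: str, detail: str) -> dict:
        """Append one audit entry and return it; existing entries are never edited."""
        entry = {
            "seq": len(self._entries) + 1,
            "at": datetime.now(timezone.utc).isoformat(),
            "ticket_id": ticket_id,
            "actor": actor,
            "kind": kind,  # e.g. "prompt", "agent_session", "human_approval"
            "detail": detail,
        }
        self._entries.append(entry)
        return entry

    def history(self, ticket_id: str) -> list[dict]:
        """Return copies of one ticket's entries in the order they were recorded."""
        return [dict(e) for e in self._entries if e["ticket_id"] == ticket_id]
```

Because agents and humans submit events through `record` instead of writing to the ticket themselves, the history for a ticket is a complete, ordered account of what happened and who initiated it.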

The specific tools matter less than the principles they embody. Alfred uses Linear, Temporal, Kubernetes, and MCP servers, but teams running Jira, Celery, VMs, or different orchestration layers could apply the same architecture. What's non-negotiable is the pattern: a system of record that owns accountability, durable workflow state that survives failures, a hard security boundary around the agent, and a communication layer that keeps humans in the loop.

The practices we need to evolve 

The first version of Alfred generated too many false positives. Not because the approach was wrong, but because the right configuration had to be discovered through experimentation. Prompt changes, new skills, and eventually the sub-agent structure all emerged through repeated testing rather than upfront design.

That forced a shift in how we think about building with AI. We began treating prompts, workflows, and agent structures as hypotheses, the same way ML teams iterate on models: making assumptions explicit, testing them quickly, and removing anything that failed under real-world conditions.

The deeper insight is that many standard engineering practices, including review processes, handoffs, and quality gates, were designed for a world where iteration was expensive and slow. Agentic systems change that equation. Treating existing processes as starting points rather than fixed constraints has become a competitive advantage.

“There are so many schools of thought about how to run software teams, but AI changes the equation. Co-development with AI is still in its earliest stages, and the practices that work today may not be the ones that work tomorrow. The biggest gains won't come from bolting AI onto existing processes, but from continuously experimenting with new workflows, new tooling, and new patterns of human-AI collaboration.”

Eyal Lantzman

VP, Head of AI Solutions, Nscale

Closing the loop: from code to operations

Every iteration of Alfred expands what the next version can do, moving from code execution to operations and ultimately toward leadership judgement. Today, Alfred helps write, review, and ship software. The next frontier is closing the feedback loop in production, where execution failures are most costly and operational burden is highest.

Currently, Alfred has no access to logs, alerts, or monitoring data. Connecting it to operational systems is the next step, but our goal is not simply to automate existing incident management processes. AI-first operations will likely require entirely new workflows. Rather than fitting Alfred into today's practices, we want to explore what incident detection, investigation, and remediation look like when AI is an active participant from the outset.

Scaling Alfred's reach is also a question of token economics. Alfred was designed so that the underlying model can be swapped for anything exposed through an OpenAI-compatible API, which means the architecture already supports running multiple models simultaneously. Each brings different reasoning styles and strengths to the same problem, just like different engineers would. Intelligent routing then balances capability, cost, and latency for each task.
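One simple way to balance capability, cost, and latency is to filter models by a task's requirements and then pick the cheapest that qualifies. The sketch below assumes a small catalogue of interchangeable endpoints; the model names, URLs, capability scale, and prices are invented for illustration, and the text does not describe Alfred's actual routing logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str
    base_url: str              # any endpoint exposing an OpenAI-style API
    capability: int            # 1 (basic) to 5 (strongest), an invented scale
    cost_per_1k_tokens: float  # illustrative price, not a real quote
    p50_latency_s: float       # illustrative median latency in seconds


def route(options: list[ModelOption], min_capability: int, max_latency_s: float) -> ModelOption:
    """Pick the cheapest model that meets the capability and latency limits."""
    eligible = [
        m for m in options
        if m.capability >= min_capability and m.p50_latency_s <= max_latency_s
    ]
    if not eligible:
        raise LookupError("no model satisfies the task constraints")
    # Cheapest first; ties are broken by lower latency.
    return min(eligible, key=lambda m: (m.cost_per_1k_tokens, m.p50_latency_s))


# Hypothetical catalogue.
CATALOGUE = [
    ModelOption("small-model", "https://inference.example.internal/small/v1", 2, 0.10, 0.8),
    ModelOption("medium-model", "https://inference.example.internal/medium/v1", 3, 0.40, 1.5),
    ModelOption("large-model", "https://inference.example.internal/large/v1", 5, 2.00, 4.0),
]

lint_fix_model = route(CATALOGUE, min_capability=2, max_latency_s=2.0)
design_review_model = route(CATALOGUE, min_capability=4, max_latency_s=10.0)
```

Here a routine lint fix goes to `small-model` and an architecture review goes to `large-model`. Because every option sits behind the same API shape, adding or replacing a model only changes the catalogue, not the calling code.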

Longer term, our ambition is to move Alfred beyond execution and towards engineering leadership. We're exploring a "tech lead" sub-agent that codifies how experienced technical leaders reason under pressure, translating years of hard-won judgement into operational principles that can be applied consistently, scaled across teams, and taught to agents.

Building Alfred has convinced us that this isn't just a better developer tool, but a different operating model for engineering.

Every team shipping AI infrastructure will eventually reach the same point: the challenge stops being access to AI and starts being how effectively it is operationalised.

That's the advantage of being AI-native. Teams that treat AI systems as products to be engineered, measured, and continuously improved will compound their advantage with every iteration. Teams that treat AI as a tool to bolt onto existing workflows will spend their time trying to catch up.

Tom Matthews

Staff AI Engineer

Tom is an AI engineer specialising in scalable AI infrastructure, production systems, and MLOps. He is passionate about designing reliable, end-to-end AI platforms that deliver real business value.
