5 Patterns I Use to Ship Production AI Agents in TypeScript
Five reliable patterns for shipping production AI agents in TypeScript: prompt design, tool schemas, server-side execution, retries, and observability.
Building production AI agents in TypeScript involves more than just connecting a large language model (LLM) to a user interface. These systems call an LLM, take structured action via tools, run server-side, and reliably handle errors and retries. It's a world apart from a chat UI demo.
In this post, I'll walk you through the five patterns I reach for whenever I ship something real users depend on, built with tools like OpenAI tool calling, MCP (Model Context Protocol), and TypeScript-first validation libraries such as Zod and valibot. Let's dive right in.
Why "Patterns" Beat "Frameworks" for AI Agents
LangChain-style frameworks abstract too aggressively for production agents where every retry, every tool call, and every error path matters. Hand-rolled patterns in TypeScript give you precise control over types, errors, observability, and bundle size.
A few years back, I dabbled with a framework that promised to handle everything. It worked for a demo, but when it came to production, the lack of control over error handling and retries was a nightmare. I went back to hand-rolled code and haven't looked back since.
Pattern 1: Anchored System Prompts
The system prompt encapsulates the agent's personality, capabilities, and constraints in one document. Avoid inlining prompts as string literals at the call site. Instead, store them in versioned files, such as src/agents/<agent-name>/prompt.md, and import them with ?raw in Vite. This ensures prompts get reviewed in pull requests.
Anchoring the voice with concrete style references is crucial. For the Blogger Agent, this means quoting two of my existing posts to help the model imitate the cadence. Here's a simple setup for calling OpenAI's Chat Completions API:
import OpenAI from 'openai';
import systemPrompt from './prompt.md?raw';

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

const messages = [
  { role: "system" as const, content: systemPrompt },
  { role: "user" as const, content: "Write a blog post about..." },
];

const response = await openai.chat.completions.create({
  model: "gpt-4o",
  messages,
});

Pitfall: prompts that drift over time without commits become impossible to debug. Treat them like code.
Pattern 2: Strict Tool Schemas with Zod
Models can hallucinate tool arguments. Validate every tool call's arguments with Zod (or valibot) before you act on them. Never trust the model's output shape. Here's how a createBlogPost tool schema might look:
import { z } from 'zod';

const createBlogPostSchema = z.object({
  title: z.string().min(1),
  description: z.string(),
  tags: z.array(z.string()).nonempty(),
  body: z.string(),
});

// Inside your tool-dispatch function
const toolCall = { /*...*/ }; // one entry from response.choices[0].message.tool_calls
// The model returns arguments as a JSON string — parse, then validate
const args = createBlogPostSchema.parse(JSON.parse(toolCall.function.arguments));

Using zodToJsonSchema can keep the OpenAI tool definition and runtime validator in sync from a single source of truth.
Pitfall: schemas that are too loose. If the model returns an empty tags array and your code doesn't reject it, the post ships with no tags.
Pattern 3: Server-side Execution, Always
Never call the LLM from the client. Three reasons: API keys can leak, bundle size explodes, and you lose central control over rate limiting. For a TanStack Start blog, agents live in server/api/ routes or TanStack Start server functions. For Next.js, they live in Route Handlers or Server Actions.
Here's an example of a server function:
// Runs only on the server: the API key never reaches the bundle
async function handleRequest(input: CreateBlogInput) {
  const user = await authenticateUser(input.token); // reject unauthenticated calls early
  const response = await callLLM(input);
  const validatedOutput = validateResponse(response); // Pattern 2: never trust raw output
  await saveToDatabase(validatedOutput, user.id);
  return validatedOutput;
}

I deploy on Vercel, where serverless functions give you this server-side execution model with scaling handled for you.
Pattern 4: Bounded Retries with Exponential Backoff
Production LLM calls fail. Rate limits, transient network errors, and content policy refusals on edge inputs are common. Wrap every model call in a retry helper:
const delay = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

async function withRetry<T>(fn: () => Promise<T>, retries = 3): Promise<T> {
  for (let attempt = 0; attempt < retries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      if (isRetryable(error) && attempt < retries - 1) {
        // Exponential backoff: 1s, 2s, 4s, ...
        await delay(Math.pow(2, attempt) * 1000);
      } else {
        throw error;
      }
    }
  }
  throw new Error('unreachable'); // satisfies TypeScript's return-path analysis
}

Distinguish retryable errors (e.g., 5xx, rate limit) from non-retryable ones (e.g., validation failure, content policy refusal). The latter call for a different prompt, not a retry.
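Here's how I'd sketch isRetryable against the official openai SDK, where APIError exposes an HTTP status; adapt it to whatever client you use:

import OpenAI from 'openai';

// Rate limits (429) and server errors (5xx) are worth retrying;
// 4xx validation and policy errors need a code or prompt fix instead
function isRetryable(error: unknown): boolean {
  if (error instanceof OpenAI.APIError) {
    const status = error.status ?? 0;
    return status === 429 || status >= 500;
  }
  return false;
}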
Pitfall: silent retries with no telemetry. Without logging every attempt, you'll never see your real failure rate.
Pattern 5: Observability — Log Every Call, Every Token, Every Dollar
Every LLM call should emit a structured log event with details such as agent name, model, input token count, output token count, latency, cost estimate, and a redacted preview of the prompt.
Use Vercel's logging or any structured log sink to answer questions like "what did this agent cost yesterday?" Here's a simple logging helper:
interface LLMCallEvent {
  agentName: string;
  model: string;
  inputTokens: number;
  outputTokens: number;
  latency: number; // ms
  costEstimate: number; // USD
  success: boolean;
}

function logLLMCall(event: LLMCallEvent) {
  console.log({
    agent: event.agentName,
    model: event.model,
    tokens: { input: event.inputTokens, output: event.outputTokens },
    latency: event.latency,
    cost: event.costEstimate,
    success: event.success,
  });
}

OpenAI's usage field in the response can help map to a cost estimate per model.
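A hedged sketch of that mapping; the per-million-token prices below are placeholders you should replace with current pricing:

// Illustrative pricing table: USD per 1M tokens, keyed by model
const PRICING: Record<string, { input: number; output: number }> = {
  'gpt-4o': { input: 2.5, output: 10 },
};

function estimateCost(model: string, inputTokens: number, outputTokens: number): number {
  const price = PRICING[model];
  if (!price) return 0; // unknown model: log zero rather than guessing
  return (inputTokens * price.input + outputTokens * price.output) / 1_000_000;
}

Feed it response.usage.prompt_tokens and response.usage.completion_tokens from the Chat Completions response.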
Pitfall: no observability for the first month, then a surprise bill at the end of it.
What I Would Not Do
I would not reach for an agent framework (LangChain, AutoGen, CrewAI) for a single-purpose agent. The abstraction tax is too high. I would not store prompts in a database for "easy editing". Prompts belong in version control next to the code that depends on them. I would not skip output validation because "GPT-4 is reliable enough now". It is not, and the failure modes are silent.
Final Thoughts
These five patterns are not novel — they are the boring stuff that separates a demo from a production agent. The interesting work is choosing what your agent does, not how reliably it does it.
For a concrete example of all five patterns in one codebase, check out my Blogger Agent project. To read about the journey of building an AI agent that manages my blog, head over to /blog/i-built-an-ai-agent-that-manages-my-blog. For more about me, visit the about page.

Full-stack software engineer focused on React, TypeScript, and AI-powered tooling. Building Web3 frontends at LimeChain. Based in Sofia, Bulgaria.