Vol. I · No. 1
Est. MMXXVI

The A.I. Beat

Dispatches from the frontier of machine intelligence
May 2, 2026 · 5 min read

Building Your First AI Agent: A Practical Guide

A step-by-step introduction to building AI agents -- from simple tool-using chatbots to autonomous multi-step systems.
Most agent tutorials make a simple idea look complicated. They introduce five abstractions before showing you a working example. Here is the alternative: we start with what an agent actually does at the API level, build up from there, and address the real failure modes that tutorials leave out.

By the end of this piece, you will understand every layer of a working agent, know what it costs to run one, and have a clear picture of what goes wrong and how to fix it.

How Tool Use Actually Works

Every major LLM provider — Anthropic, OpenAI, Google — now supports a protocol where the model can request that your code execute a function. The flow is mechanical and predictable. There is no magic.

Here is what happens at each step, concretely:

  1. Your code sends the user’s message plus a list of available tools to the API.
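  2. The model returns a response that contains a tool_use block (sometimes alongside some text) specifying which tool to call and with what arguments (as JSON). The block names here follow Anthropic's API; other providers use different names for the same mechanism.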
  3. Your code parses the tool name and arguments, executes the corresponding function, and sends the result back as a tool_result message.
  4. The model reads the result and either makes another tool call or produces a final text response.

The agent loop is literally steps 2-4 on repeat. When the model stops requesting tools and returns plain text, the loop ends.
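To make the four steps concrete, here is a sketch of what the message list might look like after one round trip, written as plain Python data in the shape Anthropic's Messages API uses. The tool name get_weather, the id toolu_01, and the message contents are made-up placeholders, not output from a real call.

```python
# Illustrative only: get_weather, the id "toolu_01", and the contents
# are placeholder values, not output from a real API call.

# Step 1: what your code sends first (along with the tool list)
messages = [{"role": "user", "content": "What is the weather in Paris?"}]

# Step 2: the model's reply asks for a tool call
assistant_turn = {
    "role": "assistant",
    "content": [
        {"type": "tool_use", "id": "toolu_01",
         "name": "get_weather", "input": {"city": "Paris"}},
    ],
}

# Step 3: your code runs the tool and reports the result, matching the id
tool_result_turn = {
    "role": "user",
    "content": [
        {"type": "tool_result", "tool_use_id": "toolu_01",
         "content": "18 C, partly cloudy"},
    ],
}

messages += [assistant_turn, tool_result_turn]
# Step 4: send `messages` back; the model answers in text or calls another tool
```

The detail that matters is the id: every tool_result has to point back at the tool_use block it answers.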

What You Need to Build One

The requirements are more modest than the framework ecosystem suggests: API access to a model that supports tool use, a handful of tool functions, a system prompt, and the loop that ties them together.

That is it. No vector database, no LangChain, no Kubernetes cluster. You can build a useful agent in under 200 lines of Python or TypeScript.

The Core Loop in Pseudocode

Here is the complete agent loop. This is not simplified for illustration — production agents at major companies run a version of this exact pattern:

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def run_agent(user_message, tools, system_prompt, max_iterations=25):
    # The Messages API takes the system prompt as a top-level parameter,
    # not as a message with role "system".
    messages = [{"role": "user", "content": user_message}]

    for _ in range(max_iterations):
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=4096,
            system=system_prompt,
            tools=tools,
            messages=messages
        )

        # If the model wants to use a tool, execute it
        if response.stop_reason == "tool_use":
            # Append the assistant's response first, then answer every
            # tool_use block: one response can request several tool calls.
            messages.append({"role": "assistant", "content": response.content})
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    # Execute the tool (your code)
                    result = execute_tool(block.name, block.input)
                    tool_results.append({"type": "tool_result",
                                         "tool_use_id": block.id,
                                         "content": str(result)})
            messages.append({"role": "user", "content": tool_results})
        else:
            # Model produced a final response -- return its text blocks
            return "".join(b.text for b in response.content if b.type == "text")

    return "Agent hit maximum iteration limit."

The execute_tool function is a simple dispatcher: look up the tool name in a dictionary, call the corresponding Python function with the provided arguments, return the result. That is the entire agent.
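A minimal sketch of that dispatcher follows. The two registered tools (word_count and add_numbers) are hypothetical stand-ins chosen only to keep the example self-contained, and returning errors as strings instead of raising is one possible design choice, not the only one.

```python
def word_count(text):
    """Placeholder tool: count whitespace-separated words."""
    return len(text.split())

def add_numbers(a, b):
    """Placeholder tool, included only to show multi-argument dispatch."""
    return a + b

# Maps tool names (as declared to the model) to Python callables.
TOOL_REGISTRY = {
    "word_count": word_count,
    "add_numbers": add_numbers,
}

def execute_tool(tool_name, tool_input):
    """Look up a tool by name and call it with the model-supplied arguments."""
    func = TOOL_REGISTRY.get(tool_name)
    if func is None:
        return f"Error: unknown tool '{tool_name}'"
    try:
        return func(**tool_input)
    except Exception as exc:
        # Return the error as text so the model can read it and adjust
        return f"Error running {tool_name}: {type(exc).__name__}: {exc}"
```

Handing errors back as text keeps the loop alive: a bad argument becomes one more tool result the model can react to, not a crash in your code.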

Defining Tools: The JSON Schema

The model decides which tool to call based on the tool’s name, description, and parameter schema. This is where most agent bugs originate — a vague description leads to wrong tool selection.

Here is a concrete tool definition for a file-reading tool, in the format Anthropic’s API expects:

{
  "name": "read_file",
  "description": "Read the contents of a file at the given path. Returns the full file contents as a string. Use this when you need to examine source code, configuration files, or any text file. Do NOT use this for binary files like images or PDFs.",
  "input_schema": {
    "type": "object",
    "properties": {
      "file_path": {
        "type": "string",
        "description": "Absolute path to the file, e.g. /home/user/project/src/main.py"
      }
    },
    "required": ["file_path"]
  }
}

And here is a tool for running shell commands:

{
  "name": "run_command",
  "description": "Execute a shell command and return its stdout and stderr. Use for running tests, installing packages, checking file existence, or any operation that requires shell access. Commands run in /home/user/project by default. Timeout: 30 seconds.",
  "input_schema": {
    "type": "object",
    "properties": {
      "command": {
        "type": "string",
        "description": "The shell command to execute, e.g. 'python -m pytest tests/ -v'"
      },
      "working_directory": {
        "type": "string",
        "description": "Optional working directory for the command"
      }
    },
    "required": ["command"]
  }
}

Notice the descriptions are specific and include examples. The description for read_file explicitly says not to use it for binary files. The run_command description specifies the timeout. These details matter because the model reads them to decide when and how to use each tool.
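For completeness, here is one way the Python side of those two definitions might look. This is a sketch under assumptions: PROJECT_DIR stands in for /home/user/project, and the output format of run_command is an arbitrary choice. It executes whatever command the model sends, so it belongs in a sandbox or a throwaway directory.

```python
import subprocess

PROJECT_DIR = "."  # assumption: stands in for /home/user/project

def read_file(file_path):
    """Return the full contents of a text file as a string."""
    with open(file_path, encoding="utf-8") as f:
        return f.read()

def run_command(command, working_directory=None):
    """Run a shell command and return its exit code, stdout, and stderr."""
    try:
        proc = subprocess.run(
            command,
            shell=True,                            # the schema passes one command string
            cwd=working_directory or PROJECT_DIR,  # optional parameter from the schema
            capture_output=True,
            text=True,
            timeout=30,                            # the 30-second limit promised in the description
        )
    except subprocess.TimeoutExpired:
        return "Error: command timed out after 30 seconds"
    return (f"exit code: {proc.returncode}\n"
            f"stdout:\n{proc.stdout}\n"
            f"stderr:\n{proc.stderr}")
```

The parameter names match the schema's property names exactly, which is what lets a dispatcher call these functions with the model's JSON arguments unpacked as keyword arguments.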

A concrete rule of thumb: write tool descriptions as if you were explaining the tool to a competent but new contractor who has never seen your codebase. If they would be confused, so will the model.

The System Prompt: Shaping Agent Behavior

The system prompt is your primary lever for controlling agent behavior. Here is a template that works well in practice:

You are a coding assistant that can read files, write files, and run
shell commands in the user's project.

WORKFLOW:
1. Understand the user's request fully before acting
2. Read relevant files to understand context
3. Make changes incrementally -- one logical change at a time
4. After making changes, run tests to verify correctness
5. If tests fail, read the error, fix the issue, and re-run

CONSTRAINTS:
- Never modify files outside the project directory
- Always run existing tests after making changes
- If you are unsure about a change, explain your reasoning and ask
- Maximum 3 retries on any single failing test before asking for help

STYLE:
- Prefer minimal diffs over rewriting entire files
- Add comments only when the code is non-obvious
- Follow the existing code style in the project

Each section serves a purpose. WORKFLOW prevents the model from jumping straight to code generation without reading context first. CONSTRAINTS establish safety guardrails. STYLE prevents over-engineering.

Common Failure Modes and Concrete Fixes

This is where most tutorials fail you. They show the happy path and stop. Here are failure modes you will actually encounter, with specific solutions.

Infinite Loop Fix — Concrete Implementation

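def detect_loop(messages, lookback=6):
    """Check if the last N tool calls are repetitive."""
    recent_calls = []
    for msg in messages[-lookback * 2:]:  # tool calls are pairs
        # Tool calls live in assistant messages whose content is a list of blocks
        if msg["role"] != "assistant" or isinstance(msg["content"], str):
            continue
        for block in msg["content"]:
            # Blocks may be plain dicts or SDK objects, depending on how they were stored
            if isinstance(block, dict):
                kind, name, args = block.get("type"), block.get("name"), block.get("input")
            else:
                kind = getattr(block, "type", None)
                name, args = getattr(block, "name", None), getattr(block, "input", None)
            if kind == "tool_use":
                recent_calls.append((name, str(args)))

    if len(recent_calls) >= 3:
        if len(set(recent_calls[-3:])) == 1:  # last 3 calls identical
            return True
    return False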

When this returns True, inject a user message saying: “You appear to be in a loop. Summarize what you have tried so far and propose a different approach.” This works because it forces the model to step back and reason rather than continuing to execute.
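One way to inject that message is sketched below. It assumes the message format used in the core loop (dicts with role and content, tool results stored as a list of blocks). The helper name add_loop_nudge is made up for illustration.

```python
LOOP_NUDGE = ("You appear to be in a loop. Summarize what you have tried "
              "so far and propose a different approach.")

def add_loop_nudge(messages, nudge=LOOP_NUDGE):
    """Attach the nudge to the most recent user turn, or start a new one."""
    last = messages[-1] if messages else None
    if last and last["role"] == "user" and isinstance(last["content"], list):
        # Keep the tool_result blocks first; the text block goes after them
        last["content"].append({"type": "text", "text": nudge})
    else:
        messages.append({"role": "user", "content": nudge})
    return messages
```

Appending to the existing user turn, instead of adding a second one, keeps the tool results and the nudge in a single message, so the model sees both together on its next call.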

Context Overflow Fix — Practical Strategy

Claude Sonnet 4 has a 200K token context window. At approximately 0.75 words per token, that is about 150,000 words. It sounds enormous, but agent sessions fill it faster than you expect. Each tool call and result adds 500-2,000 tokens. After 40 iterations, you have consumed 20,000-80,000 tokens of context on tool interactions alone.

The practical solution: when total token usage exceeds 60-70% of the window, summarize the oldest tool results. Replace the full file contents with “Previously read src/main.py (450 lines, Python FastAPI application, key endpoints: /users, /orders, /health).” You lose detail but preserve the agent’s ability to reason about what it has done.
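A rough sketch of that strategy is below. Everything in it is an assumption made for illustration: the function name, the 65% threshold, the keep_recent and min_chars parameters, and the default summarizer, which only records the size of what was dropped. In a real agent you would pass a summarize callable that produces a description like the one above, and take used_tokens from the usage figures your API returns.

```python
def compact_old_tool_results(messages, used_tokens, window=200_000,
                             threshold=0.65, keep_recent=4,
                             min_chars=200, summarize=None):
    """Replace older, large tool_result contents with short summaries.

    used_tokens: tokens consumed by the latest request (supplied by caller).
    keep_recent: number of most recent messages left untouched.
    summarize:   optional callable(str) -> str; defaults to a size-only stub.
    """
    if used_tokens < threshold * window:
        return messages  # still under budget, nothing to do

    if summarize is None:
        summarize = lambda text: f"[Earlier tool output omitted, {len(text)} characters]"

    cutoff = max(len(messages) - keep_recent, 0)
    for msg in messages[:cutoff]:
        if msg["role"] != "user" or not isinstance(msg["content"], list):
            continue
        for block in msg["content"]:
            if isinstance(block, dict) and block.get("type") == "tool_result":
                content = block.get("content")
                # Skip short results, which includes already-summarized ones
                if isinstance(content, str) and len(content) >= min_chars:
                    block["content"] = summarize(content)
    return messages
```

Only the tool_result contents change; the tool_use ids and the order of turns stay intact, so the conversation remains valid when you send it back.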

What This Costs

Agent costs are predictable once you understand the token math. Here are real numbers for common scenarios using Claude Sonnet 4 (input: $3/M tokens, output: $15/M tokens), counting total tokens across all API calls in the session:

Simple question with 2-3 tool calls: ~5K input tokens, ~2K output tokens. Cost: ~$0.045.

Moderate coding task (10-15 tool calls, reading/writing files, running tests): ~30K input tokens, ~8K output tokens. Cost: ~$0.21.

Complex refactoring (30-50 tool calls, multi-file changes, multiple test runs): ~100K input tokens, ~25K output tokens. Cost: ~$0.68.

Extended debugging session (50+ tool calls, large context): ~200K input tokens, ~40K output tokens. Cost: ~$1.20.

For Claude Opus 4 (input: $15/M, output: $75/M), multiply these costs by 5x, since both rates are five times higher. This is why most production agents use Sonnet for the routine work and reserve Opus for the hardest reasoning steps.
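The arithmetic behind those figures fits in a few lines. The sketch below uses only the prices and token counts quoted above; the function and dictionary names are made up for illustration.

```python
# Prices in dollars per million tokens, as quoted above
PRICES = {
    "sonnet": {"input": 3.0, "output": 15.0},
    "opus": {"input": 15.0, "output": 75.0},
}

def session_cost(input_tokens, output_tokens, model="sonnet"):
    """Return the dollar cost of a session from its total token counts."""
    price = PRICES[model]
    return (input_tokens * price["input"]
            + output_tokens * price["output"]) / 1_000_000

# (input tokens, output tokens) for the four scenarios above
scenarios = {
    "simple question": (5_000, 2_000),
    "moderate coding task": (30_000, 8_000),
    "complex refactoring": (100_000, 25_000),
    "extended debugging": (200_000, 40_000),
}

for name, (inp, out) in scenarios.items():
    print(f"{name}: ${session_cost(inp, out):.3f}")
```

The complex refactoring case works out to $0.675, which is the ~$0.68 quoted above after rounding.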

Prompt caching dramatically changes this math. When you enable caching for your system prompt and tool definitions (which are identical on every API call), Anthropic charges only 10% of the normal input rate for cached tokens. A typical agent with a 3,000-token system prompt and 2,000 tokens of tool definitions saves $0.01-0.05 per session — which adds up to thousands of dollars per month at scale.
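To see where a per-session figure in that range comes from, here is a back-of-the-envelope sketch. The 10% read rate is the one stated above; the 25% surcharge on the first, cache-writing call is an assumption that is not part of the figures in this article, so it is exposed as a parameter you can set to 1.0.

```python
def caching_savings(cached_tokens, api_calls, input_price_per_m=3.0,
                    read_multiplier=0.10, write_multiplier=1.25):
    """Estimate dollars saved by caching a fixed prompt prefix in one session.

    write_multiplier is an assumption (a surcharge on the first call, which
    writes the cache); set it to 1.0 to ignore it.
    """
    base = cached_tokens * input_price_per_m / 1_000_000  # uncached prefix cost per call
    without_cache = api_calls * base
    with_cache = base * write_multiplier + (api_calls - 1) * base * read_multiplier
    return without_cache - with_cache

# 3,000-token system prompt + 2,000 tokens of tool definitions, 4 API calls
print(round(caching_savings(5_000, api_calls=4), 5))
```

Under these assumptions a four-call session saves a little under four cents, and the saving grows with every additional call because the prefix is re-sent each time. A one-call session comes out slightly negative, since it pays the write surcharge and never reads the cache.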

Building Up: From Single Agent to Multi-Agent

Once you have a working single-agent loop, the natural next step is specialization. Instead of one agent with 20 tools, you build multiple agents with 3-5 tools each, and an orchestrator that routes tasks to the right specialist.

The pattern:

# Agent, planner_llm, and the tool functions are placeholders for your own code.
agents = {
    "researcher": Agent(tools=[web_search, read_url, summarize],
                        system_prompt="You find and synthesize information..."),
    "coder": Agent(tools=[read_file, write_file, run_command],
                   system_prompt="You implement code changes..."),
    "reviewer": Agent(tools=[read_file, run_command, lint],
                      system_prompt="You review code for bugs and style...")
}

def orchestrate(task):
    plan = planner_llm.create_plan(task)
    result = None
    for step in plan:
        agent = agents[step.agent_name]
        instruction = step.instruction
        if result is not None:
            # Pass the previous step's result to the next step as context
            instruction += f"\n\nResult of the previous step:\n{result}"
        result = agent.run(instruction)
    return result

This is essentially what CrewAI and Anthropic’s multi-agent patterns implement. The advantage: each agent’s system prompt and tool set is focused, which measurably improves tool selection accuracy. The disadvantage: inter-agent communication adds latency and token cost, and debugging gets harder because you have to trace through multiple agent sessions.

Our recommendation for your first agent: do not start here. Build a single agent with 3-5 tools. Get it working reliably. Observe where it struggles. Only then consider multi-agent patterns for the specific tasks where a single agent falls short.

A Complete Starter Project

Here is a concrete project you can build this weekend to learn agent development:

Goal: An agent that can answer questions about a local codebase by reading files, searching for patterns, and running tests.

Tools needed (3):

  1. read_file — read a file’s contents
  2. search_files — run grep -r and return matching lines
  3. run_command — execute shell commands (restricted to the project directory)
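Of the three, search_files is the only tool not defined earlier in this guide. A minimal sketch, assuming grep is installed and on the PATH; the max_lines cap and the return messages are illustrative choices, not requirements.

```python
import subprocess

def search_files(pattern, root=".", max_lines=200):
    """Run grep -rn under root and return matching lines as one string."""
    proc = subprocess.run(
        ["grep", "-rn", "--", pattern, root],  # "--" stops the pattern being read as a flag
        capture_output=True, text=True, timeout=30,
    )
    if proc.returncode == 1:
        return "No matches found."  # grep exits with 1 when nothing matches
    if proc.returncode != 0:
        return f"Error: grep failed: {proc.stderr.strip()}"
    lines = proc.stdout.splitlines()
    if len(lines) > max_lines:
        # Cap the output so one broad search cannot flood the context window
        omitted = len(lines) - max_lines
        lines = lines[:max_lines] + [f"... ({omitted} more lines omitted)"]
    return "\n".join(lines)
```

Passing the arguments as a list, with no shell involved, means a pattern containing quotes or semicolons is treated as text to search for and never as a command.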

System prompt: Tell the agent it is a code assistant, should always read files before making claims about them, and should verify answers by searching for evidence.

Test it with these prompts (in order of difficulty):

  1. “What language is this project written in?” (requires listing files)
  2. “How does the authentication middleware work?” (requires reading multiple files)
  3. “Are there any functions longer than 50 lines?” (requires searching and counting)
  4. “Find and fix any TODO comments that mention a bug.” (requires search, read, and write)

If your agent can handle the first three reliably, you have a working system. The fourth is the real test — it requires the full observe-think-act loop with multiple iterations.

The Frameworks: When and Why

You do not need a framework to build an agent. The pseudocode above is complete. But frameworks provide value when you need:

  • LangChain / LangGraph: Stateful workflows with branching logic, retries, and persistence. Good when your agent needs to pause and resume, or when you need complex control flow beyond a simple loop.
  • CrewAI: Multi-agent orchestration with role-based agents. Good when your task naturally decomposes into specialized roles.
  • Anthropic SDK (direct): Maximum control, minimum abstraction. Good when you want to understand exactly what is happening and your use case fits the basic loop.

The honest assessment: for 80% of agent use cases, the direct SDK approach in under 200 lines of code will outperform a framework-heavy approach, because you can debug it, you understand every line, and you are not fighting abstractions designed for someone else’s use case.

The Bottom Line

An AI agent is a while loop, a set of tools, and a language model that decides when to use them. The concept is simple. The engineering challenge is not building the loop — it is handling the failure modes gracefully, managing token costs, and writing tool descriptions clear enough that the model consistently makes the right choices.

Start with three tools, a clear system prompt, and the basic loop. Run it ten times. Watch where it breaks. Fix those specific problems. That is the entire methodology, and it works better than any framework tutorial.
