What You'll Build

A ReAct agent that runs entirely on your machine: LangGraph 0.2 orchestrates the reasoning loop, Ollama serves Llama 3.3 locally, and the agent has two tools — a web search simulator and a calculator. No OpenAI key required.

Prerequisites: Python 3.11+, 8 GB RAM minimum (16 GB recommended), Ollama installed.


Step 1 — Install Ollama and Pull Llama 3.3

Download Ollama from ollama.com and install it. Then pull the model:

ollama pull llama3.3

Llama 3.3 is a 70B model; even at 4-bit quantization it needs roughly 40 GB of RAM, far more than most laptops have. On 8–16 GB machines, pull llama3.2:3b instead; Llama models support native tool calling from version 3.1 onward, though smaller variants are less reliable at it.

Verify Ollama is running:

curl http://localhost:11434/api/tags

You should see llama3.3 in the response.
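If you prefer checking from Python, the /api/tags response is JSON with a models list. A small helper can confirm the pull; the sample response body below is trimmed and illustrative, not the exact shape every Ollama version returns:

```python
import json

def has_model(tags_body: str, prefix: str) -> bool:
    """Return True if an Ollama /api/tags response body lists a model with this prefix."""
    models = json.loads(tags_body).get("models", [])
    return any(m.get("name", "").startswith(prefix) for m in models)

# Trimmed, illustrative response body:
sample = '{"models": [{"name": "llama3.3:latest"}]}'
print(has_model(sample, "llama3.3"))  # True
print(has_model(sample, "mistral"))   # False
```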


Step 2 — Install Python Dependencies

pip install langgraph==0.2.55 langchain-ollama==0.2.3 langchain-core

LangGraph's StateGraph API superseded the older MessageGraph, which is now deprecated. This tutorial uses the current StateGraph API.


Step 3 — Define Your Tools

Tools are plain Python functions decorated with @tool:

from langchain_core.tools import tool

@tool
def calculate(expression: str) -> str:
    """Evaluate a mathematical expression. Input: a Python math expression as a string."""
    try:
        # Empty __builtins__ blocks names like open and __import__, but eval is
        # still not safe for untrusted input; use a real parser in production.
        result = eval(expression, {"__builtins__": {}}, {})
        return str(result)
    except Exception as e:
        return f"Error: {e}"

@tool
def search(query: str) -> str:
    """Search for information. Returns a simulated result."""
    return f"Search result for '{query}': According to recent sources, {query} is an active area of AI research as of early 2026."

For production, replace search with a real tool such as Tavily (pip install tavily-python), which provides a clean search API for agents.
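The restricted eval inside calculate can be exercised on its own. Here is the same pattern as a plain function (a sketch without the @tool decorator) showing that the empty __builtins__ dict blocks access to dangerous names:

```python
def safe_calculate(expression: str) -> str:
    """Evaluate a math expression with builtins stripped, mirroring the calculate tool."""
    try:
        return str(eval(expression, {"__builtins__": {}}, {}))
    except Exception as e:
        return f"Error: {e}"

print(safe_calculate("247 * 183"))       # 45201
print(safe_calculate("open('/etc/x')"))  # Error: name 'open' is not defined
```

This blocks casual misuse, not a determined attacker; never eval untrusted input in a real deployment.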


Step 4 — Set Up the Model with Tool Binding

from langchain_ollama import ChatOllama

tools = [calculate, search]

llm = ChatOllama(
    model="llama3.3",
    temperature=0,
)

llm_with_tools = llm.bind_tools(tools)

temperature=0 is important for agents — you want deterministic tool selection, not creative variation.
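When the bound model decides to use a tool, the returned AIMessage carries a tool_calls list rather than text. Each entry is a dict shaped roughly like this (the id value here is illustrative, not a real call id):

```python
# Illustrative entry from AIMessage.tool_calls:
tool_call = {
    "name": "calculate",                  # which tool to run
    "args": {"expression": "247 * 183"},  # parsed arguments for the tool
    "id": "call_abc123",                  # illustrative call id
}
print(tool_call["name"], tool_call["args"])
```

This is the structure the routing function in Step 5 inspects to decide whether to execute tools or stop.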


Step 5 — Build the LangGraph State Machine

LangGraph agents work as state machines. You define nodes (functions) and edges (transitions between nodes).

from langgraph.graph import StateGraph, END
from langgraph.prebuilt import ToolNode
from langchain_core.messages import HumanMessage, SystemMessage
from typing import TypedDict, Annotated
import operator

class AgentState(TypedDict):
    messages: Annotated[list, operator.add]

def agent_node(state: AgentState):
    messages = state["messages"]
    # Prepend the system prompt if it isn't there yet
    if not any(isinstance(m, SystemMessage) for m in messages):
        messages = [
            SystemMessage(content="You are a helpful assistant. Use tools when needed. Think step by step."),
            *messages,
        ]
    response = llm_with_tools.invoke(messages)
    return {"messages": [response]}

def should_continue(state: AgentState):
    last_message = state["messages"][-1]
    if hasattr(last_message, "tool_calls") and last_message.tool_calls:
        return "tools"
    return END

tool_node = ToolNode(tools)

# Build graph
graph = StateGraph(AgentState)
graph.add_node("agent", agent_node)
graph.add_node("tools", tool_node)
graph.set_entry_point("agent")
graph.add_conditional_edges("agent", should_continue)
graph.add_edge("tools", "agent")

app = graph.compile()

The should_continue function is the core of the ReAct loop: if the model called a tool, execute it and return to the agent. If not, end.
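Stripped of the graph machinery, the loop the compiled app executes looks like this. This is a conceptual sketch with a stubbed model and plain dict messages, not the LangGraph internals:

```python
def run_react(messages, model, tools_by_name, max_steps=10):
    """Minimal ReAct loop: call the model, run any requested tools, repeat."""
    for _ in range(max_steps):
        response = model(messages)            # the "agent" node
        messages.append(response)
        calls = response.get("tool_calls", [])
        if not calls:                         # should_continue -> END
            return messages
        for call in calls:                    # the "tools" node
            result = tools_by_name[call["name"]](**call["args"])
            messages.append({"role": "tool", "content": str(result)})
    return messages

# Stub model: first turn requests a tool, second turn answers.
def stub_model(messages):
    if any(m.get("role") == "tool" for m in messages):
        return {"role": "assistant", "content": "The answer is 45201.", "tool_calls": []}
    return {"role": "assistant", "content": "", "tool_calls": [
        {"name": "calc", "args": {"expression": "247 * 183"}}]}

out = run_react(
    [{"role": "user", "content": "What is 247 * 183?"}],
    stub_model,
    {"calc": lambda expression: eval(expression, {"__builtins__": {}}, {})},
)
print(out[-1]["content"])  # The answer is 45201.
```

The graph version buys you checkpointing, streaming, and interrupts on top of this same loop.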


Step 6 — Run the Agent

result = app.invoke({
    "messages": [HumanMessage(content="What is 247 * 183? Then search for LangGraph tutorials.")]
})

for message in result["messages"]:
    print(f"{type(message).__name__}: {message.content[:200]}")

Expected output:

HumanMessage: What is 247 * 183? Then search for LangGraph tutorials.
AIMessage: (tool_calls: calculate, search)
ToolMessage: 45201
ToolMessage: Search result for 'LangGraph tutorials'...
AIMessage: 247 × 183 = 45,201. Regarding LangGraph tutorials...

Step 7 — Add Memory (Optional)

To persist conversation across calls:

from langgraph.checkpoint.memory import MemorySaver

checkpointer = MemorySaver()
app = graph.compile(checkpointer=checkpointer)

config = {"configurable": {"thread_id": "user-123"}}

# First message
app.invoke({"messages": [HumanMessage(content="My name is Alex.")]}, config=config)

# Second message — the agent remembers
result = app.invoke({"messages": [HumanMessage(content="What's my name?")]}, config=config)

MemorySaver stores state in process memory, so it is lost on restart. For production, use SqliteSaver (from langgraph-checkpoint-sqlite) or PostgresSaver (from langgraph-checkpoint-postgres).
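Conceptually, a checkpointer is just a map from thread_id to the latest graph state. This toy stand-in (not the real MemorySaver API) shows why reusing the same thread_id restores the conversation:

```python
class TinyCheckpointer:
    """Toy stand-in for MemorySaver: one saved state per thread id."""
    def __init__(self):
        self._store = {}

    def load(self, thread_id):
        return self._store.get(thread_id, {"messages": []})

    def save(self, thread_id, state):
        self._store[thread_id] = state

cp = TinyCheckpointer()
state = cp.load("user-123")
state["messages"].append({"role": "user", "content": "My name is Alex."})
cp.save("user-123", state)

# A later call with the same thread id sees the earlier messages:
print(len(cp.load("user-123")["messages"]))  # 1
print(len(cp.load("user-456")["messages"]))  # 0
```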


Common Issues and Fixes

Problem                     Cause               Fix
Model doesn't call tools    Model too small     Use a larger model (llama3.3, or at least llama3.1:8b)
Infinite tool loop          No loop limit       Pass config={"recursion_limit": 10} to app.invoke()
Slow first response         Model loading       Normal; subsequent calls are faster
JSON parse error in tools   Bad tool output     Always return strings from tool functions
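The loop guard goes in the config dict passed to invoke, not on the graph itself. A sketch, with 10 as an illustrative cap:

```python
# recursion_limit caps graph super-steps, so a runaway tool loop raises
# an error instead of running forever.
config = {
    "recursion_limit": 10,                      # illustrative cap on graph steps
    "configurable": {"thread_id": "user-123"},  # only needed with a checkpointer
}
# Usage: app.invoke({"messages": [...]}, config=config)
print(config["recursion_limit"])  # 10
```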

FAQ

Does Llama 3.3 support tool calling natively?

Yes. Llama 3.3 70B was trained with tool-calling support. Smaller variants (llama3.2:3b) also support it but are less reliable at multi-step reasoning; llama3.1:8b is a reasonable minimum for production agents.

Can I use this pattern with other local models?

Yes — any Ollama-served model with tool-calling support can replace Llama 3.3. Mistral Small 3.1 (24B) and Qwen2.5-Coder:32b are strong alternatives as of early 2026.

How do I add real web search?

Replace the search tool with Tavily: pip install tavily-python, get a free API key at tavily.com, and use TavilySearchResults(max_results=3) from langchain_community.tools.

What is the difference between LangGraph and LangChain agents?

LangChain's AgentExecutor is a higher-level abstraction that hides the state machine. LangGraph exposes it directly, giving you control over retry logic, parallel tool execution, human-in-the-loop interrupts, and custom state.

Is this production-ready?

The pattern is production-ready. For real deployments, add error handling in tool functions, use PostgresSaver for persistence, and deploy with LangGraph Cloud or a self-hosted FastAPI wrapper.


Next step: Add a real web search tool to your agent. Get a free Tavily API key at tavily.com and replace the mock search function — your agent can now answer questions about current events.