Two Philosophies, One Gap
The AI agent framework landscape has split into two camps.
Camp one: minimalism. OpenAI's Agents SDK gives you three primitives — agents, handoffs, and guardrails — and gets out of your way. Define an agent, give it tools, run it. The learning curve is measured in hours. If you're already in the OpenAI ecosystem, it's the fastest path from idea to working prototype.
Camp two: maximum control. LangGraph gives you a graph-based state machine where you define nodes, edges, conditional routing, checkpointing, and parallelization. The learning curve is steep — graph theory, state management, distributed systems — but the control is granular. Every transition, every state mutation, every branching decision is explicitly defined.
Both are legitimate engineering choices. Both produce working multi-agent systems. And both leave the same enormous gap between "agents that work in development" and "agents that an organization trusts with production data."
That gap is governance, data intelligence, document understanding, operational resilience, and a way for anyone who isn't a Python developer to actually use the system.
OpenAI Agents SDK: What It Gets Right
The Agents SDK earned its adoption by being genuinely easy to use. Three primitives:
Agents — LLMs with instructions and tools. Define what the agent does, what tools it has, and let the built-in agent loop handle tool invocation, result handling, and iteration until the task is complete.
Handoffs — agents delegating to other agents. Agent A decides it needs a specialist and hands off to Agent B with context. This creates multi-agent workflows without a separate orchestration layer.
Guardrails — input and output validation. Check that an agent's inputs are safe and its outputs meet constraints before they're returned.
The SDK also provides built-in tracing for debugging, session persistence across agent runs (with SQLAlchemy, SQLite, Redis, or encrypted backends), and MCP server support for tool integration. Any Python function becomes a tool with automatic schema generation and Pydantic validation.
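The agent-loop pattern the SDK implements can be sketched without the SDK itself. This is a conceptual sketch, not the Agents SDK API: the model is a stub, and `get_weather` is a hypothetical tool.

```python
def get_weather(city: str) -> str:
    """A hypothetical tool -- in the SDK, any plain Python function qualifies."""
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

def stub_model(messages):
    """Stand-in for the LLM: requests one tool call, then answers."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "get_weather", "args": {"city": "Oslo"}}
    return {"final": messages[-1]["content"]}

def agent_loop(user_input: str) -> str:
    """The built-in loop: invoke tools, append results, iterate until done."""
    messages = [{"role": "user", "content": user_input}]
    while True:
        reply = stub_model(messages)
        if "final" in reply:
            return reply["final"]
        result = TOOLS[reply["tool"]](**reply["args"])
        messages.append({"role": "tool", "content": result})

print(agent_loop("Weather in Oslo?"))  # Sunny in Oslo
```

The SDK's value is that this loop, plus schema generation and validation for each tool, comes for free.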
For prototyping, it's hard to beat. An experienced developer can have a multi-agent system running in an afternoon.
LangGraph: What It Gets Right
LangGraph earned its adoption by giving developers absolute control over execution flow. The architecture:
Nodes — computational steps. Each node is a function that receives state, performs work (LLM calls, tool use, data processing), and returns updated state.
Edges — transitions between nodes. Edges can be conditional, routing execution down different paths based on state values.
State — a shared object that persists through execution. Every node reads from and writes to the same state, providing continuity across the graph.
Checkpointing — persistent snapshots of execution state. If something fails, execution can resume from the last checkpoint rather than starting over.
LangGraph supports parallelization, streaming, human-in-the-loop interrupts, and memory stores that persist across sessions. It's model-agnostic — use any provider. Companies like Klarna, Elastic, and Uber use it in production.
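The core pattern, stripped of LangGraph's actual API, is a state machine: nodes are functions that mutate shared state, and conditional edges route on its values. A minimal sketch:

```python
# Conceptual sketch of the graph pattern -- not LangGraph's API.
def fetch(state):
    state["value"] = 7          # a node does work and updates state
    return "check"              # then names the next node

def check(state):
    return "double" if state["value"] < 10 else "done"  # conditional edge

def double(state):
    state["value"] *= 2
    return "check"

NODES = {"fetch": fetch, "check": check, "double": double}

def run_graph(start, state):
    node = start
    while node != "done":
        node = NODES[node](state)
    return state

print(run_graph("fetch", {}))  # {'value': 14}
```

LangGraph adds checkpointing, parallel branches, and streaming on top of this loop, but every transition remains something the developer wired explicitly.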
For developers who need fine-grained control over every aspect of agent execution, LangGraph delivers.
What They Share: The Framework Gap
Here's what neither ships with.
No Data Intelligence
Neither framework knows anything about your databases.
The OpenAI Agents SDK lets you write a Python function that queries a database and register it as a tool. LangGraph lets you create a node that executes a SQL query. In both cases, the developer writes the query logic, manages the connection, and handles the schema.
Neither provides:
- Automatic schema discovery — scanning tables, columns, types, relationships, and join paths when a database is connected
- Relationship inference — detecting foreign keys and implicit joins across tables so agents can generate multi-table queries without manual configuration
- PII detection — scanning columns for personally identifiable information and flagging sensitive fields before agents access them
- Column-level access controls — making specific columns invisible to agents based on user role, enforced in application code
- Read-only enforcement with query validation — parsing every agent-generated SQL query against 30+ forbidden patterns (write operations, DDL, privilege escalation) before it reaches the database
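The first capability on that list, schema discovery, can be sketched against SQLite's catalog tables. A real system would go further, inferring implicit joins and scanning for PII, but the shape of the work looks like this:

```python
import sqlite3

# Sketch: discover tables, columns, and foreign keys from the catalog.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         customer_id INTEGER REFERENCES customers(id));
""")

def discover_schema(con):
    schema = {}
    tables = [r[0] for r in con.execute(
        "SELECT name FROM sqlite_master WHERE type='table'")]
    for t in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, default, pk)
        cols = [(c[1], c[2]) for c in con.execute(f"PRAGMA table_info({t})")]
        # PRAGMA foreign_key_list rows: (id, seq, table, from, to, ...)
        fks = [(f[3], f[2], f[4])
               for f in con.execute(f"PRAGMA foreign_key_list({t})")]
        schema[t] = {"columns": cols, "foreign_keys": fks}
    return schema

print(discover_schema(con))
```

With either framework, writing and maintaining this layer, for every database engine you connect, is the developer's job.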
Thallus provides all of this natively. Connect a database and the system discovers its structure automatically. PII detection scans seven categories at connection time. Column-level and table-level controls remove restricted data from the schema agents receive — the model never sees fields it shouldn't query. Every SQL query passes through a validator before execution. Read-only is enforced at both the connection and application level.
The frameworks give you a way to call a database. Thallus gives you a governed data intelligence layer.
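The read-only enforcement described above can be sketched as pattern-based validation before execution. The pattern list here is a small illustrative subset, not the actual 30+ rules, and a production validator would parse the SQL rather than regex-match it:

```python
import re

# Illustrative subset of forbidden patterns (the real list is longer).
FORBIDDEN = [
    r"\bINSERT\b", r"\bUPDATE\b", r"\bDELETE\b",   # write operations
    r"\bDROP\b", r"\bALTER\b", r"\bCREATE\b",      # DDL
    r"\bGRANT\b", r"\bREVOKE\b",                   # privilege changes
    r";.",                                          # stacked statements
]

def validate_readonly(sql: str) -> bool:
    normalized = sql.strip()
    if not re.match(r"(?i)^(SELECT|WITH)\b", normalized):
        return False
    return not any(re.search(p, normalized, re.IGNORECASE) for p in FORBIDDEN)

print(validate_readonly("SELECT * FROM orders"))          # True
print(validate_readonly("DROP TABLE orders"))             # False
print(validate_readonly("SELECT 1; DELETE FROM orders"))  # False
```

The point is where this check lives: between the model's output and the database, in application code the model cannot negotiate with.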
No Document Pipeline
Neither framework has a built-in document processing pipeline.
You can build RAG with either — the OpenAI ecosystem has vector stores, and LangGraph integrates with any retrieval system. But building means:
- Choosing and integrating a document processor for PDF, DOCX, XLSX, CSV
- Implementing chunking strategy
- Setting up embeddings with a vector store
- Building retrieval with relevance scoring
- Implementing citation tracking back to source documents
- Handling cross-model embedding compatibility when you switch providers
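Even the simplest item on that list, chunking, involves real decisions. A minimal sketch of one common strategy, fixed-size windows with overlap so that passages straddling a boundary appear intact in at least one chunk:

```python
def chunk(text: str, size: int = 500, overlap: int = 50):
    """Fixed-size chunking with overlap -- one of many possible strategies."""
    step = size - overlap
    return [text[i:i + size]
            for i in range(0, max(len(text) - overlap, 1), step)]

doc = "x" * 1200
pieces = chunk(doc)
print(len(pieces), [len(p) for p in pieces])  # 3 [500, 500, 300]
```

Multiply this decision by extraction, embedding, storage, retrieval, and citation tracking, and the "build it yourself" list becomes a project.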
Thallus ships this as a core capability. Upload a document and the platform processes it: format-specific extraction, intelligent chunking, embedding generation (cross-compatible across OpenAI, Azure, Gemini, and Ollama), pgvector storage, and two-stage semantic search — synopsis discovery to identify relevant documents, then chunk retrieval for specific passages. Every claim in the synthesis traces back to its source section with citations.
This works identically in ad hoc conversations and scheduled workflows. No integration project. No vector store management. No custom retrieval logic.
No Enterprise Governance
Neither framework includes RBAC, audit trails, or approval gates.
OpenAI Agents SDK provides guardrails for input/output validation — but guardrails are developer-defined validation functions, not enterprise access controls. There's no concept of "this user's agents can query the customer table but not the salary column." There's no immutable audit trail capturing every agent action with reasoning. There's no approval gate that physically halts execution until a human authorizes.
LangGraph provides human-in-the-loop interrupts — you can pause execution at a node and wait for human input. But building an enterprise approval system on top of interrupts requires custom code: notification routing, timeout management, escalation logic, multi-approver flows, context presentation, and audit integration. LangGraph gives you the primitive. The enterprise system is your problem.
Thallus includes governance at every tier:
4-tier RBAC (Platform → Organization → Group → User) controls which agents each user invokes, which tools those agents use, and which database tables and columns those tools query. Enforced in application code, not prompt instructions.
Immutable audit trails capture every tool call, every query, every agent decision — with reasoning alongside each action. Sensitive parameters are automatically redacted. When an auditor asks what the AI accessed, the answer comes from a structured, tamper-evident log.
Approval gates as a native workflow node with multi-approval flows, configurable timeouts, escalation paths, full context presentation, and audit integration. The workflow physically cannot proceed without human authorization.
Code-level tool confirmations where write operations require human confirmation, enforced by prefix matching outside the model's execution loop. Prompt injection can't bypass it.
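The column-level control described above amounts to filtering the schema before the model sees it. A minimal sketch, with a hypothetical role and table for illustration:

```python
# Hypothetical schema and role-to-restriction mapping for illustration.
SCHEMA = {"employees": ["id", "name", "department", "salary", "ssn"]}
RESTRICTED = {"analyst": {"employees": {"salary", "ssn"}}}

def schema_for_role(role: str):
    """Remove restricted columns before the schema reaches the model."""
    hidden = RESTRICTED.get(role, {})
    return {
        table: [c for c in cols if c not in hidden.get(table, set())]
        for table, cols in SCHEMA.items()
    }

print(schema_for_role("analyst"))  # {'employees': ['id', 'name', 'department']}
print(schema_for_role("admin"))    # full schema
```

Because the restricted columns never appear in the model's input, there is nothing for a prompt injection to talk the model into querying.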
No Ad Hoc Questions
Both frameworks require a developer to define the agent system before anyone can use it.
With the OpenAI Agents SDK, someone writes agent definitions, tool registrations, and handoff logic in Python. With LangGraph, someone defines nodes, edges, state schema, and conditional routing in Python. In both cases, the questions the system can answer are limited to what the developer built it to handle.
Want to ask a new type of question? In the OpenAI SDK, define new agents with new tools and new handoff logic. In LangGraph, add new nodes and new edges, a change that, as developers have noted, "often requires restructuring the entire state schema, increasing the risk of errors."
In Thallus, you type a question. The planner decomposes it into an execution DAG at query time, assigns specialized agents, maps dependencies, and executes — parallelizing independent steps, sharing context through a shared board, and re-planning when results change the approach. No predefined graph. No hardcoded node transitions. No developer in the loop for a new question.
No Non-Developer Access
Both frameworks are Python libraries. Every interaction requires code.
This means every new workflow, every new analysis, every adjustment to agent behavior requires engineering time. The finance team can't build their own quarterly review workflow. The compliance team can't set up a contract renewal monitor.
Thallus provides:
Chat interface — type a question, get a cited analysis. No code.
Natural language workflow creation — describe the workflow in English, the AI generates the complete DAG with triggers, actions, conditions, approvals, and delivery. Review visually, adjust, activate.
Visual DAG editor — 9 node types, drag-and-drop, side-panel configuration with natural language instructions. Agent auto-suggestion based on task description.
Domain experts build and modify workflows without writing Python.
The Context Architecture Problem
This is a technical difference that matters more than most people realize.
How the OpenAI Agents SDK Handles Context
In the OpenAI Agents SDK, everything flows through the LLM's context window. When an agent calls a tool, the result is appended to the conversation messages and sent back to the model. When Agent A hands off to Agent B, Agent B receives the entire conversation history — every message, every tool call, every tool result, every previous agent's output — as tokens in the context window.
The SDK's documentation is explicit: "it's as though the new agent takes over the conversation, and gets to see the entire previous conversation history."
There's an opt-in beta feature (nest_handoff_history) that collapses the prior transcript into a summary. But the summary still occupies the context window, just compressed.
This creates a compounding problem. Each agent adds its tool calls and results to the message history. As agents hand off to other agents, the context grows. By the third or fourth handoff, the model is processing thousands of tokens of accumulated tool results, intermediate reasoning, and previous agent outputs — much of which may be irrelevant to the current agent's task but still consumes context capacity and attention.
This is context rot. The signal-to-noise ratio in the context window degrades with every step. The model has to attend to the SQL query results from Agent A's database call while trying to focus on Agent C's document analysis. Irrelevant context doesn't just waste tokens — it actively degrades reasoning quality. Research consistently shows that LLMs perform worse on tasks when the context contains large amounts of irrelevant information. More noise means more hallucination.
LangGraph's shared state object has the same fundamental constraint. The state accumulates data from every node, and when a node invokes the LLM, the relevant portions of state are serialized into the context. The developer has more control over what gets included, but the architecture still routes everything through the model's context window.
How Thallus Handles Context
Thallus separates the data layer from the context layer.
Agent results, database schemas, document catalogs, cross-source entity links, and findings are stored on a Redis-backed board — a shared context space that exists outside any agent's LLM context window. The board is structured data, not conversation history.
When an agent starts its step, the orchestrator injects only the relevant board data into that agent's prompt — structured, tagged, and scoped to what that specific agent needs for its specific task. The database query agent gets the relevant schema information. The document search agent gets the document catalog. The synthesis agent gets the findings from previous steps. Each agent receives a clean, focused context — not the entire accumulated history of every previous agent's tool calls.
This means:
No context rot. Agent C's context doesn't contain Agent A's raw SQL results. It contains the structured finding that Agent A posted to the board. The signal stays strong because irrelevant intermediate data never enters the context window.
Better reasoning quality. Each agent works with a focused context tailored to its task, not a growing message history full of other agents' tool invocations. Less noise means more accurate analysis and fewer hallucinations.
No context window pressure. A 10-step investigation with 6 database queries and 4 document searches doesn't accumulate 10 steps worth of raw tool results in a single context window. Each step gets a fresh context with board data, not a conversation that's been growing since step 1.
Data integrity. The board is persistent, structured, and immutable once written. It doesn't get summarized, truncated, or compressed to fit a context window. The data that Agent A discovered is exactly the data that Agent D reads — not a model's paraphrase of it.
This isn't an optimization. It's an architectural decision that directly affects the quality of multi-agent reasoning. When your investigation involves querying three databases and searching a document collection, the synthesis agent should work from structured findings, not from a 50,000-token conversation history where it has to excavate the relevant data from tool call logs.
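The board pattern can be sketched in a few lines. Here a plain dict stands in for the Redis-backed store, and the agent names and findings are invented for illustration:

```python
# A dict stands in for the Redis-backed board in this sketch.
board = {}

def post(agent: str, key: str, finding: dict):
    """Agents write structured findings, not raw transcripts."""
    board[(agent, key)] = finding

def context_for(needs: set) -> dict:
    """The orchestrator injects only the entries this step requires."""
    return {k: v for k, v in board.items() if k in needs}

post("db_agent", "gl_anomaly", {"vendor": "Acme", "amount": 48200})
post("doc_agent", "contract", {"doc": "acme_msa.pdf", "clause": "7.2"})

# The synthesis agent sees two structured findings, not the raw SQL
# results and tool-call logs that produced them.
ctx = context_for({("db_agent", "gl_anomaly"), ("doc_agent", "contract")})
print(ctx)
```

The design choice is the indirection itself: data flows through structured storage, and the context window carries only the scoped slice each agent needs.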
Where They Differ From Each Other
The two frameworks have different strengths and different weaknesses. Understanding both matters, because the choice between them is real for organizations that decide to build their own agent infrastructure.
OpenAI Agents SDK: Simplicity at the Cost of Lock-In
The SDK's simplicity is its strength and its constraint. Three primitives. Minimal configuration. Fast to learn.
The cost is flexibility. The SDK is designed around OpenAI's ecosystem. It supports non-OpenAI models through LiteLLM integration, but the architecture, the tracing, and the optimizations are built for OpenAI's models. When OpenAI changes pricing, adjusts rate limits, or deprecates a model, your agent infrastructure absorbs the impact.
Multi-agent orchestration happens through handoffs — Agent A delegates to Agent B. This is intuitive for sequential workflows but doesn't provide the graph-level control that complex orchestration sometimes requires. There's no built-in parallel execution of independent agent branches. There's no shared state object that all agents read from simultaneously.
The SDK also doesn't include a workflow engine. There are no scheduled triggers, no approval gates, no delivery nodes, no versioning. It's an agent framework, not a workflow system.
LangGraph: Control at the Cost of Complexity
LangGraph's control is its strength and its burden. You define every node, every edge, every state transition. Nothing happens that you didn't explicitly specify.
The cost is complexity. Developers have described the learning curve as requiring "a strong grasp of graph theory, state machines, and distributed systems architecture." When workflows fail, "pinpointing the root cause can be far more challenging than troubleshooting traditional linear code." As graphs grow, "execution slows, memory usage increases, and debugging becomes more difficult."
Production deployment adds another layer. LangGraph itself is free and open source. But production observability requires LangSmith — the Plus plan is $39/user/month, and Enterprise (with SSO, audit logging, self-hosted deployment) requires custom pricing. The framework is free. The production infrastructure isn't.
The graph-based architecture also creates rigidity, though of a different kind than the constraint imposed by the OpenAI SDK's simplicity. Adding a new agent intent or updating workflow logic "often requires restructuring the entire state schema, which increases the risk of errors and adds significant time to the modification process." The very control that makes LangGraph powerful makes it expensive to change.
The Reasoning Architecture Difference
Beyond individual features, the reasoning architecture is fundamentally different.
Predefined Execution vs. Dynamic Planning
Both frameworks require the developer to define the execution path in advance. In the OpenAI SDK, that's agent definitions and handoff logic. In LangGraph, that's nodes, edges, and conditional routing.
Thallus plans at query time. The planner receives a question, evaluates connected data sources and available agents, and creates an execution DAG dynamically — steps with dependency edges, assigned to specialized agents, with independent steps parallelized automatically.
This determines what the system can answer. A predefined graph can only handle questions it was built for. A dynamic planner handles questions nobody anticipated, because the plan is generated from the question and the data landscape, not from a developer's foresight.
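The execution side of such a plan can be sketched with the standard library's topological sorter: steps with no unmet dependencies form a batch and run in parallel. The plan here is hardcoded; the point of a planner is that it would generate this structure from the question at query time.

```python
from graphlib import TopologicalSorter

# A hypothetical plan: two independent steps feed one synthesis step.
plan = {
    "query_gl": set(),
    "search_docs": set(),
    "synthesize": {"query_gl", "search_docs"},
}

ts = TopologicalSorter(plan)
ts.prepare()
batches = []
while ts.is_active():
    ready = list(ts.get_ready())   # steps with all dependencies met
    batches.append(sorted(ready))  # each batch can run in parallel
    ts.done(*ready)

print(batches)  # [['query_gl', 'search_docs'], ['synthesize']]
```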
Fixed Orchestration vs. Adaptive Re-Planning
In both frameworks, the execution path is fixed once defined. The OpenAI SDK runs agents through handoffs. LangGraph traverses the graph. Neither restructures its approach based on what it discovers mid-execution.
Thallus re-plans. After each execution batch, the evaluation loop decides: is the plan complete? Does it need additional steps? Should it be restructured?
If the initial plan queries the GL and discovers an unexpected vendor charge, the evaluator adds steps to investigate that vendor — querying the procurement database, searching for the contract, checking the PO system. None of these steps were in the original plan. They emerged from the investigation. This is what turns orchestration into analysis.
One Mode vs. Three
Both frameworks use whatever orchestration pattern the developer builds.
Thallus auto-detects the appropriate reasoning mode:
ASK — single-agent response for straightforward questions. Fast, no planning overhead.
RESEARCH — planner-directed DAG execution with parallel steps, dependency management, and evaluation loops. For analytical questions that span data sources.
INVESTIGATE — supervisor-driven reactive investigation that pursues multiple hypotheses based on emerging findings. For complex, open-ended questions where the path can't be planned upfront.
The system matches the strategy to the question. The user doesn't choose. The system detects.
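To make the routing idea concrete, here is a purely illustrative keyword heuristic. This is not Thallus's actual detector, which is not described here; it only shows the shape of question-to-mode classification:

```python
def detect_mode(question: str) -> str:
    """Toy heuristic for illustration only -- not the real classifier."""
    q = question.lower()
    if any(w in q for w in ("why", "investigate", "root cause")):
        return "INVESTIGATE"   # open-ended, hypothesis-driven
    if any(w in q for w in ("compare", "across", "trend", "analyze")):
        return "RESEARCH"      # multi-source analytical
    return "ASK"               # straightforward lookup

print(detect_mode("What is our refund policy?"))           # ASK
print(detect_mode("Compare Q3 spend across departments"))  # RESEARCH
print(detect_mode("Why did vendor costs spike in July?"))  # INVESTIGATE
```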
The Comparison
| Capability | OpenAI Agents SDK | LangGraph | Thallus |
|---|---|---|---|
| Ad hoc questions | Requires predefined agents | Requires predefined graph | Ask anything — dynamic planning at query time |
| Dynamic planning | No — handoff chains in code | No — graph defined in code | AI planner creates DAGs, re-plans based on results |
| Context architecture | Full history in LLM context window | State object serialized to context | Separate board — structured data injected per-agent, no context rot |
| Cross-source reasoning | Sequential handoff context | Shared state within defined graph | Shared board + parallel execution + cross-source synthesis with citations |
| Reasoning modes | Developer-defined | Developer-defined | Auto-detected: ASK, RESEARCH, INVESTIGATE |
| Learning curve | Low — 3 primitives | High — graph theory, state machines | Low — chat interface, natural language workflows, visual editor |
| User interface | Python only | Python only | Chat + visual DAG editor + natural language workflow creation |
| Database intelligence | Build your own tools | Build your own nodes | Native schema discovery, relationship inference, PII detection, column-level controls |
| Document RAG | Build with OpenAI vector stores | Build with any retrieval system | Native pipeline — upload, chunk, embed, two-stage semantic search, citations |
| Model flexibility | OpenAI-primary (others via LiteLLM) | Model-agnostic | Model-agnostic + BYOK + local models via Ollama |
| RBAC | None | None (Enterprise LangSmith for audit) | 4-tier at agent, tool, and data level |
| Audit trails | Built-in tracing (debugging) | Via LangSmith (paid) | Immutable with agent reasoning, tool parameters, PII redaction |
| Approval gates | None | Human-in-the-loop interrupts (DIY) | Native with multi-approval, timeouts, escalation, audit integration |
| Data access controls | None | None | Per-table, per-column with PII detection and read-only enforcement |
| Workflow engine | None | Graph execution (no triggers, delivery, versioning) | Full DAG workflows — triggers, scheduling, delivery, versioning, crash recovery |
| Self-hosted | Library (no platform) | Library + LangSmith for production | Full platform via Docker, air-gap capable |
| Pricing | Free (open source) | Free (LangSmith Plus $39/user/mo, Enterprise custom) | From $45/mo — self-hosted Enterprise unlimited |
The Cost of "Free"
Both frameworks are free and open source. This is genuinely valuable — no licensing cost, no vendor negotiation, full source code transparency.
But "free framework" and "free production system" are different things.
To get from the OpenAI Agents SDK to a production AI agent platform, you need to build: database connections with schema discovery, a document processing pipeline, RBAC at the data level, audit trails, approval workflows, a user interface for non-developers, workflow scheduling and delivery, crash recovery, versioning, operational monitoring, and a context architecture that doesn't degrade as investigations scale. Each of these is a significant engineering project. Together, they represent months of development and ongoing maintenance.
To get from LangGraph to production, you need all of the above plus LangSmith for observability. LangSmith's Plus plan is $39/user/month. The Enterprise plan — with SSO, audit logging, and self-hosted deployment — requires custom pricing. The framework is free. The infrastructure to run it in production isn't.
Thallus ships all of it as a platform. Governance at every tier. Database intelligence. Document search. Visual workflows. Audit trails. Crash recovery. Versioning. Chat interface. A context architecture designed for multi-agent reasoning. Starting at $45/month for the SaaS, or unlimited on self-hosted Enterprise.
The question isn't whether the framework is free. It's whether building the production infrastructure around it is cheaper than using a platform that already has it.
When You Need What
Use the OpenAI Agents SDK when:
- You're prototyping quickly and want the fastest path to a working agent
- You're already committed to the OpenAI ecosystem
- The use case is narrow and well-defined
- You have developers who will build and maintain the surrounding infrastructure
- Enterprise governance isn't a requirement
Use LangGraph when:
- You need granular control over every state transition and execution path
- Your team has the graph theory and distributed systems expertise
- The use case requires complex, precisely defined orchestration patterns
- You're willing to invest in LangSmith for production observability
- You'll build the governance, data, and user layers yourself
Use Thallus when:
- Users need to ask ad hoc questions across multiple data sources without predefined graphs or agents
- You need reasoning that adapts — dynamic planning, re-planning, cross-source analysis through shared context
- Context quality matters — investigations should scale without degrading the signal-to-noise ratio in agent context windows
- Non-technical users need to build and run AI agent workflows
- Enterprise governance is non-negotiable — RBAC, audit trails, approval gates, data-level access controls
- You need native database intelligence — schema discovery, PII detection, column-level controls, read-only enforcement
- You need native document search with semantic understanding and citations
- Model flexibility matters — multiple providers, BYOK, local models
- Self-hosting and data residency are requirements
- Workflow versioning, crash recovery, and operational resilience are requirements
- You'd rather use a platform than build one
The Bottom Line
The OpenAI Agents SDK and LangGraph represent two valid approaches to multi-agent orchestration. One optimizes for simplicity. The other optimizes for control. Both produce capable agent systems in the hands of skilled developers.
But they're both starting points. Between "agents can use tools and delegate tasks" and "the organization trusts this system with production data, non-developers can use it, and compliance can audit it" sits the same gap — regardless of which framework you start from.
Data intelligence. Document search. Enterprise governance. User interfaces. Operational resilience. Dynamic planning. Adaptive reasoning. A context architecture that preserves signal quality as investigations scale. These aren't features you bolt on. They're the architecture that makes AI agents useful to an organization, not just to the developer who built them.
You can build that architecture on either framework. The question is whether that's the best use of your engineering team's time — or whether they'd deliver more value working on problems unique to your business, on a platform that already handles the infrastructure between agents and production data.