Thallus vs. CrewAI: The Difference Between a Framework and a Platform

CrewAI is a powerful multi-agent framework for developers. But frameworks don't come with governance, data intelligence, or the reasoning architecture that turns agents into analysts. Here's what it takes to go from framework to production platform.

The Framework Appeal

CrewAI has earned its popularity. It's an open-source Python framework for building multi-agent systems, and it got the core abstraction right: define agents with roles and goals, organize them into crews, and let them collaborate on tasks. The developer experience for getting a prototype running is genuinely good. Define a researcher agent, a writer agent, give them tools, point them at a task, and watch them work together.

Then the demo ends. And production begins.

The VP of Finance wants to connect it to the ERP — but only if the marketing team can't see payroll data. Compliance needs an immutable audit trail of every query the agents execute. The operations team wants to build workflows without waiting for engineering to write Python. The CEO asks "can I just ask it a question?" and someone has to explain that it doesn't work that way — every question needs a developer to define a new crew first. The DBA renames a column and three crews break silently overnight.

These aren't edge cases. They're the first week of production. And they're all problems that a framework leaves for you to solve.

CrewAI gives you building blocks. Thallus gives you the building.

What CrewAI Is

CrewAI is a Python framework with two core abstractions:

Crews — teams of AI agents with defined roles, goals, and backstories. Agents collaborate through task delegation, with each agent bringing specialized capabilities. You define agents in code, assign them tools, and orchestrate their execution sequentially, in parallel, or hierarchically.

Flows — event-driven workflow orchestration that sits above crews. Flows handle state management, conditional branching, and control flow. They're the production layer that connects multiple crews into larger processes.

The architecture is sound. Role-based agent design is a proven pattern. The memory system (short-term, long-term, entity, contextual) gives agents continuity. LLM support is broad — OpenAI, Anthropic, Google, open-source models. The framework is independent of LangChain, which keeps it lighter and faster.

CrewAI also offers a hosted platform (CrewAI Cloud) where you can deploy crews without managing infrastructure, with CrewAI Studio providing a visual interface for configuration.

What CrewAI Isn't

CrewAI is not a complete enterprise AI platform. It's the agent orchestration layer — an important layer, but one layer of many that production deployments require.

Here's what's missing, and what you'd need to build yourself.

The Reasoning Gap

Multi-agent orchestration isn't just about having multiple agents. It's about how the system decides what to investigate, how agents share context, and what happens when the initial approach doesn't work. This is where architectural differences create the widest gap.

Static Crews vs. Dynamic Planning

CrewAI requires you to define a crew before execution: which agents, which tasks, in what order. The orchestration pattern — sequential, parallel, or hierarchical — is chosen at development time. If you anticipate that a question requires a researcher, an analyst, and a writer, you build a crew with those three agents and those three tasks.

Thallus doesn't require you to anticipate anything. The planner receives a question, evaluates the connected data sources and available agents, and creates an execution DAG at query time — steps with dependency edges, assigned to specialized agents, with independent steps parallelized automatically. The plan is generated for this specific question with this specific data landscape.
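The planner's output can be pictured as a small dependency graph. Here is a minimal sketch (step names are invented; the real planner is model-driven) of how a query-time DAG yields parallel execution batches:

```python
from collections import defaultdict

def batches(steps):
    """Group DAG steps into batches where every step in a batch has all
    of its dependencies satisfied by earlier batches (Kahn's algorithm).
    Steps inside one batch can run in parallel."""
    indegree = {s: len(deps) for s, deps in steps.items()}
    dependents = defaultdict(list)
    for step, deps in steps.items():
        for d in deps:
            dependents[d].append(step)
    ready = sorted(s for s, n in indegree.items() if n == 0)
    order = []
    while ready:
        order.append(ready)
        next_ready = []
        for s in ready:
            for t in dependents[s]:
                indegree[t] -= 1
                if indegree[t] == 0:
                    next_ready.append(t)
        ready = sorted(next_ready)
    return order

# A plan the planner might generate for the Q4 cost question:
# independent source queries first, synthesis last.
plan = {
    "query_gl": [],
    "query_po_system": [],
    "search_contracts": ["query_gl"],
    "synthesize": ["query_po_system", "search_contracts"],
}
print(batches(plan))
# → [['query_gl', 'query_po_system'], ['search_contracts'], ['synthesize']]
```

The two source queries run concurrently because nothing connects them; the synthesis step waits for everything it depends on.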

This matters because real questions don't fit neatly into predefined crew configurations. "Why did our Q4 operating costs spike 15% versus forecast?" might require querying the general ledger, the purchase order system, and the HR database, then searching contract documents and vendor performance reports. The planner identifies all of this from the question and the available data. A CrewAI developer would need to have anticipated this specific combination of data sources and built a crew for it.

Adaptive Re-Planning

The most important reasoning capability is what happens after the first pass.

In CrewAI, a crew runs its defined task sequence. Tasks can pass outputs to subsequent tasks, and agents can delegate to other agents within the crew. But the task sequence itself is fixed. If the researcher agent discovers that the original question can't be answered with the data sources the crew was built to query, the crew doesn't restructure itself to include additional sources.

In Thallus, the evaluation loop after each execution batch decides: is the plan complete? Does it need additional steps? Should it be restructured based on what the agents discovered?

If the initial plan queries the GL for cost data and the agent discovers an unexpected $2M line item from a vendor not in the usual vendor list, the evaluator can add steps to investigate that vendor — querying the procurement database, searching for the vendor's contract in the document collection, checking whether the vendor appears in the PO system. None of these steps were in the original plan. They were added because the system reasoned that the initial results required follow-up.

This adaptive re-planning is what turns agent orchestration from "run these predefined steps" into "investigate this question until you have a sufficient answer." It's the difference between automation and analysis.
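In sketch form, the evaluation loop is a function from results to a verdict plus follow-up steps. Everything below is illustrative: the real evaluator is an LLM, and the hard-coded vendor rule merely stands in for its judgment.

```python
def evaluate(results):
    """Decide whether the plan is complete or needs follow-up steps.
    Stand-in rule: flag any GL line item from a vendor outside the
    usual vendor list and add investigation steps for it."""
    known_vendors = {"Acme Corp", "Globex"}
    new_steps = []
    for item in results.get("query_gl", []):
        if item["vendor"] not in known_vendors:
            v = item["vendor"]
            new_steps += [
                {"id": f"procurement_lookup:{v}", "deps": ["query_gl"]},
                {"id": f"contract_search:{v}", "deps": ["query_gl"]},
            ]
    return ("replan", new_steps) if new_steps else ("complete", [])

results = {"query_gl": [
    {"vendor": "Acme Corp", "amount": 120_000},
    {"vendor": "Initrode", "amount": 2_000_000},  # the unexpected line item
]}
status, added = evaluate(results)
print(status, [s["id"] for s in added])
# → replan ['procurement_lookup:Initrode', 'contract_search:Initrode']
```

The new steps attach to the existing DAG with dependency edges, so the next execution batch investigates the anomaly without rerunning what already succeeded.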

Shared Context, Not Just Task Outputs

CrewAI passes task outputs between agents in a crew. Agent A completes its task, and Agent B receives the output as input. This is sequential context flow — each agent sees what the previous agent produced.

Thallus uses a shared board — a Redis-backed context space per conversation that every agent in the investigation can read from and write to. The board contains:

  • Database table schemas and join maps from connected data sources
  • Document catalog with synopsis metadata from the knowledge base
  • Cross-source entity links (mappings between document entities and database columns)
  • Agent findings and hypotheses from earlier steps

When Agent C starts its step, it doesn't just see Agent B's output. It sees the full accumulated context — every schema discovered, every document identified, every finding from every agent that has run so far. This is what enables cross-source reasoning that CrewAI's linear task output passing can't match.

A database agent discovers that vendor spend is concentrated in three categories. The document search agent, seeing this on the board, focuses its contract search on those three categories instead of searching broadly. The synthesis agent sees both the spend data and the contract terms, correlates them, and identifies that one category is 40% over its contract ceiling. This cross-pollination of context is what produces the analysis, and it requires a shared context space — not just sequential output passing.
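A minimal stand-in for the board (a plain dict in place of Redis; key names are invented) shows the read-everything pattern:

```python
class Board:
    """In-memory stand-in for the Redis-backed per-conversation board.
    A real store might keep a hash per conversation key; the shape that
    matters here is shared read/write access for every agent."""
    def __init__(self):
        self._entries = {}

    def write(self, agent, key, value):
        self._entries[key] = {"by": agent, "value": value}

    def read_all(self):
        # Every agent sees the full accumulated context, not just the
        # output of the step immediately before it.
        return {k: v["value"] for k, v in self._entries.items()}

board = Board()
board.write("db_agent", "finding:spend_concentration",
            ["logistics", "cloud", "contractors"])

# The document agent reads the whole board and narrows its contract
# search to the flagged categories instead of searching broadly.
context = board.read_all()
categories = context["finding:spend_concentration"]
```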

Three Reasoning Modes

Not every question needs the same reasoning approach. CrewAI offers sequential, parallel, and hierarchical process types — orchestration patterns that determine how tasks flow between agents.

Thallus offers three reasoning modes, each with a fundamentally different execution strategy:

ASK — lightweight single-agent response for straightforward questions. No planning overhead. Fast and direct.

RESEARCH — planner-directed DAG execution. The planner creates a dependency graph of steps, the executor runs them in parallel where possible, and the evaluator decides when the investigation is complete or needs re-planning. This is the most common mode for analytical questions.

INVESTIGATE — supervisor-driven reactive multi-agent investigation. A supervisor agent directs the investigation dynamically, assigning tasks to agents based on emerging findings, pursuing multiple hypotheses simultaneously, and converging on conclusions. This is for complex, open-ended questions where the investigation path can't be planned upfront.

The system automatically detects which mode a question requires. "What's our revenue this quarter?" routes to ASK. "Compare vendor spend against contract terms across regions" routes to RESEARCH. "Why did customer churn spike last month and what should we do about it?" routes to INVESTIGATE.

CrewAI requires the developer to choose the orchestration pattern when defining the crew. Thallus chooses the reasoning strategy based on the question.
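A toy router makes the three-way split concrete. The keyword heuristic below is purely illustrative; the actual detection is model-based.

```python
def route(question: str) -> str:
    """Illustrative mode router: open-ended causal questions go to
    INVESTIGATE, comparative/analytical ones to RESEARCH, the rest
    to lightweight ASK."""
    q = question.lower()
    if q.startswith("why") or "what should we do" in q:
        return "INVESTIGATE"
    if any(w in q for w in ("compare", "across", "versus", "trend")):
        return "RESEARCH"
    return "ASK"

print(route("What's our revenue this quarter?"))                            # ASK
print(route("Compare vendor spend against contract terms across regions"))  # RESEARCH
print(route("Why did customer churn spike last month?"))                    # INVESTIGATE
```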

The Data Intelligence Gap

No Native Database Intelligence

CrewAI provides an NL2SQL tool that converts natural language to SQL queries. It's functional for basic database access — you pass a connection string and table name, and the agent can generate queries.

What it doesn't do:

  • Automatic schema discovery — it doesn't scan your database to discover table structures, column types, and relationships. You need to tell it what tables exist and how they relate, or build that discovery yourself.
  • Relationship inference — it doesn't automatically detect foreign keys, implicit join paths, or cross-table relationships. Complex multi-table queries require manual schema documentation or custom tooling.
  • PII detection — it doesn't scan columns for personally identifiable information when a database is connected. There's no automatic flagging of sensitive fields before agents access them.
  • Column-level access controls — there's no mechanism to make specific columns invisible to agents based on user role. If the agent has database access, it has access to everything the connection credentials allow.
  • Read-only enforcement with query validation — there's no application-level SQL validator that parses every generated query against forbidden patterns (write operations, DDL, privilege escalation) before it reaches the database.

Thallus does all of this natively. When you connect a database, the system discovers schemas automatically — tables, columns, types, foreign keys, implicit relationships, join paths. PII detection scans seven categories and flags sensitive columns for review. Column-level and table-level access controls make restricted data invisible in the schema the agent receives. Every generated SQL query passes through a validator that checks against 30+ forbidden patterns. Read-only is enforced at both the connection and application level.
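The validator idea is simple to sketch. The pattern list below is abbreviated and illustrative; the platform checks 30+ patterns.

```python
import re

FORBIDDEN = [  # abbreviated, illustrative subset
    r"\bINSERT\b", r"\bUPDATE\b", r"\bDELETE\b", r"\bDROP\b",
    r"\bALTER\b", r"\bTRUNCATE\b", r"\bGRANT\b", r"\bCREATE\b",
]

def validate_readonly(sql: str) -> bool:
    """Reject any generated query containing a write, DDL, or privilege
    pattern before it ever reaches the database."""
    return not any(re.search(p, sql, re.IGNORECASE) for p in FORBIDDEN)

assert validate_readonly("SELECT vendor, SUM(amount) FROM gl GROUP BY vendor")
assert not validate_readonly("DROP TABLE gl")
assert not validate_readonly("select 1; delete from gl")
```

Because the check runs in application code on every generated query, it holds even when the model is tricked into emitting a write: the query is blocked, not merely discouraged.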

The difference isn't that CrewAI can't query databases. It can. The difference is what sits between the agent and your data.

No Native Document Pipeline

CrewAI has a PGSearchTool for semantic search within PostgreSQL tables — though the documentation notes it's currently under development. RAG is supported through external integrations, but there's no built-in document processing pipeline.

To build document intelligence with CrewAI, you'd need to:

  1. Build or integrate a document processor for PDF, DOCX, XLSX, and CSV files
  2. Implement a chunking strategy
  3. Set up an embedding pipeline with a vector store
  4. Build a retrieval system with relevance scoring
  5. Implement citation tracking back to source documents
  6. Handle cross-model embedding compatibility if you change providers

Thallus provides this as a native capability. Upload a document and the platform processes it through a complete pipeline: format-specific extraction, intelligent chunking, embedding generation (compatible across OpenAI, Azure, Gemini, and Ollama), and storage in pgvector. Search uses a two-stage approach — synopsis discovery identifies relevant documents, then chunk retrieval finds specific passages — with citations tracking every claim back to its source section. This works identically in ad hoc conversations and scheduled workflows.
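The two-stage shape is easy to sketch. Term overlap stands in for embedding similarity here, and the corpus is invented:

```python
def two_stage_search(query_terms, library):
    """Stage 1: score document synopses to pick candidate documents.
    Stage 2: score chunks only within those candidates, keeping the
    source section for citation. Scoring here is naive term overlap;
    the platform uses embeddings stored in pgvector."""
    def score(text):
        return sum(t in text.lower() for t in query_terms)

    candidates = sorted(library, key=lambda d: -score(d["synopsis"]))[:2]
    hits = []
    for d in candidates:
        for c in d["chunks"]:
            if score(c["text"]):
                hits.append({"doc": d["name"], "section": c["section"],
                             "text": c["text"]})
    return hits

library = [
    {"name": "vendor_contract.pdf",
     "synopsis": "master services agreement, pricing, renewal terms",
     "chunks": [{"section": "4.2",
                 "text": "Renewal pricing is capped at 5%."}]},
    {"name": "hr_handbook.pdf",
     "synopsis": "benefits, onboarding, leave policy",
     "chunks": [{"section": "1.1", "text": "Welcome to the company."}]},
]
hits = two_stage_search(["renewal", "pricing"], library)
# Each hit carries document name and section, which is what lets the
# final answer cite its claims back to source passages.
```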

You can build something similar on CrewAI. You'll be building and maintaining infrastructure that Thallus ships as a core feature.

The User Access Gap

No Ad Hoc Questions

This is the most fundamental difference in user experience.

In CrewAI, every interaction requires a predefined crew. Someone — a developer — has to define agents, assign tools, write task descriptions, configure the orchestration process, and deploy the crew. Want to ask a new type of question? Define a new crew, or modify an existing one in code.

In Thallus, you type a question. The planner decomposes it, identifies data sources, assigns agents, maps dependencies, and executes. No one predefined the workflow. No one wrote Python.

"Compare our vendor spend against contract terms for renewals due next quarter" doesn't require a developer to build a vendor-analysis crew. It requires connected data sources and a text input.

This isn't a convenience feature. It's a fundamentally different capability. CrewAI answers questions that a developer anticipated and built a crew for. Thallus answers questions that nobody anticipated — because the planning happens at query time, not at development time.

Developer-Only Access

CrewAI requires Python knowledge. Defining agents, configuring tools, writing task descriptions, setting up flows — it's all code. CrewAI Studio provides a visual interface for configuration, but building production-grade crews still requires engineering resources.

This means every new capability requires developer time. Want to add a new analysis workflow? Developer task. Want to adjust how an agent queries a database? Developer task. Want to change the branching logic in a flow? Developer task.

Thallus provides three paths that don't require writing code:

Chat interface — type a question, get a cited analysis. No configuration needed beyond connecting data sources.

Natural language workflow creation — describe what you want in plain English ("Every weekday at 9 AM, research market trends for our top 3 competitors, check if any significant changes are found, and if so send a summary to the #market-intel Slack channel"). The AI generates the complete DAG with trigger, action nodes, conditions, and delivery. Review, adjust visually, activate.

Visual DAG editor — 9 node types, drag-and-drop, visual connections, side-panel configuration. Action nodes accept natural language instructions. The system auto-suggests relevant agents based on the instruction text.

Domain experts — analysts, operations managers, compliance officers — can build and modify workflows without filing a ticket with engineering.

The Governance Gap

Open Source vs. Enterprise Governance

CrewAI's open-source version has no built-in RBAC, no audit trails, no SSO, and no approval gates. These features exist only on the Enterprise plan, which requires custom negotiation.

This creates an uncomfortable gap: the version most organizations evaluate (open source) has none of the governance features that production deployment requires. The version that has governance requires a significant annual commitment.

Thallus includes governance at every level. The platform architecture — not an enterprise add-on — enforces access controls:

4-tier RBAC (Platform → Organization → Group → User) that controls which agents each user can invoke, which tools within those agents they can use, and which database tables and columns those tools can query. The marketing team uses the document search agent but not the database query agent. The finance team queries the financial database but can't see the SSN column. Each layer is enforced in application code, not by prompt instructions.
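The column-level piece can be sketched as schema filtering before the agent ever sees the schema (the policy structure and names below are invented):

```python
def visible_schema(full_schema, role_policy, role):
    """Return the schema an agent receives for a given user role.
    Restricted tables and columns are removed up front, so they are
    invisible to the agent rather than merely forbidden by a prompt."""
    denied = role_policy.get(role, {})
    schema = {}
    for table, cols in full_schema.items():
        if table in denied.get("tables", set()):
            continue  # whole table hidden
        hidden = denied.get("columns", {}).get(table, set())
        schema[table] = [c for c in cols if c not in hidden]
    return schema

full = {"employees": ["id", "name", "ssn", "salary"],
        "payroll": ["employee_id", "amount"]}
policy = {"finance":   {"columns": {"employees": {"ssn"}}},
          "marketing": {"tables": {"payroll"},
                        "columns": {"employees": {"ssn", "salary"}}}}

print(visible_schema(full, policy, "finance"))
# finance sees employees without ssn, plus payroll; marketing never
# learns the payroll table exists.
```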

Immutable audit trails that capture every tool call, every query, every agent decision — with the agent's reasoning alongside each action. Sensitive parameters (passwords, tokens, SSNs) are automatically redacted. When an auditor asks "what data did the AI access on March 3rd?", the answer comes from a structured, tamper-evident log.

Code-level tool confirmations where write operations (send email, create ticket, delete record) require human confirmation before execution. This is enforced by prefix matching in application code, outside the model's execution loop. Prompt injection can't bypass it.
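The prefix check itself is almost trivially small, which is the point: it lives in application code, not in a prompt. A sketch (the prefix list is invented for illustration):

```python
CONFIRM_PREFIXES = ("send_", "create_", "delete_", "update_")  # illustrative

def requires_confirmation(tool_name: str) -> bool:
    """Checked by the application before any tool executes. The model
    never sees this code path, so injected instructions in the prompt
    cannot switch it off."""
    return tool_name.startswith(CONFIRM_PREFIXES)

assert requires_confirmation("send_email")
assert requires_confirmation("delete_record")
assert not requires_confirmation("search_documents")
```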

These aren't premium features. They're how the platform works.

Approval Gates

CrewAI supports human-in-the-loop through callbacks that pause execution and notify external systems. Building an approval flow requires custom code — webhook handlers, notification logic, timeout management, state persistence.

Thallus provides approval gates as a native workflow node type with:

  • Multi-approval flows — require 2 of 3 managers to approve before proceeding
  • Configurable timeouts — 2-hour window before automatic escalation
  • Full context presentation — approvers see what the workflow has done, what it wants to do next, and why, rendered from execution context with Jinja2 templates
  • Timeout escalation — if the primary approver doesn't respond, escalate to their manager
  • Audit integration — every approval, rejection, timeout, and escalation is recorded with user ID, timestamp, and comments
  • Crash recovery — approval nodes carry context snapshots, so the workflow resumes reliably even after extended approval periods
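A compressed sketch of the "m of n" plus timeout mechanics (field names are invented, not the platform's schema):

```python
from datetime import datetime, timedelta

class ApprovalGate:
    """Minimal 'm of n' approval gate with a timeout that triggers
    escalation instead of silently hanging the workflow."""
    def __init__(self, approvers, required, timeout_hours=2):
        self.approvers = set(approvers)
        self.required = required
        self.deadline = datetime.now() + timedelta(hours=timeout_hours)
        self.approved_by = set()

    def approve(self, user):
        if user in self.approvers:
            self.approved_by.add(user)

    def status(self, now=None):
        now = now or datetime.now()
        if len(self.approved_by) >= self.required:
            return "approved"
        if now > self.deadline:
            return "escalate"  # notify the approver's manager
        return "pending"

gate = ApprovalGate(["ana", "ben", "cho"], required=2)
gate.approve("ana")
assert gate.status() == "pending"   # 1 of 2 required approvals
gate.approve("cho")
assert gate.status() == "approved"  # 2 of 3 managers approved
```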

The difference: CrewAI gives you the hooks to build an approval system. Thallus gives you an approval system.

Production Readiness

Debugging and Observability

CrewAI's debugging experience has been described as "spelunking without a headlamp." Multi-agent systems are inherently complex to debug — when three agents collaborate on a task and the result is wrong, identifying which agent made which decision and why requires deep instrumentation.

CrewAI has added tracing and OpenTelemetry support, which helps. But observability is opt-in tooling you configure, not a built-in layer of the platform.

Thallus provides real-time progress streaming through the entire execution pipeline. Every plan step, every agent start and completion, every tool call — streamed to the frontend via SSE. The audit trail captures agent reasoning alongside actions. When something goes wrong, you can trace exactly what happened: which agent was assigned which step, what tools it called, what parameters it used, what results it got, and why it made the decisions it made.

Failure Handling

Multi-agent systems fail in ways that single-agent systems don't. Agent A produces output that Agent B can't use. An LLM call times out mid-crew. A tool returns unexpected data that derails the task.

CrewAI has introduced state management for pausing, resuming, and recovering flows. But failure handling within a crew — when an agent encounters unexpected data, a schema change, or an ambiguous result — depends on how well the developer anticipated the failure mode when writing the crew definition.

Thallus agents reason through unexpected situations because the agents aren't following a script — they're pursuing an objective. When a database column gets renamed, the agent discovers the current schema and generates a new query. When a document search returns insufficient results, the agent expands its search strategy. When initial results reveal that the plan needs to change, the evaluation loop restructures the execution. The system adapts because the intelligence is in the execution, not just the configuration.

At the workflow level, Thallus provides:

  • AI-evaluated success criteria — each action node defines what "success" looks like in natural language. An LLM evaluates whether the result actually meets those criteria, catching "technically completed but wrong" results that static retry logic would miss.
  • Crash recovery — workflow state persists throughout execution. Mid-workflow crashes resume from saved state.
  • Auto-disable on consecutive failures — prevents runaway scheduled workflows from hammering a broken integration.
  • Multi-level timeouts — workflow, branch, and node-level timeouts prevent any single point of failure from hanging the system.
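The success-criteria idea in miniature (the judge below is a stub standing in for the configured model):

```python
def meets_criteria(result, criteria, judge):
    """An LLM judge decides whether a node's result actually satisfies
    its natural-language success criteria, catching 'technically
    completed but wrong' outcomes a status-code check would pass."""
    verdict = judge(f"Criteria: {criteria}\nResult: {result}\nPass?")
    return verdict.strip().lower().startswith("yes")

def stub_judge(prompt):
    # Stand-in for the model call: passes only the non-empty result.
    return "yes" if "rows: 42" in prompt else "no: result is empty"

ok = meets_criteria("rows: 42",
                    "query returned a non-empty result set", stub_judge)
bad = meets_criteria("rows: 0 (query succeeded)",
                     "query returned a non-empty result set", stub_judge)
```

Note that the second call succeeded at the transport level; only the semantic check catches that the result fails the node's definition of success.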

Deployment

CrewAI's open-source version requires you to manage your own infrastructure. The hosted platform (CrewAI Cloud) provides managed deployment, but enterprise features like RBAC and SSO require the Enterprise plan with custom pricing.

Thallus deploys via Docker: docker compose up for development, with production configurations for self-hosted deployment. The same platform, the same agents, and the same governance run identically whether you use the SaaS or self-host. Combined with Ollama for local models, self-hosted Thallus runs fully air-gapped with zero external network calls.

The Comparison

Capability | CrewAI (Open Source) | CrewAI (Enterprise) | Thallus
Ad hoc questions | Requires predefined crews | Requires predefined crews | Ask anything — agents plan dynamically
Dynamic planning | Fixed task sequences | Fixed task sequences | AI planner creates execution DAGs at query time, re-plans based on results
Cross-source reasoning | Sequential task output passing | Sequential task output passing | Shared board context, parallel execution, cross-source synthesis with citations
Reasoning modes | Developer-chosen orchestration pattern | Developer-chosen orchestration pattern | Auto-detected: ASK, RESEARCH, or INVESTIGATE based on the question
User interface | Code-only (Studio for config) | Studio + code | Chat + visual DAG editor + natural language workflow creation
Database intelligence | Basic NL2SQL, manual schema | Basic NL2SQL, manual schema | Automatic schema discovery, relationship inference, PII detection, column-level controls
Document RAG | Build your own | Build your own | Native — upload, chunk, embed, two-stage semantic search, citations
RBAC | None | Role-based access, SSO | 4-tier at agent, tool, and data level
Audit trails | None | Available | Immutable with agent reasoning, tool parameters, PII redaction
Approval gates | DIY via callbacks | DIY via callbacks | Native with multi-approval, timeouts, escalation, audit integration
Data access controls | None | PII detection/masking | Per-table, per-column with PII detection and read-only enforcement
Workflow versioning | None | None | Automatic snapshots, diff, restore, breaking change detection
Failure handling | Developer-defined | Developer-defined | AI-evaluated success criteria, crash recovery, auto-disable, adaptive re-planning
Self-hosted | Yes (open source) | Kubernetes, VPC | Yes — Docker, air-gap capable with local models
Model agnostic | Yes | Yes | Yes + BYOK
Pricing | Free (50 exec/mo) | Custom | From $45/mo — self-hosted Enterprise unlimited

What You're Actually Paying For

CrewAI's pricing structure deserves close examination.

The free tier gives you 50 executions per month with 1 seat. The Professional plan is $25/month for 100 executions and 2 seats, with additional executions at $0.50 each. The Enterprise plan — the one with RBAC, SSO, audit logs, and SOC 2 compliance — requires custom negotiation.

That means the governance features an enterprise needs are locked behind enterprise pricing. Everything between $25/month and Enterprise is a gap you fill with custom code and hope.

Thallus includes governance at every tier. The Starter plan ($45/month, 100 investigations) includes the same RBAC, audit trails, and tool confirmations as every other plan. The Pro plan ($149/month, 350 investigations, 3 users) adds database queries and API access. Enterprise pricing is custom but includes unlimited investigations, unlimited users, SSO, self-hosted deployment, and dedicated support.

The key pricing distinction: a Thallus "investigation" can involve multiple agents, multiple database queries, multiple document searches, and a full synthesis — and it counts as one investigation. A CrewAI "execution" is one crew run. A complex analysis that requires three specialized crews in CrewAI is three executions. In Thallus, it's one investigation.
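On the figures quoted above, the arithmetic is easy to check:

```python
def crewai_pro_cost(executions, base=25.0, included=100, overage=0.50):
    """Monthly cost on the Professional plan figures quoted above:
    $25 base, 100 included executions, $0.50 per extra execution."""
    return base + max(0, executions - included) * overage

# If a complex analysis needs three crews, it burns three executions,
# so 100 such analyses a month is roughly 300 executions:
print(crewai_pro_cost(300))  # → 125.0
```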

Self-hosted Enterprise deployments remove the meter entirely. Unlimited investigations, unlimited users, no per-execution charges.

When You Need What

Use CrewAI when:

  • You have a Python engineering team that wants to build custom multi-agent systems from scratch
  • The use case is well-defined and crews can be predefined in advance
  • You're building a product that embeds AI agents and need framework-level control
  • Enterprise governance isn't a requirement (or you have the budget for CrewAI Enterprise)
  • Your team will build and maintain the data layer, document pipeline, and governance infrastructure themselves

Use Thallus when:

  • Your team needs to ask ad hoc questions across multiple data sources without predefined workflows
  • You need reasoning that adapts — dynamic planning, re-planning based on results, cross-source analysis through shared context
  • Non-technical users need to build and run AI agent workflows
  • Enterprise governance is non-negotiable — RBAC, audit trails, approval gates, data-level access controls
  • You need native database intelligence — schema discovery, PII detection, column-level controls
  • You need native document search with semantic understanding and citations
  • Self-hosting and data residency matter
  • You want workflows that reason through the unexpected instead of failing when reality doesn't match the script
  • Workflow versioning, crash recovery, and operational resilience are requirements
  • You don't want to build and maintain the infrastructure that sits between agents and production data

The Bottom Line

CrewAI is a good framework. It provides solid abstractions for multi-agent orchestration, and the open-source model gives developers transparency and control. For engineering teams that want to build custom AI agent systems from the ground up, it's a legitimate foundation.

But a framework is a starting point, not a destination. Between "agents can collaborate on tasks" and "the organization trusts this system with production data" sits an enormous amount of infrastructure: data governance, document intelligence, audit trails, approval gates, user interfaces, operational resilience, and the reasoning architecture that turns predefined task sequences into adaptive, cross-source investigations.

You can build all of that on top of CrewAI. Many organizations try. The question is whether your engineering team's time is better spent building AI agent infrastructure — or using a platform that already has it, and focusing their effort on the problems that are actually unique to your business.

Chris Mertin Founder

Building Thallus to help teams get real work done with governed AI agents — no vendor lock-in, no black boxes.