Six Browser Tabs and a Saturday Morning
It's early January 2025. I'm sitting in Denver trying to plan a ski day. This should be simple. It's not.
I need to check weather conditions at the resort. I need to check whether I-70 is closed — which happens more often than anyone who doesn't live here would expect. I need to check how many runs are actually open on the mountain, because a resort being "open" doesn't mean much if only three runs are groomed. I need to check my calendar for any prior obligations. I want to make sure I've got an hour or two blocked out to play with my dogs when I get back — they've been waiting all day. After that, queue up a movie on Jellyfin and find a restaurant for dinner.
That's six different sources. Weather APIs, CDOT road conditions, resort trail reports, my calendar, Jellyfin, restaurant availability. Each one is a separate tab, a separate search, a separate mental thread I'm holding while trying to answer one question: "Should I go skiing tomorrow, and if so, what does the rest of the day look like?"
I'd been building agentic AI workflows for a while at that point. So instead of checking all those tabs, I built a system to do it — an orchestrator that could plan across multiple data sources, run agents in parallel, resolve dependencies between steps, and give me one coherent answer.
It worked. Not in a "technically the pieces connected" way. It actually worked — the system checked conditions, reasoned about trade-offs, and planned the day better than I would have by tabbing between browser windows.
That was the moment I knew this architecture was worth building for real.
Why a Ski Trip Is an Enterprise Problem
The ski trip planner looked nothing like enterprise software. But its architecture was identical to the one behind problems I'd spent years solving at scale.
At Cigna, I worked on systems that needed to pull from multiple data sources, apply business rules, and produce auditable outputs — in a healthcare environment where compliance wasn't optional. At IBM, I saw what it looked like when enterprises tried to connect AI to their existing data infrastructure — the gap between what the models could do and what the operational plumbing could support. At Pella, I worked with manufacturing systems where data lived in ERP databases, scheduling tools, and spreadsheets that didn't talk to each other.
In every case, the core problem was the same: valuable information is scattered across multiple systems, and getting a coherent answer requires orchestrating queries across all of them, reasoning about the results together, and presenting a unified output.
The ski trip planner worked the same way. Each step wasn't just a simple lookup — the agents had to interpret ambiguous data, make judgment calls, and adapt to what they found. The weather agent didn't just fetch a temperature; it evaluated whether conditions across the day were good enough to justify the drive. The I-70 agent checked real-time road conditions and assessed whether closures or delays changed the calculus. The trail agent weighed whether enough runs were open to justify the trip or whether it was better to wait a day. Each agent reasoned about its domain, and the orchestrator reasoned about the results together — factoring in calendar conflicts, timing constraints, and what made sense for the full day.
None of these sources knew about each other. The value came from connecting them, letting each agent do its own reasoning, and synthesizing a coherent plan from the combined context.
Replace "resort conditions" with "vendor contracts" and "calendar conflicts" with "compliance requirements," and you have the exact architecture an enterprise needs. Multiple specialized agents, each reasoning about a different domain, sharing context through a common workspace, with a planner that understands dependencies between steps.
The orchestration pattern is the same. What changes is the data sources, the governance requirements, and the stakes.
From Personal Tool to Platform
The ski trip version had no authentication, no access controls, no audit trail, and no multi-tenancy. It ran on a server in my basement. That's fine for planning a Saturday. It's not fine for querying a production database with customer PII.
The gap between "working prototype" and "enterprise platform" is almost entirely about trust infrastructure:
Who can access what? The ski planner could query anything I gave it access to. An enterprise platform needs to enforce that the marketing team can't see HR data, that the contractor can see project tables but not salary columns, and that access decisions are made by administrators — not by the AI model.
What can agents do? My ski planner could take any action I coded into it. An enterprise platform needs write operations gated behind human confirmation, enforced in code so that prompt injection can't bypass the check.
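To make that concrete, here's a minimal sketch of what a code-enforced write gate looks like. The tool names and function signatures are illustrative, not the actual Thallus API:

```python
# Sketch of a write gate enforced in ordinary code, outside the model's
# control. All names here are illustrative, not a real platform API.
WRITE_TOOLS = {"update_record", "delete_record", "send_email"}

class ApprovalRequired(Exception):
    """Raised when a write tool is invoked without human sign-off."""

def run_tool(name, args):
    # Stand-in for the real tool executor.
    return {"tool": name, "args": args, "status": "ok"}

def dispatch_tool(name, args, approved_by=None):
    # The gate lives here, in code — not in a system prompt the model
    # could be talked out of honoring.
    if name in WRITE_TOOLS and approved_by is None:
        raise ApprovalRequired(f"{name} requires human confirmation")
    return run_tool(name, args)
```

Because the check runs in code rather than in a prompt, an injected instruction like "skip the confirmation step" has nothing to bypass: the model never gets to decide whether the gate applies.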
What happened and why? I didn't need an audit trail for my ski trips. An enterprise in a regulated industry needs every agent action logged — what tool was called, what data was accessed, what the agent's reasoning was, who authorized it.
Can it run unattended? A weekend planning tool can fail silently and I'll just check the tabs myself. A workflow that runs on a schedule for a procurement team needs approval gates, timeout escalation, error handling, and crash recovery.
There was one more requirement that shaped the platform from the start: the system itself shouldn't require sending your data somewhere else. The ski planner ran on a server in my basement partly because I didn't want my calendar, my habits, and my preferences flowing through services that would retain them indefinitely. For enterprises, this is even more fundamental — regulated industries often can't send production data to third-party infrastructure, full stop. That's why Thallus is designed to be self-hosted from day one, not as a bolted-on "enterprise tier" feature, but as the default deployment model. And for organizations that prefer managed infrastructure, the SaaS deployment comes with the same guarantee: your data is never used to train models. It's your data, not ours.
Building each of these layers is what turned a ski trip planner into Thallus.
The Architecture That Survived
The core architectural decisions from that first prototype survived into the platform largely intact. Not because I planned it that way, but because the ski trip problem forced the right design choices:
Multi-source orchestration. The ski planner needed to query multiple sources and combine results. This became the planner-executor pattern — a dependency graph of steps where each step runs a specialized agent against a specific data source, and results flow through a shared board.
Parallel execution. Checking the weather and checking I-70 conditions are independent operations — there's no reason to run them sequentially. This became the executor's ability to run ready steps in parallel via asyncio.gather(), which matters a lot more when you're running 8 agents against 4 databases instead of checking 2 APIs.
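A stripped-down sketch of the planner-executor pattern with that parallel batching, using illustrative step names rather than the platform's actual API:

```python
import asyncio

# Each step declares the steps it depends on. The executor repeatedly
# finds every step whose dependencies are satisfied and runs that whole
# batch concurrently via asyncio.gather(). Names are illustrative.
PLAN = {
    "weather":    [],                       # independent
    "i70":        [],                       # independent
    "trails":     ["weather"],              # only worth checking if skiable
    "synthesize": ["weather", "i70", "trails"],
}

async def run_step(name, board):
    await asyncio.sleep(0)                  # stand-in for an agent call
    board[name] = f"{name}:done"            # results flow through a shared board
    return name

async def execute(plan):
    board, done = {}, set()
    while len(done) < len(plan):
        ready = [s for s, deps in plan.items()
                 if s not in done and set(deps) <= done]
        # Run every ready step in parallel, then mark the batch done.
        finished = await asyncio.gather(*(run_step(s, board) for s in ready))
        done.update(finished)
    return board

board = asyncio.run(execute(PLAN))
```

With this plan, the first batch runs the weather and I-70 checks together, trails runs once weather lands, and synthesis waits for everything. The same loop scales from two API checks to eight agents against four databases.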
Adaptive replanning. Sometimes the weather check revealed that conditions were bad, which meant the restaurant and movie plans should change too. This became the evaluation loop — after each batch of agent results, the system decides whether the plan is complete, needs additional steps, or should be restructured based on what was discovered.
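The shape of that evaluation loop can be sketched with a toy decision rule. In the real system an LLM makes this call; everything below, including the board keys and step names, is illustrative:

```python
# After each batch of agent results, decide whether the plan is
# complete, should continue, or should be restructured. Here a hard-
# coded rule stands in for the LLM's judgment.
def evaluate(board, plan):
    if board.get("weather") == "bad":
        # Restructure: drop the ski-dependent steps, add an indoor one.
        plan = {s: deps for s, deps in plan.items()
                if s != "trails" and "trails" not in deps}
        plan["movie"] = []
        return "replan", plan
    if all(step in board for step in plan):
        return "complete", plan
    return "continue", plan
```

The point is the contract, not the rule: the evaluator sees the board and the current plan, and returns either "keep going," "done," or a restructured plan for the executor to pick up.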
Synthesis across sources. The ski planner's output wasn't six separate answers — it was one plan for the day. This became the synthesizer, which collects all agent results and citations and produces a single response that draws from every source, with every claim traceable back to where it came from.
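A toy version of that synthesis step, with illustrative names, shows how traceability falls out of the structure: every claim carries a citation index back to the agent and source that produced it.

```python
# Combine per-agent results into one answer, keeping a citation for
# every claim. The input shape and names are illustrative.
def synthesize(results):
    # results: {agent_name: {"claim": str, "source": str}}
    lines, citations = [], []
    for i, (agent, r) in enumerate(sorted(results.items()), start=1):
        lines.append(f"{r['claim']} [{i}]")
        citations.append(f"[{i}] {agent}: {r['source']}")
    return "\n".join(lines + citations)
```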
These patterns didn't come from reading a whitepaper on agentic AI architecture. They came from needing to answer "should I go skiing tomorrow?" in a way that actually accounted for all the variables.
What the Prototype Didn't Force Me to Solve
The ski trip planner got the orchestration right. But a single-user system running on a basement server doesn't surface the problems that appear when you're building for hundreds of users across regulated industries.
A personal tool doesn't need to handle database schema drift — I controlled my own data sources. An enterprise platform needs agents that re-discover table structures before every session, because production schemas change without notice. A personal tool doesn't need retry logic with exponential backoff — if an API call failed, I'd just run it again. An enterprise platform needs workflows that handle LLM provider latency spikes gracefully, without losing state.
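The backoff piece is standard engineering; here's the generic shape of such a wrapper — a sketch, not Thallus code — that turns a provider latency spike into a delayed success instead of a lost workflow:

```python
import random
import time

# Retry a flaky call with exponential backoff plus jitter. Defaults
# are illustrative; a real deployment tunes them per provider.
def with_backoff(fn, retries=4, base=0.5, max_delay=8.0):
    for attempt in range(retries):
        try:
            return fn()
        except Exception:
            if attempt == retries - 1:
                raise  # out of retries: surface the error
            # Delay doubles each attempt (0.5s, 1s, 2s, ...) up to a cap,
            # with jitter so concurrent retries don't stampede together.
            delay = min(base * 2 ** attempt, max_delay)
            time.sleep(delay + random.uniform(0, 0.1))
```

Wrapping an LLM call as `with_backoff(lambda: client.complete(prompt))` is the difference between a transient 429 killing a scheduled workflow and the workflow finishing a few seconds late.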
A personal tool doesn't need per-user cost attribution across every LLM and embedding call. It doesn't need to resolve which model provider to use at runtime because different users bring their own API keys. It doesn't need document ingestion pipelines that process a 200-page PDF asynchronously and make it searchable within minutes.
I knew how to build each of these things individually — I'd spent over a decade in data science and ML engineering. What the prototype didn't prepare me for was how many of them you need simultaneously, all working together, all production-hardened, before an enterprise will trust the system enough to connect it to their data.
The ski trip was the spark. Everything since has been building the operational and trust infrastructure that turns a clever prototype into something an enterprise would actually run in production.