Authors: Ohad Elhelo, Ori Cohen, Co-Founders
Two different things are emerging under the name “agent.”
Open-ended agents work for users. Coding assistants, computer-use agents, personal AI. You’re the principal. If the agent interprets your intent slightly differently each time, that’s fine. You’re in the loop. You’ll correct it. Flexibility is the point.
Task-oriented agents work on behalf of entities. An airline’s booking agent. A bank’s support agent. An insurer’s claims agent. These agents serve users, but they represent the entity. The entity is the principal, the one whose policies must be enforced.
Task-oriented agents need constraints: the certainty that specific behaviors will never occur without the right conditions. The ticket won’t be cancelled unless the passenger is Business Class and Platinum Elite. The payment won’t process without explicit confirmation. The refund won’t be issued if required documentation is missing.
These aren’t preferences. They’re requirements that determine whether AI can be trusted with interactions involving real money, real appointments, and real consequences.
But task-oriented agents also need flexibility. Users don’t follow scripts. They ask unexpected questions, change their mind, go off on tangents. The agent has to handle real conversation while enforcing real constraints.
That combination is the hard problem.
The opportunity is enormous. Every conversation that results in real-world action—booking flights, processing payments, managing claims, executing trades—could be automated. These interactions run the economy. The market for task-oriented agents dwarfs what open-ended assistants will ever capture.
But without control, task-oriented agents can’t be deployed reliably at scale. Enterprises won’t trust AI with customer interactions when “usually works” is the best assurance available. This is why, despite three years and billions of dollars, high-stakes enterprise deployments are hard to come by.
High-stakes use cases have a non-negotiable requirement: they need both control and flexibility. Control because constraints must hold absolutely; flexibility because real users don’t follow scripts.
The industry has converged on two main approaches: orchestration frameworks and function-calling LLM agents. Both fail because they force a tradeoff between control and flexibility that high-stakes use cases don’t allow.
Orchestration wraps LLMs in workflow systems—state machines, routing logic, branching conditions. Control and flexibility live in separate systems: the state machine provides control (coded transitions, defined states), the LLM provides flexibility (handles language, deals with variation).
The problem is that these systems don’t share understanding.
Consider: a user is mid-payment and says “wait, what’s the cancellation policy before I pay?”
Rigid orchestration: No transition coded for this. The system either breaks, gives a canned response, or forces the user back on script. Control preserved, flexibility destroyed.
More branches: You add a branch for this case. But then users ask about refunds mid-payment. Or shipping. Or they change their mind about the item. You code more branches. Real deployments accumulate hundreds of them and still miss edge cases. You’ve built a maintenance nightmare that still can’t handle the next unexpected input.
LLM fallback: When the state machine can’t handle something, hand off to the LLM. But now the LLM has no understanding of where you are in the flow, what constraints apply, what state has accumulated. It might process the payment without confirmation because it’s just predicting tokens, not computing from state. You got flexibility by surrendering control.
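To make that failure mode concrete, here is a minimal sketch (a hypothetical flow and a stubbed `call_llm`, not any particular framework): the state machine holds the confirmation constraint, but the fallback never sees it.

```python
# Hypothetical sketch of an orchestrated payment flow with an LLM fallback.
# All names (PaymentFlow, call_llm) are illustrative, not a real framework.

def call_llm(user_message: str) -> str:
    """Stand-in for a generic LLM fallback that only sees the raw message,
    not the flow's state or its pending-confirmation constraint."""
    # A real model might answer the policy question -- or happily agree to
    # "go ahead and pay" without knowing confirmation was never given.
    return f"[LLM fallback, no flow state] Responding to: {user_message!r}"

class PaymentFlow:
    def __init__(self):
        self.state = "AWAITING_CONFIRMATION"   # control lives here...
        self.confirmed = False

    def handle(self, user_message: str) -> str:
        if self.state == "AWAITING_CONFIRMATION":
            if user_message.strip().lower() == "confirm":
                self.confirmed = True
                self.state = "PROCESSING"
                return "Payment confirmed and processed."
            # No transition coded for anything else: hand off to the LLM.
            # The fallback never sees self.state or self.confirmed, so the
            # constraint "no payment without confirmation" is enforced only by hope.
            return call_llm(user_message)
        return "Flow complete."

flow = PaymentFlow()
print(flow.handle("wait, what's the cancellation policy before I pay?"))
print(flow.handle("confirm"))
```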
This is orchestration’s fundamental limitation: control and flexibility are inversely correlated. The tighter your state machine, the worse the user experience. The more you rely on the LLM for flexibility, the less certainty you have about behavior. High-stakes use cases require both simultaneously, and orchestration forces you to choose.
There’s also a structural problem: there’s no model. What makes decisions is the state machine itself, a structure you define entirely by hand. Every branch, every condition, every behavior must be explicitly coded. Each workflow is its own silo. The NLU might generalize across workflows, but the logic doesn’t: rules are fragmented by design, coded separately even when workflows share entities. When business logic changes, you update it in multiple places. When new workflows need similar constraints, you rebuild them from scratch.
Orchestration gives you control over a flowchart. It doesn’t give you a controllable agent.
Function-calling (tool-use) agents take a different approach: give the LLM access to tools and let it decide when to call them. This provides flexibility: the agent handles unexpected inputs without breaking.
But you lose control, because the LLM is still the decision-maker.
When an LLM decides to call a function, that decision is sampled from a probability distribution. The model predicts that a tool call is likely the right next token given the context. It doesn’t compute from state. It doesn’t evaluate constraints. It approximates.
You can make unwanted tool calls less likely through prompting, fine-tuning, or output filtering. You cannot make them impossible. The LLM might call the refund function without verifying documentation. It might skip the confirmation step. It might invoke a tool with incorrect parameters. These aren’t bugs to fix, they’re inherent to how the architecture works.
Some systems add validation layers: check the tool call before executing, reject if constraints aren’t met. This helps, but it’s reactive: the agent already decided to take the action; you’re just blocking it after the fact. And the validation logic is coded per-tool, not derived from shared understanding of the domain. You’re back to the fragmentation problem.
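A minimal sketch of that reactive pattern, with hypothetical tool and validator names: the model has already chosen the action, the wrapper can only veto it, and every rule is hand-coded per tool.

```python
# Hypothetical sketch of post-hoc validation around a tool-calling LLM.
# The model has already decided; the wrapper can only veto.

from typing import Callable

def fake_llm_tool_call(conversation: str) -> dict:
    """Stand-in for an LLM that samples a tool call from context."""
    return {"tool": "issue_refund", "args": {"order_id": "A123", "amount": 180.0}}

# Per-tool validators, written by hand -- not derived from any shared model
# of the domain. Add a tool, write another rule; change a policy, hunt down
# every place it was duplicated.
VALIDATORS: dict[str, Callable[[dict, dict], bool]] = {
    "issue_refund": lambda args, ctx: ctx.get("documentation_verified", False),
    "process_payment": lambda args, ctx: ctx.get("user_confirmed", False),
}

def execute(call: dict, context: dict) -> str:
    validator = VALIDATORS.get(call["tool"])
    if validator and not validator(call["args"], context):
        # Reactive control: the decision was already made upstream;
        # we are just refusing to carry it out.
        return f"BLOCKED: {call['tool']} rejected by validation layer"
    return f"EXECUTED: {call['tool']}({call['args']})"

context = {"documentation_verified": False}
print(execute(fake_llm_tool_call("I want my money back"), context))
```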
Both approaches fail for the same structural reason: control and flexibility live in separate systems that don’t share understanding. The state machine doesn’t know language. The LLM doesn’t know state. When you need both properties simultaneously—which is always, in real conversations with real stakes—one system has to hand off to the other, and something gets lost in the transfer.
What’s needed is an architecture where control and flexibility aren’t separate systems trading off, but unified properties of one model. Where the same computation that handles unexpected questions is the same computation that enforces constraints. The fluency of neural language generation and the reliability of symbolic constraint enforcement. Unified, not bolted together.
Apollo-1 is the first foundation model built for controllable agents.
Controllable agents are agents whose behavior can be made certain in critical scenarios while preserving natural conversation everywhere else.
Apollo-1 is not a language model adapted for control. It’s not an orchestration layer around existing models. It’s a new foundation, built from the ground up on neuro-symbolic architecture that unifies generation and control within a single model. A new computational category.
This unification is the key architectural insight. Apollo-1 doesn’t pass information between separate neural and symbolic systems. The neural and symbolic components operate together on the same representation in the same computational loop: interpreting language, maintaining state, enforcing constraints, and generating responses as one integrated process.
The neural components handle language: interpreting meaning, managing ambiguity, producing fluent responses. The symbolic components enforce constraints: tracking state, applying rules, ensuring prohibited behaviors never execute without required conditions.
The agent understands language like an LLM. It enforces constraints like a formal system. One model. Both capabilities. Native to the architecture.
When a user asks about cancellation policy mid-payment, the neural component understands the question naturally while the symbolic component knows you’re mid-payment. State is explicit, not inferred from context. The model answers fluently while maintaining the payment flow. The constraint “don’t process payment without confirmation” holds absolutely.
You didn’t code a branch for this. You didn’t hand off to an LLM that might lose track of state. The same computation that provides flexibility is the same computation that enforces control.
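As a toy illustration of that claim (not Apollo-1's actual representation; architectural specifications arrive with the technical blog post), one explicit state object can serve both the side question and the payment constraint:

```python
# Toy illustration of "one state, both properties" -- not Apollo-1's
# actual internals, which are not published here.

from dataclasses import dataclass, field

@dataclass
class DialogueState:
    flow: str = "PAYMENT"                      # procedural progress
    confirmed: bool = False                    # required condition for the payment
    facts: dict = field(default_factory=lambda: {
        "cancellation_policy": "Free cancellation within 24 hours of booking."
    })                                         # descriptive knowledge

def respond(state: DialogueState, user_intent: str) -> str:
    # The same explicit state that answers the side question...
    if user_intent == "ask_cancellation_policy":
        answer = state.facts["cancellation_policy"]
        # ...does not advance the payment flow, so the gate below still applies.
        return f"{answer} Whenever you're ready, please confirm the payment."

    # ...is the state that gates the payment itself.
    if user_intent == "pay":
        if not state.confirmed:
            return "I need your explicit confirmation before processing the payment."
        return "Processing the payment now."

    return "Could you tell me a bit more?"

state = DialogueState()
print(respond(state, "ask_cancellation_policy"))
print(respond(state, "pay"))   # blocked: confirmation was never given
```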
This is fundamentally different from orchestration. There is a model: a model that understands task-oriented dialogue as a computational domain. The symbolic structures represent entities, relationships, and constraints that are shared across workflows. When you define a rule about refund authorization, the model understands how it relates to customer status, order history, and documentation requirements, not because you coded those connections, but because the ontology is part of the model’s representation.
When you’ve defined specific constraints, they hold absolutely. When you haven’t, the agent thinks for itself.
Apollo-1 is domain-agnostic and use-case-agnostic. The same model powers auto repair scheduling, insurance claims, retail support, healthcare navigation, and financial services, without rebuilding logic per workflow or manual ontology creation. The symbolic structures—intents, constraints, parameters, execution semantics—are universal primitives. This is what makes Apollo-1 a foundation model: not scale, but representational generality. Same model, different System Prompts.
Apollo-1 provides both: constraints that hold absolutely where you define them, and natural conversation everywhere else.
In 2017, we began solving and encoding millions of real-user task-oriented conversations into structured data, powered by a workforce of 60,000 human agents. The core insight wasn’t about data scale; it was about what must be represented.
We found that task-oriented conversational AI requires two kinds of knowledge working together: procedural knowledge (how a task progresses, which steps and conditions apply) and descriptive knowledge (the contextual facts accumulated along the way).
Training a transformer on multi-turn transcripts can capture conversational style, but it won't teach the model how to handle critical interactions correctly. Transcripts are flat and stateless. Without explicit state, how is the model supposed to learn when to block an action versus when to allow it?
To compute reliably over both kinds of knowledge, we needed a representation that separates structure from context while carrying each. We constructed a symbolic language that encodes procedural roles and descriptive facts, giving the model a typed symbolic state it can reason over and enforce constraints against.
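Purely as a schematic sketch (the type and field names below are illustrative, not the symbolic language itself), the separation looks roughly like this: procedural structure typed on one side, descriptive context on the other, carried in a single state.

```python
# Rough sketch of a typed symbolic state separating procedural structure
# from descriptive context. Names are illustrative assumptions, not
# Apollo-1's actual symbolic language.

from dataclasses import dataclass, field
from enum import Enum, auto

class Step(Enum):
    COLLECTING_PARAMETERS = auto()
    AWAITING_CONFIRMATION = auto()
    EXECUTING = auto()
    DONE = auto()

@dataclass
class ProceduralState:
    intent: str                                   # e.g. "cancel_ticket"
    step: Step = Step.COLLECTING_PARAMETERS
    satisfied_constraints: set[str] = field(default_factory=set)

@dataclass
class DescriptiveFacts:
    # Contextual knowledge accumulated from the conversation and backend.
    knowledge: dict[str, object] = field(default_factory=dict)

@dataclass
class SymbolicState:
    procedure: ProceduralState
    facts: DescriptiveFacts

state = SymbolicState(
    procedure=ProceduralState(intent="cancel_ticket"),
    facts=DescriptiveFacts(knowledge={"fare_class": "Business", "tier": "Platinum Elite"}),
)
print(state.procedure.step, state.facts.knowledge)
```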
In parallel, we observed that across use cases and domains—selling shoes, booking flights, processing loans—task-oriented dialogue follows universal procedural patterns. Food delivery, claims processing, and order management share similar procedural structures: parameter extraction, constraint validation, intent identification, policy enforcement, state-dependent branching, etc.
The key insight: if we could create a unified model where neural modules handle context and symbolic modules handle structure, operating together rather than in sequence, we’d solve the problem at its root. Of course, it’d have to work agnostically across domains and use cases, capable of symbolically representing any scenario requiring controllable behavior.
For the actual computation, we developed the Neuro-Symbolic Reasoner, a cognitive core that computes next actions from the current symbolic state, as opposed to predicting the next token. Neural modules assist in the translation to and from the symbolic language, while symbolic modules maintain explicit state, enforce constraints, and ensure that tool invocations are structured rather than probabilistically sampled.
Together, the symbolic language and the reasoner form Apollo-1: the foundation model for controllable agents.
Apollo-1’s breakthrough is stateful neuro-symbolic reasoning: a computation built explicitly for task-oriented conversational AI.
The Neuro-Symbolic Reasoner operates on symbolic structures—intents, constraints, parameters, actions—that remain constant across domains, while neural modules continuously enrich those structures with semantic nuance.
Apollo-1’s neuro-symbolic design unifies neural modules that understand context with symbolic modules that enforce constraints.
The symbolic state represents both procedural progress (what state we’re in) and descriptive facts (what we know). Neural components interpret language and enrich understanding; symbolic components ensure reliable execution or reliable non-execution when constraints aren’t met.
Perception is probabilistic, but given the same state, the Reasoner always makes the same decision, delivering the behavioral guarantees that controllable agents require and making task execution configurable, reproducible and auditable.
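A toy sketch of that contrast, with made-up rules rather than Apollo-1 internals: the decision is a pure function of the symbolic state, so identical states always produce identical actions, and a perception gap means no action is attempted at all.

```python
# Toy contrast between sampling an action and computing one from state.
# The rules and names are illustrative assumptions, not Apollo-1 internals.

from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    intent: str
    fare_class: str
    loyalty_tier: str
    confirmed: bool

def decide(state: State) -> str:
    """Pure function: same state in, same decision out -- every time."""
    if state.intent == "cancel_ticket":
        if state.fare_class != "Business" or state.loyalty_tier != "Platinum Elite":
            return "BLOCK:cancel_ticket"      # policy constraint unmet
        if not state.confirmed:
            return "ASK:confirmation"         # required step comes first
        return "EXECUTE:cancel_ticket"
    return "ASK:clarify_intent"               # perception gap => no action invoked:
                                              # a task failure, not a policy violation

s = State(intent="cancel_ticket", fare_class="Economy",
          loyalty_tier="Platinum Elite", confirmed=True)
assert decide(s) == decide(s) == "BLOCK:cancel_ticket"   # reproducible, auditable
print(decide(s))
```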
The Symbolic Reasoning Engine is a deterministic, rule-based engine. Its procedural logic was learned from years of solving and encoding millions of multi-turn task-oriented conversations with human agents, whose turn outputs were ranked by a reputation system based on peer feedback.
Apollo-1 ships with a Playground where any use case runs from the System Prompt alone. The System Prompt isn’t configuration. It’s a behavioral contract.
When you define your tools in the System Prompt, Apollo-1 automatically generates an ontology: a structured representation of your entities, parameters, and relationships. This ontology is shared—the same entities and relationships are understood across all your workflows, which is why constraints defined once apply everywhere they’re relevant.
From this ontology, you define constraints that specify exactly when actions are blocked:
Policy Constraints: Business rules the agent must enforce. “Block disputes for transactions older than 8 days.”
Confirmation Constraints: Actions that require explicit user consent before execution. “Require confirmation before processing payment.”
Authentication Constraints: Actions that require identity verification before execution. “Require ID verification for refunds over $200.”
These aren’t instructions the agent tries to follow. They’re constraints enforced at the architectural level. Once state is correctly interpreted, constraint enforcement is deterministic.
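Schematically, and purely for illustration (the field names below are assumptions, not the actual System Prompt syntax), the tool definitions that seed the ontology and the three constraint types might be expressed like this:

```python
# Hypothetical, schematic rendering of tools plus the three constraint types.
# The real System Prompt format isn't reproduced here; every key is an assumption.

system_prompt = {
    "tools": [
        {"name": "open_dispute", "params": ["transaction_id"]},
        {"name": "process_payment", "params": ["amount", "method"]},
        {"name": "issue_refund", "params": ["order_id", "amount"]},
    ],
    "constraints": [
        {   # Policy Constraint: a business rule the agent must enforce
            "type": "policy",
            "action": "open_dispute",
            "block_if": "transaction_age_days > 8",
        },
        {   # Confirmation Constraint: explicit user consent before execution
            "type": "confirmation",
            "action": "process_payment",
        },
        {   # Authentication Constraint: identity verification before execution
            "type": "authentication",
            "action": "issue_refund",
            "when": "amount > 200",
            "method": "id_verification",
        },
    ],
}
print(len(system_prompt["constraints"]), "constraints defined once, shared across workflows")
```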
When a telecom provider specifies “never process a plan downgrade during an active billing dispute,” that constraint holds. The action is blocked at the symbolic level before it can execute. The agent explains why, offers alternatives, maintains a natural conversation. But the constraint holds regardless of how the user phrases the request.
Constraint enforcement is deterministic; perception is not. This means failures are confined to a narrower, more auditable category: the system may misunderstand what’s being requested, but it won’t ‘decide’ to skip a required step or ‘forget’ a policy mid-conversation. Misclassification affects whether an action is attempted, not whether constraints are enforced. If perception fails, the action isn’t invoked; the user experiences task failure, not policy violation.
You define the constraints that matter: what must happen, what must never happen without the right conditions. Apollo-1 makes them absolute. For everything else, the agent remains conversationally intelligent: it handles curveballs, maintains context, responds naturally.
Apollo-1’s architecture makes deliberate trade-offs. By optimizing for task-oriented agents, we’ve built a model that doesn’t compete in other domains, and that’s by design.
Open-Ended Creative Work
Apollo-1 isn’t designed for creative writing, brainstorming sessions, or exploratory dialogue where variation creates value. For drafting marketing copy, generating story ideas, or exploring hypothetical scenarios, transformers remain the superior architecture. Our symbolic structures enforce consistency; creativity often requires the opposite.
Code Generation & Software Development
While Apollo-1 can integrate with code execution tools in task-oriented workflows, it doesn’t offer state-of-the-art code generation. Transformers trained on massive code repositories excel at synthesizing programming patterns, autocompleting functions, and explaining algorithms. Apollo-1’s symbolic language is purpose-built for task execution, not software development.
Low-Stakes, High-Variation Scenarios
When conversational variety enhances user experience—customer engagement campaigns, educational tutoring with adaptive responses, entertainment chatbots—probabilistic variation is often preferable to deterministic certainty. Apollo-1’s constraints become a limitation when flexibility is the goal.
Apollo-1 is deployed in production at Fortune 500 organizations. Partnerships to power consumer-facing AI at some of the world’s largest companies in retail, automotive, and regulated industries will be announced alongside general availability.
Organizations testing Apollo-1 against their existing systems—some built over years with teams of thousands—are seeing consistent patterns: substantial improvements in task completion rates across domains.
| Test / Benchmark | Apollo-1 | Best LLM Agent | Δ (relative) |
| --- | --- | --- | --- |
| τ-Bench-Airline | 90.8–92.5% | Claude-4: 60% | +51% |
| Google Flights – 111 live booking chats | 83% | Gemini 2.5-Flash: 22% | +277% |
| Amazon Retail – 120 live shopping chats | 90.8% | Rufus: 16.7% | +444% |
A technical blog post—including architectural specifications, formal proofs, procedural ontology samples, evaluation methodologies, and turn-closure semantics—will be released alongside general availability.
The high-stakes conversations that drive economic activity become reliably automatable.
Booking systems that never double-book. Claims processing that never approves without required documentation. Customer service that never processes unauthorized transactions. Transaction systems that never exceed approved parameters.
Enterprises can finally trust agents with customer interactions because the constraints they define hold with certainty, no matter how the conversation unfolds.
Apollo-1 benefits from advances in neural language understanding. As LLMs improve, Apollo-1’s neural modules improve with them. The symbolic language evolves independently, expanding the granularity of domain-agnostic procedural states it can represent.
General Availability: January 2026
Apollo-1’s architecture integrates with existing Generative AI workflows and adapts to any API or external system, with no need to change endpoints or preprocess data. It launches with native connectivity to all major platforms (Salesforce, HubSpot, Zendesk, etc.), full MCP support, and a strategic go-to-market partnership with Google.
High-stakes use cases require both behavioral certainty and conversational flexibility. LLM agents alone can't provide both. Orchestration can't resolve the tradeoff. Control and flexibility remain separate systems, trading off against each other.
Apollo-1 unifies them. Neural for flexibility. Symbolic for constraints. One model.