
Apollo-1: The Foundation Model for Controllable Agents

Apollo-1 is the first neuro-symbolic foundation model, unifying generation and control to enable high-stakes use cases. It combines neural fluency with symbolic constraints, delivering deterministic guarantees inside natural conversations.
Pending Release

Authors: Ohad Elhelo, Ori Cohen, Co-Founders

 

01. The Control Problem

Two different things are emerging under the name “agent.”

Open-ended agents work for users. Coding assistants, computer-use agents, personal AI. You’re the principal. If the agent interprets your intent slightly differently each time, that’s fine. You’re in the loop. You’ll correct it. Flexibility is the point.

Task-oriented agents work on behalf of entities. An airline’s booking agent. A bank’s support agent. An insurer’s claims agent. These agents serve users, but they represent the entity. The entity is the principal, the one whose policies must be enforced.

Task-oriented agents need constraints: the certainty that specific behaviors will never occur without the right conditions. The ticket won’t be cancelled unless the passenger is Business Class and Platinum Elite. The payment won’t process without explicit confirmation. The refund won’t be issued if required documentation is missing.

These aren’t preferences. They’re requirements that determine whether AI can be trusted with interactions involving real money, real appointments, and real consequences.

But task-oriented agents also need flexibility. Users don’t follow scripts. They ask unexpected questions, change their mind, go off on tangents. The agent has to handle real conversation while enforcing real constraints.

That combination is the hard problem.


02. Why Current Approaches Can’t Solve It

The opportunity is enormous. Every conversation that results in real-world action—booking flights, processing payments, managing claims, executing trades—could be automated. These interactions run the economy. The market for task-oriented agents dwarfs what open-ended assistants will ever capture.

But without control, task-oriented agents can’t be deployed reliably at scale. Enterprises won’t trust AI with customer interactions when “usually works” is the best assurance available. This is why, despite three years and billions of dollars, high-stakes enterprise deployments are hard to come by. 

High-stakes use cases have a non-negotiable requirement: they need both control and flexibility. Control because constraints must hold absolutely; flexibility because real users don’t follow scripts.

The industry has converged on two main approaches: orchestration frameworks and function-calling LLM agents. Both fail because they force a tradeoff between control and flexibility that high-stakes use cases don’t allow.

Orchestration Frameworks

Orchestration wraps LLMs in workflow systems—state machines, routing logic, branching conditions. Control and flexibility live in separate systems: the state machine provides control (coded transitions, defined states), the LLM provides flexibility (handles language, deals with variation).

The problem is that these systems don’t share understanding.

Consider: a user is mid-payment and says “wait, what’s the cancellation policy before I pay?”

Rigid orchestration: No transition coded for this. The system either breaks, gives a canned response, or forces the user back on script. Control preserved, flexibility destroyed.

More branches: You add a branch for this case. But then users ask about refunds mid-payment. Or shipping. Or they change their mind about the item. You code more branches. Real deployments accumulate hundreds of them and still miss edge cases. You’ve built a maintenance nightmare that still can’t handle the next unexpected input.

LLM fallback: When the state machine can’t handle something, hand off to the LLM. But now the LLM has no understanding of where you are in the flow, what constraints apply, what state has accumulated. It might process the payment without confirmation because it’s just predicting tokens, not computing from state. You got flexibility by surrendering control.
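
A minimal sketch of this failure mode, with hypothetical names and a stubbed LLM call: the constraint lives in the state machine, so the fallback never sees it.

```python
# Hypothetical sketch of an orchestration layer with an LLM fallback.
# The state machine owns the constraint; the fallback never sees it.

WORKFLOW = {
    "collect_payment": {
        "confirm": "process_payment",   # the only coded transition
    },
}

session = {"state": "collect_payment", "payment_confirmed": False}

def llm_fallback(utterance: str) -> str:
    # Stand-in for a real LLM call. It receives only the utterance:
    # no workflow state, no accumulated slots, no pending constraints.
    return f"LLM free-form reply to: {utterance!r}"

def handle_turn(utterance: str) -> str:
    transitions = WORKFLOW[session["state"]]
    key = utterance.strip().lower()
    if key in transitions:                       # rigid match
        session["state"] = transitions[key]
        return f"Moving to {session['state']}"
    # Anything off-script falls through to the LLM, which may happily
    # promise (or trigger) actions the state machine was supposed to gate.
    return llm_fallback(utterance)

print(handle_turn("wait, what's the cancellation policy before I pay?"))
```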

This is orchestration’s fundamental limitation: control and flexibility are inversely correlated. The tighter your state machine, the worse the user experience. The more you rely on the LLM for flexibility, the less certainty you have about behavior. High-stakes use cases require both simultaneously, and orchestration forces you to choose.

There’s also a structural problem: there’s no model. What makes decisions is the state machine itself, a structure you define entirely by hand. Every branch, every condition, every behavior must be explicitly coded. Each workflow is its own silo. The NLU might generalize across workflows, but the logic doesn’t: rules are fragmented by design, coded separately even when workflows share entities. When business logic changes, you update it in multiple places. When new workflows need similar constraints, you rebuild them from scratch.

Orchestration gives you control over a flowchart. It doesn’t give you a controllable agent.

Function-Calling LLM Agents

Function-calling (tool-use) agents take a different approach: give the LLM access to tools and let it decide when to call them. This provides flexibility: the agent handles unexpected inputs without breaking.

But you lose control, because the LLM is still the decision-maker.

When an LLM decides to call a function, that decision is sampled from a probability distribution. The model predicts that a tool call is likely the right next token given the context. It doesn’t compute from state. It doesn’t evaluate constraints. It approximates.

You can make unwanted tool calls less likely through prompting, fine-tuning, or output filtering. You cannot make them impossible. The LLM might call the refund function without verifying documentation. It might skip the confirmation step. It might invoke a tool with incorrect parameters. These aren’t bugs to fix, they’re inherent to how the architecture works.

Some systems add validation layers: check the tool call before executing, reject if constraints aren’t met. This helps, but it’s reactive: the agent already decided to take the action; you’re just blocking it after the fact. And the validation logic is coded per-tool, not derived from shared understanding of the domain. You’re back to the fragmentation problem.
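
A minimal sketch of that pattern, with hypothetical tool and validator names: the model has already chosen the action, and the guard can only veto it afterwards, one hand-written rule per tool.

```python
# Hypothetical post-hoc validation of a sampled tool call.
# The LLM has already decided; the validator can only block after the fact,
# and its rules are written per tool rather than derived from a shared model.

def llm_propose_tool_call(conversation: list[str]) -> dict:
    # Stand-in for an LLM that samples a tool call from context.
    return {"tool": "issue_refund", "args": {"order_id": "A123", "amount": 180.0}}

def validate_issue_refund(args: dict, session: dict) -> bool:
    # One hand-written rule for one tool. Every other tool needs its own.
    return session.get("documentation_verified", False)

VALIDATORS = {"issue_refund": validate_issue_refund}

def execute(call: dict, session: dict) -> str:
    validator = VALIDATORS.get(call["tool"])
    if validator and not validator(call["args"], session):
        return f"Blocked {call['tool']}: constraints not met"
    return f"Executed {call['tool']} with {call['args']}"

session = {"documentation_verified": False}
print(execute(llm_propose_tool_call(["I want my money back"]), session))
```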

Function-calling agents offer flexibility without control. Orchestration offers control without flexibility. High-stakes use cases require both, which is why neither approach has produced reliable deployments in scenarios with real consequences.

The Architectural Requirement

Both approaches fail for the same structural reason: control and flexibility live in separate systems that don’t share understanding. The state machine doesn’t know language. The LLM doesn’t know state. When you need both properties simultaneously—which is always, in real conversations with real stakes—one system has to hand off to the other, and something gets lost in the transfer.

What’s needed is an architecture where control and flexibility aren’t separate systems trading off, but unified properties of one model. Where the same computation that handles unexpected questions is the same computation that enforces constraints. The fluency of neural language generation and the reliability of symbolic constraint enforcement. Unified, not bolted together.


03. Apollo-1

Apollo-1 is the first foundation model built for controllable agents.

Controllable agents are agents whose behavior can be made certain in critical scenarios while preserving natural conversation everywhere else.

Apollo-1 is not a language model adapted for control. It’s not an orchestration layer around existing models. It’s a new foundation, built from the ground up on neuro-symbolic architecture that unifies generation and control within a single model. A new computational category. 

Neuro-Symbolic Computation

This unification is the key architectural insight. Apollo-1 doesn’t pass information between separate neural and symbolic systems. The neural and symbolic components operate together on the same representation in the same computational loop: interpreting language, maintaining state, enforcing constraints, and generating responses as one integrated process.

The neural components handle language: interpreting meaning, managing ambiguity, producing fluent responses. The symbolic components enforce constraints: tracking state, applying rules, ensuring prohibited behaviors never execute without required conditions.

The agent understands language like an LLM. It enforces constraints like a formal system. One model. Both capabilities. Native to the architecture.

Solving the Control-Flexibility Tradeoff

When a user asks about cancellation policy mid-payment, the neural component understands the question naturally while the symbolic component knows you’re mid-payment. State is explicit, not inferred from context. The model answers fluently while maintaining the payment flow. The constraint “don’t process payment without confirmation” holds absolutely.

You didn’t code a branch for this. You didn’t hand off to an LLM that might lose track of state. The same computation that provides flexibility is the same computation that enforces control.
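
A simplified illustration of the idea (not Apollo-1’s internal representation): one explicit state object is consulted both to answer the digression and to gate the action.

```python
# Simplified illustration only, not Apollo-1's internals: one explicit state
# object is consulted both to answer the digression and to gate the action.

from dataclasses import dataclass

@dataclass
class DialogueState:
    flow: str = "payment"
    payment_confirmed: bool = False
    pending_question: str | None = None

def answer_digression(state: DialogueState) -> str:
    # Neural side (stubbed here) answers fluently; the flow is not abandoned.
    return f"(answers cancellation policy, still in '{state.flow}' flow)"

def can_process_payment(state: DialogueState) -> bool:
    # Symbolic side: the constraint is checked against explicit state,
    # not inferred from conversation history.
    return state.payment_confirmed

state = DialogueState(pending_question="what's the cancellation policy?")
print(answer_digression(state))
print("process_payment allowed?", can_process_payment(state))  # False until confirmed
```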

A Model, Not a Flowchart

This is fundamentally different from orchestration. There is a model: a model that understands task-oriented dialogue as a computational domain. The symbolic structures represent entities, relationships, and constraints that are shared across workflows. When you define a rule about refund authorization, the model understands how it relates to customer status, order history, and documentation requirements, not because you coded those connections, but because the ontology is part of the model’s representation.

When you’ve defined specific constraints, they hold absolutely. When you haven’t, the agent thinks for itself.

Because the reasoning is neuro-symbolic, it’s white-box. Every decision is traceable and auditable.

Universal by Design

Apollo-1 is domain-agnostic and use-case-agnostic. The same model powers auto repair scheduling, insurance claims, retail support, healthcare navigation, and financial services, without rebuilding logic per workflow or manual ontology creation. The symbolic structures—intents, constraints, parameters, execution semantics—are universal primitives. This is what makes Apollo-1 a foundation model: not scale, but representational generality. Same model, different System Prompts.

Apollo-1 provides:

  • Explicit state. Multi-turn interactions require tracking where you are in a process, what you know, what’s happened, what constraints apply.
  • Behavioral certainty. Enterprises need to know their agent will never take specific actions without specific conditions being met. Apollo-1’s symbolic constraints guarantee that. 
  • Native tool use. When interacting with external systems—booking engines, payment processors, CRMs—Apollo-1 derives tool invocations from explicit state rather than sampling them. Invocations run reliably with proper parameters, error handling, execution guarantees. 
  • White-box reasoning. Every decision is traceable: which rules fired, how state evolved, why the agent acted or refused to act.
  • Fluent interaction. The agent converses naturally, handles unexpected inputs, responds like humans expect. 
  • Self-serve deployment. Companies can configure and deploy agents directly from the Playground. Define tools, set constraints, test conversations, go live. No custom development required for most use cases.

04. Eight Years to Build the Solution

In 2017, we began solving and encoding millions of real-user task-oriented conversations into structured data, powered by a workforce of 60,000 human agents. The core insight wasn’t about data scale; it was about what must be represented.

We found out that task-oriented conversational AI requires two kinds of knowledge working together:

  • Descriptive knowledge — entities, attributes, domain content
  • Procedural knowledge — roles, constraints, flows, policies

Training a transformer on multi-turn transcripts can capture conversational style, but it won’t teach the model how to handle critical interactions correctly. Datasets are one-dimensional and stateless. Without explicit state, how is the model supposed to learn when to block an action versus when to allow it?

To compute reliably over both kinds of knowledge, we needed a representation that separates structure from content while carrying both. We constructed a symbolic language that encodes procedural roles and descriptive facts, giving the model a typed symbolic state it can reason over and enforce constraints against.
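
As a rough illustration of what a typed symbolic state carries (a simplified sketch, not the actual symbolic language):

```python
# Rough illustration of a typed symbolic state carrying both kinds of
# knowledge; the actual symbolic language is richer than this sketch.

from dataclasses import dataclass, field
from enum import Enum

class Step(Enum):                      # procedural knowledge: where we are in the flow
    IDENTIFY_INTENT = 1
    COLLECT_PARAMETERS = 2
    AWAIT_CONFIRMATION = 3
    EXECUTE = 4

@dataclass
class SymbolicState:
    step: Step                          # procedural: current stage
    intent: str | None = None           # procedural: what the user is trying to do
    facts: dict[str, str] = field(default_factory=dict)            # descriptive: entities, attributes
    constraints_pending: list[str] = field(default_factory=list)   # procedural: policies still unmet

state = SymbolicState(
    step=Step.AWAIT_CONFIRMATION,
    intent="cancel_ticket",
    facts={"cabin": "Business", "loyalty_tier": "Platinum Elite"},
    constraints_pending=["explicit_confirmation"],
)
```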

In parallel, we observed that across use cases and domains—selling shoes, booking flights, processing loans—task-oriented dialogue follows universal procedural patterns. Food delivery, claims processing, and order management share similar procedural structures: parameter extraction, constraint validation, intent identification, policy enforcement, state-dependent branching, etc.

The key insight: if we could create a unified model where neural modules handle context and symbolic modules handle structure, operating together rather than in sequence, we’d solve the problem at its root. Of course, it’d have to work agnostically across domains and use cases, capable of symbolically representing any scenario requiring controllable behavior.

For the actual computation, we developed the Neuro-Symbolic Reasoner, a cognitive core that computes next actions from the current symbolic state, as opposed to predicting the next token. Neural modules assist in the translation to and from the symbolic language, while symbolic modules maintain explicit state, enforce constraints, and ensure that tool invocations are structured rather than probabilistically sampled.
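
To make the contrast concrete, here is a toy version of “compute the next action from state,” with hypothetical rules rather than the Reasoner’s actual logic:

```python
# Toy contrast: the next action is computed from explicit state by rules,
# not sampled token-by-token. Hypothetical rules, not the Reasoner's logic.

def next_action(state: dict) -> dict:
    if state["constraints_pending"]:
        # Deterministic: unmet constraints always yield a request, never the tool call.
        return {"action": "ask_user", "about": state["constraints_pending"][0]}
    # Structured invocation built from state, with typed parameters.
    return {"action": "invoke_tool", "tool": state["intent"],
            "args": {"order_id": state["facts"]["order_id"]}}

state = {"intent": "issue_refund",
         "facts": {"order_id": "A123"},
         "constraints_pending": ["documentation_verified"]}
print(next_action(state))   # always the same output for the same state
```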

Together, the symbolic language and the reasoner form Apollo-1: the foundation model for controllable agents.


05. How It Works (at a glance)

Apollo-1’s breakthrough is stateful neuro-symbolic reasoning: a computation built explicitly for task-oriented conversational AI.

Apollo-1 achieves generalization through a fundamental principle: structure-content separation.

The Neuro-Symbolic Reasoner operates on symbolic structures—intents, constraints, parameters, actions—that remain constant across domains, while neural modules continuously enrich those structures with semantic nuance.


Architecture: encoder–stateful reasoning loop–decoder

  1. Domain-Agnostic Encoder: Translates natural language into symbolic state using both procedural and descriptive knowledge.
  2. Stateful Reasoning Loop (iterates until turn completion):
    • Neuro-Symbolic State Machine maintains symbolic state
    • Symbolic Reasoning Engine computes next actions from state
    • Neuro-Symbolic Planner creates executable plans
  3. Domain-Agnostic Decoder: Generates natural language from final state

Apollo-1’s neuro-symbolic design unifies neural modules that understand context with symbolic modules that enforce constraints.

The symbolic state represents both procedural progress (what state we’re in) and descriptive facts (what we know). Neural components interpret language and enrich understanding; symbolic components ensure reliable execution or reliable non-execution when constraints aren’t met.

Perception is probabilistic, but given the same state, the Reasoner always makes the same decision, delivering the behavioral guarantees that controllable agents require and making task execution configurable, reproducible and auditable.
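
A heavily simplified skeleton of that turn-level pipeline, with invented function names, showing where the probabilistic and deterministic parts sit:

```python
# Heavily simplified turn pipeline with invented names. Only the encoder and
# decoder are probabilistic; given the same symbolic state, the reasoning
# step always produces the same decision.

def encode(utterance: str, state: dict) -> dict:
    # Neural, probabilistic: natural language -> symbolic state updates.
    state.update({"last_intent": "ask_cancellation_policy"})
    return state

def reason(state: dict) -> list[dict]:
    # Symbolic, deterministic: same state in, same actions out.
    actions = [{"action": "answer", "topic": state["last_intent"]}]
    if not state.get("payment_confirmed"):
        actions.append({"action": "hold", "tool": "process_payment"})
    return actions

def decode(state: dict, actions: list[dict]) -> str:
    # Neural, probabilistic: final state + actions -> fluent reply.
    return f"(fluent reply covering {[a['action'] for a in actions]})"

def turn(utterance: str, state: dict) -> str:
    state = encode(utterance, state)
    actions = reason(state)           # the real loop iterates until turn completion
    return decode(state, actions)

print(turn("wait, what's the cancellation policy?", {"payment_confirmed": False}))
```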

The Symbolic Reasoning Engine is a deterministic, rule-based engine. Its procedural logic was learned from years of solving and encoding millions of multi-turn task-oriented conversations with human agents, whose turn outputs were ranked by a reputation system based on peer feedback.


Augmented Intelligence (AUI) Inc. – Patents Pending


06. Defining Constraints

Apollo-1 ships with a Playground where any use case runs from the System Prompt alone. The System Prompt isn’t configuration. It’s a behavioral contract.

When you define your tools in the System Prompt, Apollo-1 automatically generates an ontology: a structured representation of your entities, parameters, and relationships. This ontology is shared—the same entities and relationships are understood across all your workflows, which is why constraints defined once apply everywhere they’re relevant.
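
For intuition, here is a hypothetical fragment of what such a shared ontology could look like once derived from tool definitions (illustrative only, not Apollo-1’s internal representation):

```python
# Illustrative only: a hypothetical fragment of an ontology derived from tool
# definitions, shared across workflows so constraints attach to entities once.

tools = {
    "process_refund": {"params": {"order_id": "Order", "amount": "Money"}},
    "open_dispute":   {"params": {"transaction_id": "Transaction"}},
}

ontology = {
    "entities": {"Order", "Money", "Transaction", "Customer"},
    "relations": [("Order", "belongs_to", "Customer"),
                  ("Transaction", "pays_for", "Order")],
    # A constraint defined once on Transaction applies in every workflow
    # that touches Transaction, not just the tool it was written next to.
    "constraints": {"Transaction": ["age_days <= 8 for open_dispute"]},
}
```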

From this ontology, you define constraints that specify exactly when actions are blocked:

Policy Constraints: Business rules the agent must enforce. “Block disputes for transactions older than 8 days.”

Confirmation Constraints: Actions that require explicit user consent before execution. “Require confirmation before processing payment.”

Authentication Constraints: Actions that require identity verification before execution. “Require ID verification for refunds over $200.”

Express these constraints in natural language; Apollo-1 compiles them into enforceable logic.
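
As a rough mental model (not Apollo-1’s actual System Prompt syntax or compiled form), the three constraints above become checks over explicit state along these lines:

```python
# Rough mental model only: the three natural-language constraints above,
# expressed as explicit checks over state. Names here are hypothetical.

from datetime import date

def policy_block_old_disputes(state: dict) -> bool:
    # "Block disputes for transactions older than 8 days."
    return (date.today() - state["transaction_date"]).days <= 8

def confirmation_before_payment(state: dict) -> bool:
    # "Require confirmation before processing payment."
    return state.get("user_confirmed", False)

def auth_for_large_refunds(state: dict) -> bool:
    # "Require ID verification for refunds over $200."
    return state["refund_amount"] <= 200 or state.get("id_verified", False)

GUARDS = {
    "open_dispute": [policy_block_old_disputes],
    "process_payment": [confirmation_before_payment],
    "issue_refund": [auth_for_large_refunds],
}

def action_allowed(action: str, state: dict) -> bool:
    # Deterministic once state is interpreted: every guard must pass.
    return all(guard(state) for guard in GUARDS.get(action, []))

print(action_allowed("process_payment", {"user_confirmed": False}))  # False: blocked
```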

These aren’t instructions the agent tries to follow. They’re constraints enforced at the architectural level. Once state is correctly interpreted, constraint enforcement is deterministic. 

When a telecom provider specifies “never process a plan downgrade during an active billing dispute,” that constraint holds. The action is blocked at the symbolic level before it can execute. The agent explains why, offers alternatives, maintains a natural conversation. But the constraint holds regardless of how the user phrases the request.

Constraint enforcement is deterministic; perception is not. This means failures are confined to a narrower, more auditable category: the system may misunderstand what’s being requested, but it won’t ‘decide’ to skip a required step or ‘forget’ a policy mid-conversation. Misclassification affects whether an action is attempted, not whether constraints are enforced. If perception fails, the action isn’t invoked; the user experiences task failure, not policy violation.

You define the constraints that matter: what must happen, what must never happen without the right conditions. Apollo-1 makes them absolute. For everything else, the agent remains conversationally intelligent: it handles curveballs, maintains context, responds naturally.


07. What Apollo-1 Isn’t For: Architecture as Choice

Apollo-1’s architecture makes deliberate trade-offs. By optimizing for task-oriented agents, we’ve built a model that doesn’t compete in other domains, and that’s by design.

Open-Ended Creative Work
Apollo-1 isn’t designed for creative writing, brainstorming sessions, or exploratory dialogue where variation creates value. For drafting marketing copy, generating story ideas, or exploring hypothetical scenarios, transformers remain the superior architecture. Our symbolic structures enforce consistency; creativity often requires the opposite.

Code Generation & Software Development
While Apollo-1 can integrate with code execution tools in task-oriented workflows, it doesn’t offer state-of-the-art code generation. Transformers trained on massive code repositories excel at synthesizing programming patterns, autocompleting functions, and explaining algorithms. Apollo-1’s symbolic language is purpose-built for task execution, not software development.

Low-Stakes, High-Variation Scenarios
When conversational variety enhances user experience—customer engagement campaigns, educational tutoring with adaptive responses, entertainment chatbots—probabilistic variation is often preferable to deterministic certainty. Apollo-1’s constraints become a limitation when flexibility is the goal.


08. Early Deployments & Results

Apollo-1 is deployed in production at Fortune 500 organizations. Partnerships to power consumer-facing AI at some of the world’s largest companies in retail, automotive, and regulated industries will be announced alongside general availability.

Organizations testing Apollo-1 against their existing systems—some built over years with teams of thousands—are seeing consistent patterns: substantial improvements in task completion rates across domains.

Test / Benchmark                            Apollo-1      Best LLM Agent           Δ
τ-Bench-Airline                             90.8–92.5%    Claude-4: 60%            +51%
Google Flights – 111 live booking chats     83%           Gemini 2.5-Flash: 22%    +277%
Amazon Retail – 120 live shopping chats     90.8%         Rufus: 16.7%             +444%

A technical blog post—including architectural specifications, formal proofs, procedural ontology samples, evaluation methodologies, and turn-closure semantics—will be released alongside general availability.

[Request early access to the technical blog]


09. What’s Next

The high-stakes conversations that drive economic activity become reliably automatable.

Booking systems that never double-book. Claims processing that never approves without required documentation. Customer service that never processes unauthorized transactions. Transaction systems that never exceed approved parameters.

Enterprises can finally trust agents with customer interactions because they have certainty that:

  • Prohibited actions will never execute without required conditions
  • Required verifications will never be skipped
  • Sensitive operations will never proceed without proper authorization
  • Every decision will be traceable and auditable

Open-ended agents enhance productivity. Controllable agents are the productivity. Every transaction, every booking, every claim: these are the conversations that run the economy.

Apollo-1 benefits from advances in neural language understanding. As LLMs improve, Apollo-1’s neural modules improve with them. The symbolic language evolves independently, expanding the granularity of domain-agnostic procedural states it can represent.

General Availability: January 2026

Apollo-1’s architecture integrates with existing Generative AI workflows and adapts to any API or external system, with no need to change endpoints or preprocess data. It launches with native connectivity to all major platforms (Salesforce, HubSpot, Zendesk, etc.), full MCP support, and a strategic go-to-market partnership with Google. Launching with:

  • Open APIs
  • Full documentation and toolkits
  • Evaluation methodologies
  • Voice and image modalities

10. Conclusion

High-stakes use cases require both behavioral certainty and conversational flexibility. LLMs can’t provide the certainty. Orchestration can’t provide the flexibility. Control and flexibility remain separate systems, trading off against each other.

Apollo-1 unifies them. Neural for flexibility. Symbolic for constraints. One model.
