Authors: Ohad Elhelo and Ori Cohen, Co-Founders
Ever since ChatGPT burst onto the scene, we’ve been on a generative AI rollercoaster. The highs have been thrilling—machines communicating naturally, fluently, convincingly—but the lows are undeniable. Generative AI remains unpredictable, opaque, and fundamentally unreliable in scenarios where conversational agents must reliably act on behalf of entities. Yet these scenarios represent a substantial and economically critical portion of all potential AI applications.
For AI to handle these critical interactions, we must transcend purely generative models and embrace a new architecture: Neuro-Symbolic AI. Apollo-1, our neuro-symbolic foundation model for conversational agents, marks the beginning of this transformation. In recent evaluation tests and benchmarks detailed below, Apollo-1 consistently outperformed state-of-the-art generative models by wide margins, precisely on tasks that require conversational fluency combined with dependable, transparent action.
For decades, AI researchers debated two distinct paths: symbolic AI, emphasizing rules, logic, and explicit reasoning; and neural networks, skilled at pattern recognition and statistical learning. Both have strengths; both have critical limitations. Purely symbolic AI struggles with natural, conversational interaction. Purely neural, generative AI falters when trust, reliability, and consistency are non-negotiable. The goal was always to combine these methods, leveraging each to address the other’s weaknesses. With Apollo-1, that vision is now reality: conversational agents that converse fluently and act reliably.
The fundamental shortcomings of generative AI have become increasingly visible (and costly):
Generative AI alone simply cannot fulfill the promises the world expects from advanced artificial intelligence. Recent academic work underscores how far today’s LLM agents are from that bar. A May 2025 paper, “CRMArena-Pro: Holistic Assessment of LLM Agents Across Diverse Business Scenarios and Interactions”, reports that state-of-the-art agents score only 58% on single-turn tasks and collapse to ≈35% once the dialogue spans multiple turns, leading the authors to highlight a “significant gap between current LLM capabilities and real-world enterprise demands.” [1]
Generative AI sparked the first wave of conversational AI, tailored primarily to individual users. But when AI acts on behalf of organizations—like airlines, banks, or retailers—unpredictability becomes an existential liability. The solution demands a shift from purely neural to Neuro-Symbolic reasoning.
Neuro-Symbolic AI bridges the gap between Generative AI’s linguistic capabilities and Symbolic AI’s structured reasoning, making AI interactions actionable, reliable, transparent, and controllable. This second wave of AI enables conversational agents that work on behalf of entities rather than end-users, performing accurately, consistently, and compliantly.
Apollo-1 is our neuro-symbolic foundation model for conversational agents. It is designed to power conversational agents acting on behalf of entities across industries and use-cases. Apollo-1 enables advanced native tool use through reliable, structured symbolic interactions with complex APIs. It provides comprehensive traceability, with each decision logged, fully inspectable, and editable in real-time. Finally, it offers steerability and controllability, allowing organizations to consistently steer agents toward desired behaviors by providing granular context, instructions, and guidelines.
In Apollo-1, a Symbolic Reasoner replaces the transformer as the model’s decision-making core. User input is parsed by lightweight NLP modules and translated into a symbolic language of controllable entities (e.g. intent, constraints, context). The Symbolic Reasoner uses these entities to plan and trigger external actions. Action responses pass back through the same NLP modules, are re-encoded as controllable entities, and inform the Symbolic Reasoner, which then either produces the final model output or keeps invoking additional actions and tools until the interaction is complete.
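The parse, reason, act, and re-encode loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Apollo-1's implementation: the `Entities` class, the keyword-based `parse_user_input`, the `search_flights` tool, and its canned response are all hypothetical stand-ins.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Entities:
    """Hypothetical symbolic state: intent, constraints, and context."""
    intent: Optional[str] = None
    constraints: dict = field(default_factory=dict)
    context: dict = field(default_factory=dict)


def parse_user_input(text: str) -> Entities:
    """Stand-in for the NLP modules: simple keyword rules, not a real parser."""
    entities = Entities()
    lowered = text.lower()
    if "flight" in lowered:
        entities.intent = "book_flight"
    for city in ("boston", "denver"):
        if f"to {city}" in lowered:
            entities.constraints["destination"] = city
    return entities


def search_flights(destination: str) -> dict:
    """Hypothetical external tool that returns a canned API response."""
    return {"flights": [{"id": "F100", "destination": destination, "price": 240}]}


def encode_action_response(response: dict, entities: Entities) -> None:
    """Re-encode a tool response as symbolic context for the reasoner."""
    entities.context["flights"] = response["flights"]


def reason(entities: Entities, max_steps: int = 5) -> str:
    """Rule-based decision loop: call tools until a final answer is possible."""
    for _ in range(max_steps):
        if entities.intent != "book_flight":
            return "Sorry, I can only help with flights."
        if "destination" not in entities.constraints:
            return "Where would you like to fly?"
        if "flights" not in entities.context:
            # A required fact is missing, so trigger an action and loop again.
            response = search_flights(entities.constraints["destination"])
            encode_action_response(response, entities)
            continue
        best = min(entities.context["flights"], key=lambda f: f["price"])
        return f"Found flight {best['id']} to {best['destination']} for ${best['price']}."
    return "Unable to complete the request."


print(reason(parse_user_input("I need a flight to Boston")))
```

Because every decision in this sketch is an explicit rule over symbolic state, the same input always produces the same sequence of actions, which is the property the architecture description emphasizes.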
Apollo-1’s Symbolic Reasoner is informed by the System Prompt, which operators use for structured context engineering—providing granular guidelines, guardrails, policies, and instructions that help Apollo-1 generalize effectively across diverse interactions. Throughout each interaction, Apollo-1 transparently reveals its symbolic reasoning steps in the Reasoning Panel, the visible “brain” of the model, enabling operators to inspect decisions, provide real-time human feedback, tweak reasoning steps and replay scenarios as needed.
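One way to picture operator-supplied policies together with an inspectable reasoning trace is to treat each policy as data and record every check. The sketch below is illustrative only: the `Policy` structure, the refund rules, and the trace format are assumptions made for this example and do not describe Apollo-1's System Prompt or Reasoning Panel formats.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Policy:
    """Hypothetical operator-defined rule."""
    name: str
    applies: Callable[[dict], bool]  # does the rule apply to this request?
    allowed: Callable[[dict], bool]  # if it applies, is the request permitted?


# Example policies, invented for illustration.
POLICIES = [
    Policy(
        name="refund_window_30_days",
        applies=lambda req: req["action"] == "refund",
        allowed=lambda req: req["days_since_purchase"] <= 30,
    ),
    Policy(
        name="refund_cap_500",
        applies=lambda req: req["action"] == "refund",
        allowed=lambda req: req["amount"] <= 500,
    ),
]


def decide(request: dict, policies: List[Policy] = POLICIES) -> Tuple[bool, List[Dict[str, str]]]:
    """Check every policy and return the decision plus a step-by-step trace."""
    trace = []
    approved = True
    for policy in policies:
        if not policy.applies(request):
            trace.append({"policy": policy.name, "result": "skipped"})
            continue
        ok = policy.allowed(request)
        trace.append({"policy": policy.name, "result": "pass" if ok else "fail"})
        approved = approved and ok
    return approved, trace


approved, trace = decide({"action": "refund", "days_since_purchase": 45, "amount": 120})
print(approved)
for step in trace:
    print(step)
```

In a setup like this, an operator could edit a single policy and re-run `decide` on the same request to see exactly which step changed, which mirrors the inspect, tweak, and replay workflow described above.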
This neuro-symbolic architecture ensures not only conversational fluency but also unprecedented levels of transparency, reliability, and controllability.
* Final Pass^1, Pass^2, Pass^3, and Pass^4 numbers will be published in our official τ-Bench report in July 2025. In each test (Google Flights, Amazon), Apollo-1 demonstrates superior reliability using identical real-time APIs and data sources.
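For readers unfamiliar with the Pass^k metric, the sketch below assumes the τ-Bench definition: the probability that an agent succeeds on all k independent trials of the same task, averaged over tasks. With n trials per task and c successes, the usual estimator is C(c, k) / C(n, k). The function name and the trial counts are made up for illustration and are not Apollo-1 results.

```python
from math import comb


def pass_hat_k(successes_per_task, n_trials, k):
    """Estimate Pass^k: chance that all k sampled trials of a task succeed.

    successes_per_task: successful trials per task (each between 0 and n_trials)
    n_trials: trials run per task
    k: number of trials that must all succeed
    """
    if not 1 <= k <= n_trials:
        raise ValueError("k must be between 1 and n_trials")
    # comb(c, k) is 0 when c < k, so tasks with too few successes contribute 0.
    per_task = [comb(c, k) / comb(n_trials, k) for c in successes_per_task]
    return sum(per_task) / len(per_task)


# Illustrative data only: three tasks, four trials each.
successes = [4, 2, 0]
for k in (1, 2, 3, 4):
    print(k, round(pass_hat_k(successes, n_trials=4, k=k), 3))
```

Pass^1 is the ordinary success rate, while higher values of k reward consistency: a task solved only half the time contributes far less to Pass^4 than to Pass^1.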
This is not a mere incremental improvement. Neuro-Symbolic AI fundamentally transforms conversational agents’ capabilities, marking a decisive leap forward. We now have AI that doesn’t just converse but reliably and transparently reasons, adheres to explicit policies, and executes complex tasks seamlessly. With Neuro-Symbolic AI, we finally overcome the chronic limitations of purely generative systems. We get fluent language, yes, but far more importantly, we achieve:
These properties position Neuro-Symbolic AI as the hybrid approach behind the second wave that follows purely Generative AI: transparency without sacrificing fluency, clear controllability, and consistent, reliable interactions. This comparison specifically reflects the capabilities required for conversational agents acting reliably on behalf of entities. Purely generative LLMs remain effective for creative, exploratory, and open-ended conversational tasks and similar use-cases.
Apollo-1, the first neuro-symbolic model of its kind, is already demonstrating transformative results at scale in pilot programs across leading Fortune 500 organizations in critical sectors. Its modular architecture is designed for easy integration with existing generative-AI-based workflows, allowing companies to transition smoothly without significant operational disruption. Apollo-1 comes with a strategic go-to-market partnership with Google, and General Availability is slated for September 2025, complete with open APIs, toolkits, and rigorous evaluation methods.
As researchers, developers, and organizations everywhere grapple with AI’s complexities and responsibilities, Neuro-Symbolic AI provides a coherent, principled solution: transparency, reliability, controllability, and adaptability, all built into the core architecture.
Generative AI proved machines can talk and think.
Neuro-Symbolic AI proves they can talk, think, follow rules, and act reliably.
Explore Apollo-1’s Eval Playground to experience its capabilities firsthand. Interact with the Control and Reasoning Panels, navigate real-time conversations across multiple evaluation domains, and review live benchmark trajectories (passkey required; request access).
[1] Huang, K-H.; Prabhakar, A.; Thorat, O.; Agarwal, D.; Choubey, P.K.; Mao, Y.; Savarese, S.; Xiong, C.; Wu, C-S. CRMArena-Pro: Holistic Assessment of LLM Agents Across Diverse Business Scenarios and Interactions. Salesforce AI Research (2025). arXiv:2505.18878.