In this stage, we address the key challenges in our Marketplace Assistant:
- High latency and token usage due to monolithic design
- Poor separation of concerns
- Sequential processing of tasks
- Cost inefficiency for deterministic operations
We've transformed our application in two significant ways (with orra's Plan Engine driving coordination and reliability).
First, we divided the monolithic assistant into four specialized components:
- Product Advisor Agent: An LLM-powered agent that understands complex user needs and recommends products
- Inventory Tool as Service: Checks real-time product availability and reserves/releases product stock
- Purchasing Tool as Service: Handles product purchase processing for users
- Delivery Agent: Uses real-time data to estimate delivery times
We made a critical architectural improvement by migrating the monolith's function/tool calls to dedicated services:
- Tool Calls in Monolith: The original design used LLM function calling for all operations, even simple deterministic ones like inventory operations and purchase processing
- Tools as Services: We extracted these tool functions into proper standalone services that can be directly coordinated
This creates a clear distinction between:
- Agents (LLM-powered): For tasks requiring complex reasoning and human-like responses
- (Tools as) Services (Deterministic): For predictable operations with consistent input/output patterns
We converted these tool functions into dedicated services:
- Inventory: Directly handles inventory operations (previously a function call)
- Purchasing: Handles purchase processing including creating orders, making payments and notifying users (previously a function call)
We kept the Product Advisor and Delivery as LLM-powered agents since they benefit from complex reasoning capabilities.
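This split can be pictured with a small sketch. The function names and data shapes below are illustrative assumptions, not orra's SDK - the point is that a tool-as-service is a plain deterministic function with a fixed input/output contract, while an agent task stays with the LLM:

```javascript
// Illustrative sketch - not orra's SDK. A "tool as service" is a plain
// deterministic function: the same input always yields the same,
// checkable output. Stock data here is assumed for illustration.
const stock = { 'laptop-1': 3 };

// Inventory service: a fixed input/output contract, no LLM needed.
function checkAvailability(productId) {
  return { productId, inStock: (stock[productId] ?? 0) > 0 };
}

function reserveProduct(productId) {
  if ((stock[productId] ?? 0) <= 0) {
    return { productId, reserved: false };
  }
  stock[productId] -= 1; // reserving lowers the stock count
  return { productId, reserved: true };
}

// An agent task, by contrast, needs an LLM because the input is open-ended:
// "I need a used laptop for college... under $800" has no fixed schema,
// so it cannot be reduced to a deterministic function like the ones above.
```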
This architectural shift is enabled by orra's Plan Engine, which operates at the application level rather than just the agent level. This higher-level orchestration allows direct coordination between services, eliminating the need to tunnel all interactions through LLM function-calling. The Plan Engine understands and coordinates the entire workflow across both LLM-powered agents and deterministic services.
- Node.js (v18+)
- orra Plan Engine running and CLI installed
- OpenAI API key
- Initialize orra configuration
./stage_setup.sh # Sets up project, webhooks, and API keys
- Configure the OpenAI API key in each component's .env file
OPENAI_API_KEY=your_openai_api_key_here
- Start each component (in separate terminals)
cd [component-directory] # Run for each component
npm install
npm start
- Start webhook simulator (in a separate terminal)
orra verify webhooks start http://localhost:3000/webhook
Let's run the expected AI Marketplace Assistant interactions described here.
We'll be using the CLI's orra verify command to understand how the Plan Engine coordinates our components to complete system actions.
The assumption here is that there's a chat UI interface that forwards requests to the Plan Engine.
We use lowdb to query and update data in our data.json file - basically a simple JSON-based DB. This data is shared across all the components.
- Ask for a product recommendation
orra verify run 'Recommend a product' \
-d query:'I need a used laptop for college that is powerful enough for programming, under $800.'
Follow these instructions to inspect the orchestrated action.
In this case, you should see only the Product Advisor Agent executing and handling this action. Any interim errors are handled by orra.
- Enquire about delivery for the recommended product
orra verify run 'Can I get it delivered by next week?' \
-d 'productId:laptop-1' \
-d 'userId:user-1'
In this case, there should be:
- an inventory check to ensure the product is in stock
- if yes, a delivery estimate is provided
- any interim errors are handled by orra
- Purchase a recommended product
orra verify run 'Purchase product' \
-d 'productId:laptop-1' \
-d 'userId:user-1'
In this case, there should be:
- an inventory check to ensure the product is in stock
- an inventory reserve request if the product is in stock - this lowers the stock count
- a delivery estimate is provided
- the product is purchased - causing an order to be placed
- any interim errors are handled by orra
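The steps above can be sketched as a dependency chain. This is illustrative only - the function names and return shapes are assumptions, and in the real application orra's Plan Engine derives this ordering itself rather than it being hand-coded:

```javascript
// Illustrative dependency chain for the purchase action - not orra code.
// Earlier results gate later steps; independent steps can run in parallel.
async function checkStock(productId) { return { productId, inStock: true }; }
async function reserve(productId) { return { productId, reserved: true }; }
async function estimateDelivery(productId) { return { productId, eta: '3 days' }; }
async function purchase(productId, userId) {
  return { orderId: 'order-1', productId, userId };
}

async function purchaseFlow(productId, userId) {
  const { inStock } = await checkStock(productId); // 1. inventory check
  if (!inStock) throw new Error('out of stock');
  await reserve(productId);                        // 2. lowers the stock count
  // 3 & 4. the delivery estimate and the purchase don't depend on each
  // other here, so a coordinator can run them in parallel.
  const [eta, order] = await Promise.all([
    estimateDelivery(productId),
    purchase(productId, userId),
  ]);
  return { eta, order };
}
```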
Navigate to the data.json file to view the placed order in the orders list.
- Clear Plan Engine configurations and reset data
./stage_reset.sh # Clears configurations and data
- Stop all the running components and close all the terminal windows
- Shut down the Plan Engine
- Reduced Latency:
  - orra automatically parallelises appropriate tasks
  - Overall response time improved by ~60%, especially after caching execution plans
  - Services respond faster than LLM-based agents (~40% improvement for deterministic operations)
- Lower Token Usage:
  - Specialised agents reduce token consumption by ~40%
  - Converting tools to services reduces token usage by ~80% for inventory and purchasing operations
  - Significant cost savings in production
- Improved Maintainability:
  - Each component has a single responsibility
  - Easier to update, debug, and enhance individual components
  - Clear separation between reasoning and deterministic parts
- Better Reliability:
  - Issues in one component don't necessarily impact others
  - Deterministic services have fewer failure modes than LLM-based agents
- Automatic Orchestration: orra handles the coordination between components based on the user or application's intent
- Parallel Execution: Where possible, orra executes non-dependent tasks in parallel
- Service Discovery: Components register with orra, which then routes requests appropriately
- Seamless Integration: orra orchestrates between agents and services without code changes
- Execution Timeouts: Set an execution timeout duration per service/agent - guards against agents just spinning their wheels
- High-Level Error Handling: Retries execution on all errors - up to 5 retries
- Configurable Health Monitoring: orra pauses orchestrations due to unhealthy services and resumes them when health is restored
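The retry behaviour described above can be pictured with a minimal generic helper - a sketch of the idea ("retry on all errors, up to 5 retries"), not orra's implementation:

```javascript
// Generic retry sketch: one initial attempt plus up to `maxRetries` retries.
async function withRetries(fn, maxRetries = 5) {
  let lastError;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      lastError = err; // retry on any error
    }
  }
  throw lastError; // all attempts exhausted
}

// A flaky task that fails twice before succeeding, to exercise the helper.
let calls = 0;
const flaky = async () => {
  calls += 1;
  if (calls < 3) throw new Error('transient failure');
  return 'ok';
};
```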
Our application is now more efficient, but it still lacks robust error handling. In stage 2, we'll implement compensation mechanisms to handle failures and ensure state/transaction integrity.