ValkyrAI Command Processor System

The ValkyrAI Command Processor lets LLMs execute frontend (browser-based) and backend (server-based) commands by emitting structured XML in their responses. With it, a model can call internal APIs, read URLs, capture screen data, and run server operations as part of a conversation.

Architecture Overview

Frontend Command Processor

  • Location: web/typescript/valkyr_labs_com/src/utils/commandProcessor.ts
  • Purpose: Handles browser-based commands like screen capture, navigation, UI interactions
  • Execution: Runs in the user's browser with SageChat integration

Backend Command Processor

  • Location: valkyrai/src/main/java/com/valkyrlabs/valkyrai/service/CommandProcessor.java
  • Purpose: Handles server-side commands like API calls, URL reading, system operations
  • Execution: Runs on the ValkyrAI server with security controls

Command Formats

All commands use XML syntax for consistent parsing across frontend and backend systems.
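To make the shared format concrete, here is a minimal sketch of how one command block could be turned into a name and a flat parameter map. It is not taken from the ValkyrAI codebase: `ParsedCommand` and `parseCommand` are hypothetical names, and the sketch assumes a single level of child tags with no attributes, as in the examples in this document.

```typescript
// Hypothetical types and helper; not part of the ValkyrAI API.
interface ParsedCommand {
  name: string;
  params: Record<string, string>;
}

// Parses a single command of the form <name><key>value</key>...</name>.
// Assumes one level of child tags and no attributes.
function parseCommand(xml: string): ParsedCommand | null {
  const outer = /^\s*<([a-z_]+)>([\s\S]*)<\/\1>\s*$/.exec(xml);
  if (outer === null) {
    return null; // missing or mismatched outer tag
  }
  const params: Record<string, string> = {};
  const child = /<([a-z_]+)>([\s\S]*?)<\/\1>/g;
  let match: RegExpExecArray | null;
  while ((match = child.exec(outer[2])) !== null) {
    params[match[1]] = match[2].trim();
  }
  return { name: outer[1], params };
}

const parsed = parseCommand("<screencapture>\n<type>screenshot</type>\n</screencapture>");
console.log(parsed); // name "screencapture", params.type "screenshot"
```

A pattern-based approach like this tolerates the unescaped & that appears in some params values, which a strict XML parser would reject.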

Frontend Commands

Screen Capture

Captures current page content for AI analysis.

<screencapture>
<type>screenshot</type>
</screencapture>

Parameters:

  • type: screenshot (full page capture) or screenscrape (text-only)

Example Usage: "Please capture what's currently on the screen so I can help you with this page."

WebSocket Communication

Sends messages through the WebSocket connection.

<websocket>
<message>{"type": "command", "payload": "Hello WebSocket"}</message>
</websocket>

Parameters:

  • message: JSON object containing WebSocket message data

Navigation

Navigates to a different URL within the application.

<navigate>
<url>/dashboard</url>
</navigate>

Parameters:

  • url: Target URL (relative or absolute)
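Because url may be relative or absolute, a handler has to resolve it before deciding whether to follow it. The sketch below is one way to do that, assuming that a "safe" target means one on the same origin as the application. That rule, the function name `resolveNavigationTarget`, and the origin `https://app.example.com` are all illustrative assumptions, not documented ValkyrAI behavior.

```typescript
// Hypothetical helper; "safe" is assumed to mean same-origin as the app.
function resolveNavigationTarget(target: string, appOrigin: string): string | null {
  let resolved: URL;
  try {
    // Relative targets resolve against the app origin; absolute ones ignore it.
    resolved = new URL(target, appOrigin);
  } catch {
    return null; // not parseable as a URL
  }
  if (resolved.origin !== new URL(appOrigin).origin) {
    return null; // another site, or a scheme such as javascript: with no usable origin
  }
  return resolved.pathname + resolved.search + resolved.hash;
}

console.log(resolveNavigationTarget("/dashboard", "https://app.example.com")); // "/dashboard"
console.log(resolveNavigationTarget("https://other.example.org/", "https://app.example.com")); // null
```

Returning only the path, query, and fragment means the caller never navigates to a host it did not expect, even if the command supplied an absolute URL.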

UI Interactions

Performs browser UI actions such as clicking, scrolling, and typing.

<ui_action>
<action>click</action>
<target>.btn-primary</target>
<value>optional text for type actions</value>
</ui_action>

Parameters:

  • action: click, scroll, type
  • target: CSS selector for the target element
  • value: Text to type (optional; used only with the type action)
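The parameter rules above can be checked before any DOM work happens. This is a sketch under stated assumptions: `validateUiAction` and the `UiAction` type are hypothetical, and treating a missing value as an error for the type action is a choice made here, not something the document specifies.

```typescript
// The three actions listed in the documentation.
const UI_ACTIONS = ["click", "scroll", "type"] as const;
type UiActionKind = typeof UI_ACTIONS[number];

// Hypothetical shape for a validated command.
interface UiAction {
  action: UiActionKind;
  target: string;
  value?: string;
}

function isUiActionKind(candidate: string): candidate is UiActionKind {
  return (UI_ACTIONS as readonly string[]).indexOf(candidate) !== -1;
}

// Throws on invalid input so the caller can report the failure in chat.
function validateUiAction(params: Record<string, string | undefined>): UiAction {
  const action = params["action"];
  if (action === undefined || !isUiActionKind(action)) {
    throw new Error(`Unsupported action: ${String(action)}`);
  }
  const target = params["target"];
  if (target === undefined || target.trim() === "") {
    throw new Error("A CSS selector is required in <target>");
  }
  if (action === "type") {
    const value = params["value"];
    if (value === undefined) {
      // Assumption: typing without text is treated as an error.
      throw new Error("The type action requires a <value>");
    }
    return { action, target, value };
  }
  // value is dropped for click and scroll.
  return { action, target };
}

console.log(validateUiAction({ action: "click", target: ".btn-primary" }));
```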

Backend Commands

API Data Retrieval

Fetch data from internal APIs with optional filtering.

<get_api_data>
<api>ContentData</api>
<qbe>{"type": "BLOG"}</qbe>
<params>limit=5&sort=createdDate,desc</params>
</get_api_data>

Parameters:

  • api: API endpoint name (e.g., ContentData, Product, Customer)
  • qbe: Query By Example JSON for filtering (optional)
  • params: Additional query parameters like limit, sort (optional)
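As an illustration of how these three parameters might be normalized before a query is built, consider the sketch below. The `ApiQuery` shape, the function name, and the rule that api must be a plain alphanumeric name are assumptions. How ValkyrAI actually maps qbe and params onto a request is not described in this document.

```typescript
// Hypothetical normalized form of a get_api_data command.
interface ApiQuery {
  api: string;
  example: Record<string, unknown>; // parsed Query By Example object
  query: Record<string, string>;    // parsed key=value pairs from params
}

function parseGetApiData(api: string, qbe?: string, params?: string): ApiQuery {
  // Assumption: endpoint names are simple identifiers such as ContentData.
  if (!/^[A-Za-z][A-Za-z0-9]*$/.test(api)) {
    throw new Error(`Invalid api name: ${api}`);
  }
  let example: Record<string, unknown> = {};
  if (qbe !== undefined && qbe.trim() !== "") {
    const parsedQbe: unknown = JSON.parse(qbe); // throws on malformed JSON
    if (typeof parsedQbe !== "object" || parsedQbe === null || Array.isArray(parsedQbe)) {
      throw new Error("qbe must be a JSON object");
    }
    example = parsedQbe as Record<string, unknown>;
  }
  const query: Record<string, string> = {};
  if (params !== undefined && params.trim() !== "") {
    for (const pair of params.trim().split("&")) {
      if (pair === "") {
        continue; // tolerate a trailing or doubled &
      }
      const eq = pair.indexOf("=");
      const key = eq === -1 ? pair : pair.slice(0, eq);
      const value = eq === -1 ? "" : pair.slice(eq + 1);
      query[decodeURIComponent(key)] = decodeURIComponent(value);
    }
  }
  return { api, example, query };
}

console.log(parseGetApiData("ContentData", '{"type": "BLOG"}', "limit=5&sort=createdDate,desc"));
```

Parsing qbe and params up front means malformed input fails before any API call is made, which keeps the error message close to the command that caused it.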

Example Usage: "Show me the latest 5 blog posts from the content system."

URL Reading

Read content from external URLs with security restrictions.

<read_url>
<url>https://api.example.com/data</url>
<format>json</format>
</read_url>

Parameters:

  • url: Target URL to read
  • format: text or json (optional, defaults to text)

API Data Posting

Create new records through internal APIs.

<post_api_data>
<api>ContentData</api>
<data>{"title": "New Blog Post", "content": "Hello World", "type": "BLOG"}</data>
</post_api_data>

Parameters:

  • api: API endpoint name
  • data: JSON data to create

Server Operations

Execute safe server operations (read-only by default).

<server_op>
<operation>status</operation>
<target>valkyrai</target>
</server_op>

Parameters:

  • operation: status, logs (extensible for more operations)
  • target: Service name or "system" (optional)

Usage Examples

Content Analysis Workflow

User: "What blog content do we have about AI?"

LLM Response: "Let me check our content database for AI-related blog posts.

<get_api_data>
<api>ContentData</api>
<qbe>{"type": "BLOG", "content": "*AI*"}</qbe>
<params>limit=10</params>
</get_api_data>

I found several AI-related blog posts in your content system..."

Screen Assistance Workflow

User: "I'm confused by this page, can you help?"

LLM Response: "I'll take a look at what you're seeing to better assist you.

<screencapture>
<type>screenscrape</type>
</screencapture>

Based on the current page content, I can see you're on the dashboard..."

Data Integration Workflow

User: "Create a summary of our products and check external pricing data."

LLM Response: "I'll gather your product data and check external pricing.

<get_api_data>
<api>Product</api>
<params>limit=20</params>
</get_api_data>

<read_url>
<url>https://api.competitor.com/prices</url>
<format>json</format>
</read_url>

Based on your 20 products and the external pricing data..."

Security & Permissions

Frontend Security

  • Commands execute within browser sandbox
  • Screen capture limited to current domain
  • Navigation restricted to safe URLs
  • UI interactions limited to visible elements

Backend Security

  • URL reading restricted to whitelisted domains
  • API calls use user's authentication context
  • Server operations limited to read-only by default
  • All command execution is logged and audited
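The whitelist restriction on URL reading could look roughly like the check below. The backend processor is a Java service, so this TypeScript version only illustrates the logic. The function name, the HTTPS-only rule, and the host `api.example.com` are assumptions made for the sketch.

```typescript
// Hypothetical allowlist check for read_url targets.
function isUrlAllowed(rawUrl: string, allowedHosts: readonly string[]): boolean {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // not an absolute, parseable URL
  }
  if (url.protocol !== "https:") {
    return false; // assumption: only HTTPS targets are permitted
  }
  // Exact hostname match, so "api.example.com.evil.org" does not pass.
  return allowedHosts.indexOf(url.hostname) !== -1;
}

const allowedHosts = ["api.example.com"];
console.log(isUrlAllowed("https://api.example.com/data", allowedHosts)); // true
console.log(isUrlAllowed("https://api.example.com.evil.org/", allowedHosts)); // false
```

Comparing the parsed hostname, rather than searching the raw string for an allowed substring, avoids the common bypass where the allowed name appears as a prefix or path segment of a different host.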

Permission Levels

  1. Anonymous Users: Limited to screen capture and safe navigation
  2. Authenticated Users: API data access based on user permissions
  3. Admin Users: Extended server operations (when implemented)
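One way to express these tiers is a lookup from role to permitted command names, denying anything not listed. The mapping below is an assumption built from the three descriptions above. In particular, treating the tiers as cumulative is a guess. The document also does not say where websocket, ui_action, read_url, or read-only server_op belong, so this sketch leaves them out of the lower tiers.

```typescript
type Role = "anonymous" | "authenticated" | "admin";

// Hypothetical mapping; the real permission model is not specified here.
const ALLOWED_COMMANDS: Record<Role, readonly string[]> = {
  anonymous: ["screencapture", "navigate"],
  authenticated: ["screencapture", "navigate", "get_api_data", "post_api_data"],
  admin: ["screencapture", "navigate", "get_api_data", "post_api_data", "server_op"],
};

// Deny by default: a command is allowed only if it is listed for the role.
function isCommandAllowed(role: Role, command: string): boolean {
  return ALLOWED_COMMANDS[role].indexOf(command) !== -1;
}

console.log(isCommandAllowed("anonymous", "screencapture")); // true
console.log(isCommandAllowed("anonymous", "get_api_data")); // false
```

A check like this only gates which commands may run. Per-record access for authenticated users would still depend on the user's own API permissions.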

Integration with SageChat

SageChat automatically processes commands in LLM responses:

  1. Command Detection: XML commands parsed from LLM response
  2. Context Injection: Command results added to conversation context
  3. User Feedback: Success/failure messages shown in chat
  4. WebSocket Integration: Real-time command execution updates
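The detection step can be sketched as pulling known command blocks out of the response text. The code below is an illustration rather than SageChat's implementation: `extractCommands` is a hypothetical name, and removing the command blocks from the text shown to the user is an assumption about how the chat displays responses.

```typescript
// Command names documented on this page.
const KNOWN_COMMANDS: readonly string[] = [
  "screencapture", "websocket", "navigate", "ui_action",
  "get_api_data", "read_url", "post_api_data", "server_op",
];

interface ExtractionResult {
  displayText: string; // response with command blocks removed
  commands: string[];  // raw XML blocks, in order of appearance
}

function extractCommands(response: string): ExtractionResult {
  // Matches <name>...</name> for known names only; \1 ties the closing tag to the opening one.
  const pattern = new RegExp("<(" + KNOWN_COMMANDS.join("|") + ")>[\\s\\S]*?</\\1>", "g");
  const commands: string[] = [];
  const stripped = response.replace(pattern, (block: string) => {
    commands.push(block);
    return "";
  });
  // Collapse the blank lines left behind by removed blocks.
  const displayText = stripped.replace(/\n{3,}/g, "\n\n").trim();
  return { displayText, commands };
}

const reply = "I'll take a look.\n\n<screencapture>\n<type>screenscrape</type>\n</screencapture>\n\nOne moment.";
const extracted = extractCommands(reply);
console.log(extracted.commands.length); // 1
console.log(extracted.displayText);
```

Restricting the pattern to known names keeps ordinary angle-bracket text in a response, such as HTML snippets, from being mistaken for a command.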

Development Guidelines

Adding New Commands

Frontend Commands

  1. Add command handler in FrontendCommandProcessor
  2. Implement execution method with proper error handling
  3. Update command documentation
  4. Add tests for new functionality

Backend Commands

  1. Add command pattern in CommandProcessor
  2. Implement execution method with security checks
  3. Add to security whitelist if needed
  4. Document command format and usage

Best Practices

  • Always validate command parameters
  • Implement proper error handling and user feedback
  • Use security whitelist approach (deny by default)
  • Log all command execution for audit purposes
  • Keep commands atomic and idempotent where possible
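Several of these practices (deny by default, error handling with feedback, and audit logging) can be combined in a small handler registry. This is a sketch only: `CommandRegistry`, `CommandHandler`, and `CommandResult` are hypothetical names, and the handlers are synchronous to keep the example short, whereas real handlers would likely be asynchronous.

```typescript
interface CommandResult {
  success: boolean;
  message: string;
}

type CommandHandler = (params: Record<string, string>) => CommandResult;

// Hypothetical registry; not the actual processor class in either codebase.
class CommandRegistry {
  private handlers: { [name: string]: CommandHandler } = {};
  readonly auditLog: string[] = [];

  register(name: string, handler: CommandHandler): void {
    this.handlers[name] = handler;
  }

  execute(name: string, params: Record<string, string>): CommandResult {
    // Own-property check so inherited names like "toString" are not treated as handlers.
    const known = Object.prototype.hasOwnProperty.call(this.handlers, name);
    let result: CommandResult;
    if (!known) {
      result = { success: false, message: `Unknown command: ${name}` }; // deny by default
    } else {
      try {
        result = this.handlers[name](params);
      } catch (err) {
        result = { success: false, message: err instanceof Error ? err.message : String(err) };
      }
    }
    // Every attempt is recorded, including denied and failed ones.
    this.auditLog.push(`${name}: ${result.success ? "ok" : "failed"} (${result.message})`);
    return result;
  }
}

const registry = new CommandRegistry();
registry.register("server_op", (params) => {
  const operation = params["operation"];
  if (operation !== "status" && operation !== "logs") {
    throw new Error(`Operation not permitted: ${String(operation)}`);
  }
  return { success: true, message: `${operation} requested` };
});

console.log(registry.execute("server_op", { operation: "status" }));
console.log(registry.execute("delete_everything", {}));
```

Routing every call through a single execute method gives one place to enforce the whitelist and write the audit record, so a new handler cannot skip either by accident.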

Troubleshooting

Common Issues

  • Command not executing: Check XML syntax and parameter validation
  • Security errors: Verify URL/operation is whitelisted
  • API errors: Check user permissions and API endpoint availability
  • WebSocket issues: Ensure connection is established before sending commands
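For the WebSocket case, one common approach is to queue outgoing messages until the connection reports that it is open. The sketch below assumes nothing about ValkyrAI's own WebSocket client: `QueuedSender` and `SocketLike` are hypothetical, and `SocketLike` mirrors only the two members of the browser WebSocket interface that the example needs.

```typescript
// Minimal subset of the browser WebSocket interface used here.
interface SocketLike {
  readyState: number;
  send(data: string): void;
}

const SOCKET_OPEN = 1; // same value as WebSocket.OPEN in the browser API

// Hypothetical wrapper that holds messages until the socket is open.
class QueuedSender {
  private socket: SocketLike;
  private pending: string[] = [];

  constructor(socket: SocketLike) {
    this.socket = socket;
  }

  send(message: string): void {
    if (this.socket.readyState === SOCKET_OPEN) {
      this.socket.send(message);
    } else {
      this.pending.push(message); // connection not ready yet
    }
  }

  // Call this from the socket's "open" event handler.
  flush(): void {
    while (this.pending.length > 0 && this.socket.readyState === SOCKET_OPEN) {
      this.socket.send(this.pending.shift() as string);
    }
  }

  get queuedCount(): number {
    return this.pending.length;
  }
}
```

With a real browser socket, the wiring would be along the lines of `socket.addEventListener("open", () => sender.flush())`, so anything sent early goes out in its original order once the connection is up.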

Debugging

  • Enable debug logging in both frontend and backend processors
  • Check browser console for frontend command issues
  • Review server logs for backend command execution
  • Use WebSocket debug messages for real-time feedback

Future Enhancements

  • File Operations: Upload/download file commands
  • Database Queries: Direct SQL execution with proper security
  • External Integrations: Commands for third-party services
  • Workflow Orchestration: Multi-step command sequences
  • Advanced Security: Role-based command permissions