
ValkyrAI Workflow Engine v2.0 — Quick Start Guide

Status: Phase 1 Complete ✅ — OpenAPI schemas & endpoints defined
Next: Regenerate code → Implement services → Ship to production


🎯 What Just Happened

We've implemented the foundation for ValkyrAI Workflow Engine v2.0 based on your PRD:

✅ New Data Models (ThorAPI schemas)

  • WorkflowExecution — Runtime execution instances, kept separate from workflow definitions
  • Run — Granular task attempt tracking with idempotency + leasing
  • DeadLetterQueue — Quarantine & replay failed runs
  • CircuitBreakerState — Protect external dependencies
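The guide only names CircuitBreakerState and does not specify it. Below is a rough sketch of the state machine such a record typically persists. It assumes the classic CLOSED / OPEN / HALF_OPEN model with a consecutive-failure threshold and a cooldown. The class, state, and parameter names are hypothetical and are not taken from the generated schema.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;

/**
 * Minimal circuit breaker sketch. States and thresholds are illustrative
 * assumptions, not the generated CircuitBreakerState model.
 */
class CircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;   // consecutive failures before opening
    private final Duration openDuration;  // how long to reject calls once open
    private final Clock clock;            // injected so tests can control time

    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private Instant openedAt = Instant.MIN;

    CircuitBreaker(int failureThreshold, Duration openDuration, Clock clock) {
        this.failureThreshold = failureThreshold;
        this.openDuration = openDuration;
        this.clock = clock;
    }

    /** Returns true if a call to the external dependency may proceed. */
    synchronized boolean allowRequest() {
        if (state == State.OPEN
                && !clock.instant().isBefore(openedAt.plus(openDuration))) {
            state = State.HALF_OPEN; // cooldown elapsed: let trial calls through
        }
        return state != State.OPEN;
    }

    /** A successful call closes the breaker and resets the failure count. */
    synchronized void recordSuccess() {
        consecutiveFailures = 0;
        state = State.CLOSED;
    }

    /** Opens the breaker on a failed trial call or when the threshold is reached. */
    synchronized void recordFailure() {
        consecutiveFailures++;
        if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
            state = State.OPEN;
            openedAt = clock.instant();
        }
    }

    synchronized State state() { return state; }
}
```

A caller checks allowRequest() before contacting the dependency and then reports the outcome with recordSuccess() or recordFailure(). In this sketch, every call is allowed through while the breaker is HALF_OPEN. A stricter version would admit only a single probe.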

✅ New API Endpoints

  • /WorkflowExecution/{id}/cancel|pause|resume — Execution control
  • /Run/{id}/heartbeat — Runner keepalive
  • /DeadLetterQueue/{id}/requeue|discard — DLQ operations

All standard CRUD endpoints (GET/POST/PUT/DELETE for list, read, create, update, and delete) will be auto-generated by ThorAPI.


🚀 Next Steps (Do This Now)

1. Regenerate Code

cd /Users/johnmcmahon/workspace/2025/valkyr/ValkyrAI

# Generate Java models, repositories, services, controllers
mvn -pl thorapi clean install

# Verify new models generated:
ls valkyrai/generated/spring/src/main/java/com/valkyrlabs/model/WorkflowExecution.java
ls valkyrai/generated/spring/src/main/java/com/valkyrlabs/model/Run.java
ls valkyrai/generated/spring/src/main/java/com/valkyrlabs/model/DeadLetterQueue.java
ls valkyrai/generated/spring/src/main/java/com/valkyrlabs/model/CircuitBreakerState.java

# Regenerate web TypeScript clients
cd web
npm run generate:api # or equivalent ThorAPI command

2. Database Migration

Create and run:

# Create migration file
cat > valkyrai/src/main/resources/db/migration/V2.0__workflow_execution_tracking.sql <<'EOF'
-- See WORKFLOW_ENGINE_V2_PHASE1_COMPLETE.md for full SQL
CREATE TABLE workflow_execution (...);
ALTER TABLE run ADD COLUMN exec_module_id UUID;
-- ... (see Phase 1 doc for complete script)
EOF

# Run the migration with Flyway (needs flyway-maven-plugin configured; for Liquibase, use mvn liquibase:update)
mvn flyway:migrate
# OR restart the app with migrations enabled

3. Implement Core Services

Priority Order:

  1. RunService (2-3 days)

    • Idempotency key generation
    • Lease management
    • Heartbeat tracking
    • File: valkyrai/src/main/java/com/valkyrlabs/workflow/service/RunService.java
  2. WorkflowExecutionService (1-2 days)

    • Wrap existing ValkyrWorkflowService
    • Track execution lifecycle
    • File: valkyrai/src/main/java/com/valkyrlabs/workflow/service/WorkflowExecutionService.java
  3. DLQService (1 day)

    • Requeue/discard logic
    • File: valkyrai/src/main/java/com/valkyrlabs/workflow/service/DLQService.java
  4. RunnerService (2-3 days)

    • Poll for pending runs
    • Execute with heartbeat
    • Zombie reaper
    • File: valkyrai/src/main/java/com/valkyrlabs/workflow/runner/RunnerService.java
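The RunnerService step "execute with heartbeat" might look roughly like the sketch below. It assumes that the heartbeat callback wraps the /Run/{id}/heartbeat call. HeartbeatRunner and its method names are hypothetical. Polling and the zombie reaper are left out.

```java
import java.time.Duration;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/**
 * Sketch of a runner's execute-with-heartbeat step. The heartbeat Runnable
 * is a hypothetical stand-in for POST /Run/{id}/heartbeat.
 */
final class HeartbeatRunner {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    /** Runs the task on the caller's thread while a background timer sends heartbeats. */
    void executeWithHeartbeat(Runnable task, Runnable heartbeat, Duration interval) {
        long millis = interval.toMillis(); // must be > 0 for scheduleAtFixedRate
        ScheduledFuture<?> beats = scheduler.scheduleAtFixedRate(
                heartbeat, millis, millis, TimeUnit.MILLISECONDS);
        try {
            task.run();
        } finally {
            beats.cancel(false); // stop extending the lease once the task ends
        }
    }

    /** Stops the heartbeat thread; call when the runner shuts down. */
    void shutdown() {
        scheduler.shutdownNow();
    }
}
```

A polling loop would claim a pending run, call executeWithHeartbeat(...) with that run's work, and then record the result. If a heartbeat task throws, ScheduledExecutorService suppresses all later executions of that task. For this reason, the heartbeat callback should catch its own transient errors.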

4. Implement Controllers

All custom (non-CRUD) endpoints:

  • WorkflowExecutionController — cancel/pause/resume operations
  • DLQController — requeue/discard operations

Note: ThorAPI generates the base CRUD controllers during code generation (step 1); you only need to add the custom operation methods.
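Most of the work in the custom cancel/pause/resume methods is rejecting illegal transitions before a new status is persisted. The minimal sketch below assumes a status enum like this one. The real values come from the generated WorkflowExecution model, which this guide does not list, so ExecutionStatus and ExecutionTransitions are hypothetical.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical execution states; the real enum comes from the generated model. */
enum ExecutionStatus { PENDING, RUNNING, PAUSED, COMPLETED, FAILED, CANCELLED }

/** Guards the cancel/pause/resume transitions before a controller persists them. */
final class ExecutionTransitions {
    private static final Set<ExecutionStatus> CANCELLABLE =
            EnumSet.of(ExecutionStatus.PENDING, ExecutionStatus.RUNNING, ExecutionStatus.PAUSED);

    static ExecutionStatus cancel(ExecutionStatus current) {
        return require(CANCELLABLE.contains(current), "cancel", current, ExecutionStatus.CANCELLED);
    }

    static ExecutionStatus pause(ExecutionStatus current) {
        return require(current == ExecutionStatus.RUNNING, "pause", current, ExecutionStatus.PAUSED);
    }

    static ExecutionStatus resume(ExecutionStatus current) {
        return require(current == ExecutionStatus.PAUSED, "resume", current, ExecutionStatus.RUNNING);
    }

    private static ExecutionStatus require(boolean allowed, String op,
                                           ExecutionStatus current, ExecutionStatus next) {
        if (!allowed) {
            // A controller could translate this into an HTTP 409 Conflict response.
            throw new IllegalStateException("Cannot " + op + " an execution in state " + current);
        }
        return next;
    }
}
```

Keeping the transition rules in one place keeps the controller thin. It loads the execution, applies the transition, saves the result, and translates IllegalStateException into an error response.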

5. Frontend Components

  1. Regenerate RTK Query hooks (auto-generated from the OpenAPI spec)
  2. WorkflowExecutionMonitor component (see Phase 1 doc)
  3. DLQBrowser component (see Phase 1 doc)
  4. Update WorkflowStudio to use executions instead of direct workflow runs

📚 Reference Documents


🎓 Key Concepts

Idempotency

Every run gets a content hash of inputs + config:

String idempotencyKey = SHA256(inputs) + ":" + SHA256(config);

Duplicate requests with the same key → deduplicated (no duplicate side effects).
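Here is a runnable version of the key formula above. It uses the JDK's MessageDigest and HexFormat, so it requires Java 17 or later. The sketch assumes that inputs and config have already been serialized canonically (for example, JSON with sorted keys), because the same logical input must always produce the same bytes. The in-memory map stands in for what would likely be a unique constraint on the key column. Class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical idempotency helper: content-hash keys plus first-writer-wins dedup. */
final class Idempotency {
    private final Map<String, String> runIdsByKey = new ConcurrentHashMap<>();

    /** Hex-encoded SHA-256 of a canonically serialized string. */
    static String sha256Hex(String canonical) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(canonical.getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(hash);
        } catch (NoSuchAlgorithmException e) {
            // Every Java SE platform is required to support SHA-256.
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }

    /** Same shape as the guide's formula: SHA256(inputs) + ":" + SHA256(config). */
    static String idempotencyKey(String canonicalInputs, String canonicalConfig) {
        return sha256Hex(canonicalInputs) + ":" + sha256Hex(canonicalConfig);
    }

    /** Returns the run already registered for this key, or registers candidateRunId. */
    String registerOrGetExisting(String key, String candidateRunId) {
        String existing = runIdsByKey.putIfAbsent(key, candidateRunId);
        return existing != null ? existing : candidateRunId;
    }
}
```

The second submission with an identical key gets back the first run's id instead of creating a new run. This is the deduplication behavior described above.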

Lease Mechanism

Runners acquire a lease on a run:

  • Lease expires after 2 minutes (configurable)
  • Heartbeat every 2 seconds extends lease
  • Zombie reaper reclaims expired leases
  • Prevents double execution across crashes
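The lease rules above can be sketched as a small in-memory table. The sketch assumes that time is passed in explicitly, which keeps the logic testable. It uses the 2-minute default from above. LeaseTable and its methods are hypothetical. A database-backed version would likely perform each step as a single conditional UPDATE.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory lease table illustrating acquire / heartbeat / reap. */
final class LeaseTable {
    record Lease(String runnerId, Instant expiresAt) {}

    private final Duration leaseDuration;
    private final Map<String, Lease> leases = new HashMap<>();

    LeaseTable(Duration leaseDuration) {
        this.leaseDuration = leaseDuration; // e.g. Duration.ofMinutes(2)
    }

    /** Grants the lease if the run is unleased or its previous lease has expired. */
    synchronized boolean tryAcquire(String runId, String runnerId, Instant now) {
        Lease current = leases.get(runId);
        if (current != null && now.isBefore(current.expiresAt())) {
            return false; // another runner still holds a live lease
        }
        leases.put(runId, new Lease(runnerId, now.plus(leaseDuration)));
        return true;
    }

    /** Extends the lease; fails if this runner no longer holds a live lease. */
    synchronized boolean heartbeat(String runId, String runnerId, Instant now) {
        Lease current = leases.get(runId);
        if (current == null || !current.runnerId().equals(runnerId)
                || !now.isBefore(current.expiresAt())) {
            return false;
        }
        leases.put(runId, new Lease(runnerId, now.plus(leaseDuration)));
        return true;
    }

    /** Zombie reaper: drops expired leases and returns the run ids to requeue. */
    synchronized List<String> reapExpired(Instant now) {
        List<String> reclaimed = new ArrayList<>();
        leases.entrySet().removeIf(entry -> {
            boolean expired = !now.isBefore(entry.getValue().expiresAt());
            if (expired) {
                reclaimed.add(entry.getKey());
            }
            return expired;
        });
        return reclaimed;
    }
}
```

heartbeat() fails once a lease has expired. As a result, a runner that stalled long enough to lose its lease learns that it must stop, rather than continuing alongside the runner that reclaims the run.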

DLQ (Dead Letter Queue)

Runs that fail permanently (retries exhausted or a non-retryable error) → quarantined:

  • Operator can requeue with input overrides
  • Operator can discard with notes
  • Tracks each entry's resolution (requeued or discarded)
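The requeue and discard operations can be sketched in memory as shown below. The Entry fields, the Resolution enum, and the method names are hypothetical stand-ins for the DeadLetterQueue model and DLQService. One interaction is worth noting. Because the idempotency key hashes the inputs, a requeue with unchanged inputs would produce the same key as the failed run. The real service therefore probably needs some way to let replays through.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for the DeadLetterQueue model and DLQService. */
final class DeadLetterQueueSketch {
    enum Resolution { QUARANTINED, REQUEUED, DISCARDED }

    static final class Entry {
        final String runId;
        final Map<String, Object> inputs;
        Resolution resolution = Resolution.QUARANTINED;
        String notes; // operator notes recorded on discard

        Entry(String runId, Map<String, Object> inputs) {
            this.runId = runId;
            this.inputs = Map.copyOf(inputs); // snapshot of the failed run's inputs
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();

    void quarantine(String entryId, String runId, Map<String, Object> inputs) {
        entries.put(entryId, new Entry(runId, inputs));
    }

    /** Marks the entry requeued and returns inputs for a new run, with overrides applied. */
    Map<String, Object> requeue(String entryId, Map<String, Object> overrides) {
        Entry entry = openEntry(entryId);
        Map<String, Object> merged = new HashMap<>(entry.inputs);
        merged.putAll(overrides); // operator overrides win over the original inputs
        entry.resolution = Resolution.REQUEUED;
        return merged;
    }

    /** Closes the entry without replaying it, keeping the operator's notes. */
    void discard(String entryId, String notes) {
        Entry entry = openEntry(entryId);
        entry.resolution = Resolution.DISCARDED;
        entry.notes = notes;
    }

    Resolution resolutionOf(String entryId) {
        return entries.get(entryId).resolution;
    }

    /** Only quarantined entries can be resolved, so each entry is resolved once. */
    private Entry openEntry(String entryId) {
        Entry entry = entries.get(entryId);
        if (entry == null || entry.resolution != Resolution.QUARANTINED) {
            throw new IllegalStateException("No open DLQ entry " + entryId);
        }
        return entry;
    }
}
```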

🧪 Testing Checklist

  • Idempotency test: Submit duplicate run → verify deduplicated
  • Crash recovery test: Kill runner mid-execution → verify resume without duplication
  • DLQ test: Force permanent failure → verify quarantine → requeue → success
  • Lease expiry test: Stop heartbeat → verify zombie reaper reclaims
  • Performance test: 10k concurrent executions → verify P95 ≤ 75ms dispatch
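For the performance check, the measured dispatch latencies have to be reduced to a single P95 figure. The minimal helper below uses the nearest-rank definition of a percentile. Other definitions interpolate between samples and can give slightly different results on small samples. The class name is hypothetical, and generating the 10k-execution load is out of scope here.

```java
import java.util.Arrays;

/** Hypothetical helper for checking the dispatch-latency target. */
final class LatencyStats {
    /** Nearest-rank percentile: smallest sample with at least p% of samples at or below it. */
    static long percentile(long[] samplesMillis, double p) {
        if (samplesMillis.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        long[] sorted = samplesMillis.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        // Multiply before dividing so integer p and n give an exact rank.
        int rank = (int) Math.ceil(p * sorted.length / 100.0); // 1-based
        return sorted[rank - 1];
    }

    /** Target from the checklist: P95 dispatch latency of at most 75 ms. */
    static boolean meetsDispatchSlo(long[] samplesMillis) {
        return percentile(samplesMillis, 95) <= 75;
    }
}
```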

🏗️ Architecture Principles

  1. ThorAPI-First: All models defined in OpenAPI → code generated
  2. Separation of Concerns: Workflow = definition; WorkflowExecution = runtime instance
  3. Crash-Safe: Lease + heartbeat + zombie reaper = no lost tasks
  4. Idempotent: Content-based deduplication = no duplicate side effects
  5. Observable: Every run tracked; OpenTelemetry spans; DLQ for failures

🚦 Current Status

✅ Phase 1: Data Models & API Endpoints (COMPLETE)
🔄 Phase 2: Code Generation (IN PROGRESS — run mvn -pl thorapi clean install)
⏳ Phase 3: Service Layer (NEXT — 3-5 days)
⏳ Phase 4: Runner Pool (2-3 days)
⏳ Phase 5: Controllers (1 day)
⏳ Phase 6: Observability (2 days)
⏳ Phase 7: Frontend (3-4 days)
⏳ Phase 8: Integration & Testing (3-5 days)

Estimated Time to Production: 2-3 weeks


💡 Pro Tips

  • Backward Compatibility: Existing ValkyrWorkflowService still works; we're wrapping it
  • Incremental Rollout: Start with select workflows; gradually migrate all
  • Monitoring: Set up OpenTelemetry exporter for Jaeger/Grafana Tempo
  • DLQ Dashboard: Build operator UI early for visibility

🤝 Team Coordination

Backend:

  • Implement services (RunService, WorkflowExecutionService, DLQService)
  • Database migrations
  • Integration tests

Frontend:

  • Regenerate TypeScript clients
  • Build WorkflowExecutionMonitor component
  • Build DLQ browser
  • Update WorkflowStudio

DevOps:

  • Configure runner pods (K8s deployment)
  • Set up OpenTelemetry collector
  • Database migration automation

📞 Questions?

See the detailed implementation plan:

Or check the PRD for original requirements.


Ready to ship Imperial-class workflow orchestration. 😈🚀

"All I am surrounded by is fear… and dead queues." — ValkyrAI to n8n