Workflow Monitor Deployment Guide
Complete guide for deploying the Workflow Monitor in production environments.
Prerequisites
Backend Requirements
- Java: 17 or higher
- Spring Boot: 3.x
- Maven: 3.8+
- WebSocket Support: Spring WebSocket + STOMP
- Memory: Minimum 2GB RAM (4GB recommended)
- CPU: 2 cores minimum (4 cores recommended)
Frontend Requirements
- Node.js: 18.x or higher
- npm: 9.x or higher
- React: 18.x
- TypeScript: 5.x
- Build Tool: webpack/vite
Infrastructure
- Load Balancer: Must support WebSocket (sticky sessions)
- Reverse Proxy: nginx/Apache with WebSocket support
- Firewall: Allow WebSocket connections (ports 80/443)
- SSL/TLS: Required for production (wss://)
Backend Deployment
1. Maven Dependencies
Add to pom.xml:
<dependencies>
  <!-- Spring WebSocket -->
  <dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-websocket</artifactId>
  </dependency>
  <!-- STOMP -->
  <dependency>
    <groupId>org.springframework</groupId>
    <artifactId>spring-messaging</artifactId>
  </dependency>
  <!-- SockJS (optional fallback) -->
  <dependency>
    <groupId>org.webjars</groupId>
    <artifactId>sockjs-client</artifactId>
    <version>1.5.1</version>
  </dependency>
</dependencies>
2. Application Properties
application.yml:
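# Note: the spring.websocket.* keys below are not built-in Spring Boot properties.
# They take effect only if the application binds them itself; the limits that
# Spring enforces are the ones set in WebSocketConfig (step 3).
spring:
  websocket:
    message:
      max-text-size: 524288 # 512KB
      max-binary-size: 524288
    broker:
      relay:
        enabled: false # Use simple broker for small deployments
      simple:
        enabled: true
server:
  port: 8080
  tomcat:
    threads:
      max: 200 # Increase for high concurrency
      min-spare: 10
    connection-timeout: 20000 # 20 seconds
logging:
  level:
    # DEBUG is verbose; prefer INFO in production unless troubleshooting
    org.springframework.messaging: DEBUG
    org.springframework.web.socket: DEBUG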
3. WebSocket Configuration
WebSocketConfig.java:
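import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.messaging.simp.config.ChannelRegistration;
import org.springframework.messaging.simp.config.MessageBrokerRegistry;
import org.springframework.scheduling.TaskScheduler;
import org.springframework.scheduling.concurrent.ThreadPoolTaskScheduler;
import org.springframework.web.socket.config.annotation.EnableWebSocketMessageBroker;
import org.springframework.web.socket.config.annotation.StompEndpointRegistry;
import org.springframework.web.socket.config.annotation.WebSocketMessageBrokerConfigurer;
import org.springframework.web.socket.config.annotation.WebSocketTransportRegistration;

@Configuration
@EnableWebSocketMessageBroker
public class WebSocketConfig implements WebSocketMessageBrokerConfigurer {

    @Override
    public void configureMessageBroker(MessageBrokerRegistry registry) {
        // Heartbeat values: {server send interval, expected client send interval} in ms
        registry.enableSimpleBroker("/topic")
                .setHeartbeatValue(new long[]{10000, 10000})
                .setTaskScheduler(heartBeatScheduler());
        registry.setApplicationDestinationPrefixes("/app");
        registry.setPreservePublishOrder(true);
    }

    @Override
    public void registerStompEndpoints(StompEndpointRegistry registry) {
        registry.addEndpoint("/ws")
                .setAllowedOriginPatterns(getAllowedOrigins())
                .withSockJS()
                .setClientLibraryUrl("https://cdn.jsdelivr.net/npm/sockjs-client@1.5.1/dist/sockjs.min.js");
    }

    @Override
    public void configureWebSocketTransport(WebSocketTransportRegistration registration) {
        registration
                .setMessageSizeLimit(512 * 1024)         // 512KB
                .setSendTimeLimit(20000)                 // 20 seconds
                .setSendBufferSizeLimit(3 * 512 * 1024); // 1.5MB
    }

    @Override
    public void configureClientInboundChannel(ChannelRegistration registration) {
        registration.taskExecutor()
                .corePoolSize(4)
                .maxPoolSize(8)
                .queueCapacity(1000);
    }

    @Override
    public void configureClientOutboundChannel(ChannelRegistration registration) {
        registration.taskExecutor()
                .corePoolSize(4)
                .maxPoolSize(8)
                .queueCapacity(1000);
    }

    @Bean
    public TaskScheduler heartBeatScheduler() {
        ThreadPoolTaskScheduler scheduler = new ThreadPoolTaskScheduler();
        scheduler.setPoolSize(2);
        scheduler.setThreadNamePrefix("ws-heartbeat-");
        scheduler.initialize();
        return scheduler;
    }

    // Falls back to "*" (any origin) when ALLOWED_ORIGINS is unset; always set it in production.
    private String[] getAllowedOrigins() {
        String origins = System.getenv("ALLOWED_ORIGINS");
        return origins != null ? origins.split(",") : new String[]{"*"};
    }
}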
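The heartbeat value of {10000, 10000} is only the server's half of the negotiation. Under STOMP 1.2, each direction ends up using the larger of what the sender offers and what the receiver asks for, and a 0 on either side disables heartbeats in that direction. The sketch below works through that rule in plain Java; the class name `StompHeartbeat` and the client values are hypothetical, chosen only for illustration.

```java
public final class StompHeartbeat {
    private StompHeartbeat() {}

    /**
     * Effective heartbeat interval (ms) for one direction, following the STOMP 1.2 rule.
     * senderCanSend: smallest interval the sending side offers (0 = cannot send).
     * receiverWants: interval the receiving side asks for (0 = does not want heartbeats).
     */
    public static long negotiate(long senderCanSend, long receiverWants) {
        if (senderCanSend == 0 || receiverWants == 0) {
            return 0; // heartbeats disabled in this direction
        }
        return Math.max(senderCanSend, receiverWants);
    }

    public static void main(String[] args) {
        long[] server = {10_000, 10_000}; // {server sends, server expects}, as in setHeartbeatValue
        long[] client = {4_000, 20_000};  // hypothetical client header "heart-beat:4000,20000"

        System.out.println("client->server: " + negotiate(client[0], server[1]) + " ms"); // 10000 ms
        System.out.println("server->client: " + negotiate(server[0], client[1]) + " ms"); // 20000 ms
    }
}
```

In this example the client would be expected to send every 10 seconds and the server every 20 seconds, so proxy idle timeouts need to stay above the larger value.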
4. Build & Package
# Build with Maven
mvn clean package -DskipTests
# Build with tests
mvn clean package
# Create Docker image (optional)
docker build -t valkyrai:latest .
5. Environment Variables
export SPRING_PROFILES_ACTIVE=production
export ALLOWED_ORIGINS=https://valkyr.ai,https://app.valkyr.ai
export SERVER_PORT=8080
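# Note: the JVM does not read JAVA_OPTS on its own. Pass it on the command line
# (java $JAVA_OPTS -jar app.jar) or use JAVA_TOOL_OPTIONS, which the JVM picks up automatically.
export JAVA_OPTS="-Xms2g -Xmx4g -XX:+UseG1GC"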
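Because getAllowedOrigins() in WebSocketConfig falls back to "*" when ALLOWED_ORIGINS is unset, a missing or mistyped variable silently opens the endpoint to every origin. One possible alternative is to fail fast at startup. The following is a minimal sketch under that assumption; `AllowedOrigins` and its rules (HTTPS only, no wildcards) are hypothetical and not part of the existing code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public final class AllowedOrigins {
    private AllowedOrigins() {}

    /** Parses a comma-separated origin list and rejects values that are unsafe for production. */
    public static List<String> parse(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            throw new IllegalStateException("ALLOWED_ORIGINS is not set");
        }
        List<String> origins = Arrays.stream(raw.split(","))
                .map(String::trim)            // tolerate "a, b" as well as "a,b"
                .filter(s -> !s.isEmpty())    // tolerate trailing commas
                .collect(Collectors.toList());
        if (origins.isEmpty()) {
            throw new IllegalStateException("ALLOWED_ORIGINS contains no origins");
        }
        for (String origin : origins) {
            if (origin.contains("*")) {
                throw new IllegalStateException("Wildcard origin not allowed: " + origin);
            }
            if (!origin.startsWith("https://")) {
                throw new IllegalStateException("Origin must use https: " + origin);
            }
        }
        return origins;
    }

    public static void main(String[] args) {
        // Same value as the export above
        System.out.println(parse("https://valkyr.ai,https://app.valkyr.ai"));
    }
}
```

If this approach were adopted, getAllowedOrigins() could return `AllowedOrigins.parse(System.getenv("ALLOWED_ORIGINS")).toArray(new String[0])`, so that a misconfigured deployment fails at startup instead of accepting all origins.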
Frontend Deployment
1. Install Dependencies
cd web/typescript/valkyr_labs_com
# Install all dependencies, including the devDependencies the build step needs
npm ci
2. Environment Configuration
.env.production:
REACT_APP_API_URL=https://api.valkyr.ai
REACT_APP_WS_URL=wss://api.valkyr.ai/ws
REACT_APP_ENV=production
3. Build
# Production build
npm run build
# Analyze bundle size
# (the extra "--" forwards the flag to the build script, if the script supports it)
npm run build -- --report
4. Optimize Bundle
webpack.config.js (or vite.config.ts):
module.exports = {
  optimization: {
    splitChunks: {
      chunks: 'all',
      cacheGroups: {
        vendor: {
          test: /[\\/]node_modules[\\/]/,
          name: 'vendors',
          priority: 10,
        },
        stomp: {
          test: /[\\/]node_modules[\\/]@stomp[\\/]/,
          name: 'stomp',
          priority: 20,
        },
      },
    },
  },
};
Load Balancer Configuration
nginx (Recommended)
nginx.conf:
upstream backend {
    # Sticky sessions for WebSocket
    ip_hash;
    server backend1:8080 max_fails=3 fail_timeout=30s;
    server backend2:8080 max_fails=3 fail_timeout=30s;
}

server {
    listen 443 ssl http2;
    server_name api.valkyr.ai;
    ssl_certificate /etc/ssl/certs/valkyr.ai.crt;
    ssl_certificate_key /etc/ssl/private/valkyr.ai.key;
    ssl_protocols TLSv1.2 TLSv1.3;

    # WebSocket upgrade
    location /ws {
        proxy_pass http://backend;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # WebSocket timeouts
        proxy_read_timeout 86400s;
        proxy_send_timeout 86400s;
        proxy_connect_timeout 10s;

        # Buffering (disable for WebSocket)
        proxy_buffering off;
    }

    # REST API
    location /api {
        proxy_pass http://backend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
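One consequence of ip_hash is worth knowing before relying on it: for IPv4 clients, nginx keys the hash on the first three octets of the address, so every client in the same /24 network (for example, an office behind one NAT range) lands on the same backend. The sketch below illustrates that behavior only. It is not nginx's actual hash function, it ignores server weights and failed servers, and the class name `IpHashSketch` is hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public final class IpHashSketch {
    private IpHashSketch() {}

    /** Picks a backend using only the first three octets of an IPv4 address as the key. */
    public static String pick(String ipv4, List<String> backends) {
        String[] octets = ipv4.split("\\.");
        if (octets.length != 4) {
            throw new IllegalArgumentException("Not an IPv4 address: " + ipv4);
        }
        int[] key = {
            Integer.parseInt(octets[0]),
            Integer.parseInt(octets[1]),
            Integer.parseInt(octets[2])
        };
        // Any deterministic hash works for the illustration; nginx uses its own.
        int index = Math.floorMod(Arrays.hashCode(key), backends.size());
        return backends.get(index);
    }

    public static void main(String[] args) {
        List<String> backends = Arrays.asList("backend1:8080", "backend2:8080");
        // Two clients in the same /24 always map to the same backend
        System.out.println(pick("203.0.113.7", backends));
        System.out.println(pick("203.0.113.250", backends));
    }
}
```

If most traffic arrives from a few address ranges, load can skew toward one backend; cookie-based stickiness, as in the ALB configuration, avoids that dependency on client addresses.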
AWS Application Load Balancer
TargetGroup:
  Protocol: HTTP
  Port: 8080
  HealthCheck:
    Path: /actuator/health
    Interval: 30
    Timeout: 5
    HealthyThreshold: 2
    UnhealthyThreshold: 3
  TargetGroupAttributes:
    - Key: stickiness.enabled
      Value: true
    - Key: stickiness.type
      Value: lb_cookie
    - Key: stickiness.lb_cookie.duration_seconds
      Value: 86400
Listener:
  Protocol: HTTPS
  Port: 443
  DefaultActions:
    - Type: forward
      TargetGroupArn: !Ref TargetGroup
Docker Deployment
Backend Dockerfile
# The official openjdk images are deprecated; Eclipse Temurin is a maintained alternative
FROM eclipse-temurin:17-jre-alpine
# Add app user
RUN addgroup -S spring && adduser -S spring -G spring
# Copy JAR
COPY target/valkyrai-*.jar app.jar
# Run as non-root
USER spring:spring
# Expose port
EXPOSE 8080
# Health check (Alpine images do not ship curl; BusyBox wget is available)
HEALTHCHECK --interval=30s --timeout=3s --start-period=40s --retries=3 \
  CMD wget -q --spider http://localhost:8080/actuator/health || exit 1
# Run
ENTRYPOINT ["java", "-jar", "/app.jar"]
Frontend Dockerfile
FROM node:18-alpine AS build
WORKDIR /app
COPY package*.json ./
# Install devDependencies too: the build stage needs them, and they do not reach the final image
RUN npm ci
COPY . .
RUN npm run build
FROM nginx:alpine
COPY --from=build /app/build /usr/share/nginx/html
COPY nginx.conf /etc/nginx/conf.d/default.conf
EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]
Docker Compose
version: '3.8'
services:
  backend:
    image: valkyrai:latest
    # With replicas > 1, a fixed host port can only be bound by one container;
    # drop the host mapping or route through the frontend proxy instead.
    ports:
      - "8080:8080"
    environment:
      - SPRING_PROFILES_ACTIVE=production
      - ALLOWED_ORIGINS=https://valkyr.ai
    healthcheck:
      # Alpine-based images do not ship curl; BusyBox wget is available
      test: ["CMD", "wget", "-q", "--spider", "http://localhost:8080/actuator/health"]
      interval: 30s
      timeout: 3s
      retries: 3
    deploy:
      replicas: 2
      resources:
        limits:
          cpus: '2'
          memory: 4G
        reservations:
          cpus: '1'
          memory: 2G
  frontend:
    image: valkyrai-frontend:latest
    ports:
      - "80:80"
      - "443:443"
    depends_on:
      - backend
    volumes:
      - ./ssl:/etc/nginx/ssl:ro
Kubernetes Deployment
Backend Deployment
backend-deployment.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: valkyrai-backend
spec:
  replicas: 3
  selector:
    matchLabels:
      app: valkyrai-backend
  template:
    metadata:
      labels:
        app: valkyrai-backend
    spec:
      containers:
        - name: backend
          image: valkyrai:latest
          ports:
            - containerPort: 8080
          env:
            - name: SPRING_PROFILES_ACTIVE
              value: "production"
            - name: ALLOWED_ORIGINS
              valueFrom:
                configMapKeyRef:
                  name: valkyrai-config
                  key: allowed-origins
          resources:
            requests:
              memory: "2Gi"
              cpu: "1"
            limits:
              memory: "4Gi"
              cpu: "2"
          livenessProbe:
            httpGet:
              path: /actuator/health/liveness
              port: 8080
            initialDelaySeconds: 60
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /actuator/health/readiness
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: valkyrai-backend
spec:
  type: ClusterIP
  sessionAffinity: ClientIP # Sticky sessions for WebSocket
  ports:
    - port: 8080
      targetPort: 8080
  selector:
    app: valkyrai-backend
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: valkyrai-ingress
  annotations:
    # The community ingress-nginx controller proxies WebSocket upgrades without a
    # dedicated annotation (websocket-services belongs to the nginx.org controller).
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"
    nginx.ingress.kubernetes.io/upstream-hash-by: "$remote_addr" # Sticky sessions
spec:
  rules:
    - host: api.valkyr.ai
      http:
        paths:
          - path: /ws
            pathType: Prefix
            backend:
              service:
                name: valkyrai-backend
                port:
                  number: 8080
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: valkyrai-backend
                port:
                  number: 8080
  tls:
    - hosts:
        - api.valkyr.ai
      secretName: valkyr-tls
Monitoring & Observability
Metrics
Spring Boot Actuator:
management:
  endpoints:
    web:
      exposure:
        include: health,metrics,prometheus
  metrics:
    tags:
      application: valkyrai
  # Spring Boot 3.x moved management.metrics.export.prometheus.* to this location
  prometheus:
    metrics:
      export:
        enabled: true
Prometheus Scrape Config:
scrape_configs:
  - job_name: 'valkyrai'
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['backend:8080']
Logging
Structured Logging (JSON):
<configuration>
  <appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
    <encoder class="net.logstash.logback.encoder.LogstashEncoder">
      <includeMdcKeyName>traceId</includeMdcKeyName>
      <includeMdcKeyName>spanId</includeMdcKeyName>
    </encoder>
  </appender>
  <root level="INFO">
    <appender-ref ref="CONSOLE" />
  </root>
</configuration>
Distributed Tracing
Micrometer Tracing (Spring Cloud Sleuth does not support Spring Boot 3.x; Micrometer Tracing replaces it):
<dependency>
  <groupId>io.micrometer</groupId>
  <artifactId>micrometer-tracing-bridge-brave</artifactId>
</dependency>
Performance Tuning
JVM Options
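# The JMX settings below disable authentication. Keep port 9010 on a private
# network, or enable JMX authentication and SSL before exposing it.
JAVA_OPTS="
  -Xms2g -Xmx4g
  -XX:+UseG1GC
  -XX:MaxGCPauseMillis=200
  -XX:+HeapDumpOnOutOfMemoryError
  -XX:HeapDumpPath=/var/log/valkyrai/
  -Dcom.sun.management.jmxremote
  -Dcom.sun.management.jmxremote.port=9010
  -Dcom.sun.management.jmxremote.authenticate=false
"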
WebSocket Tuning
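# Note: spring.task.scheduling.pool.size is a standard Spring Boot property.
# The spring.websocket.* keys are not; they apply only if the application binds
# them itself (the enforced limits are set in WebSocketConfig).
spring:
  task:
    scheduling:
      pool:
        size: 10
  websocket:
    message:
      max-sessions: 1000
      send-time-limit: 20000
      send-buffer-size-limit: 1572864 # 1.5MB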
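The session cap and the send buffer limit interact with the heap size chosen under JVM Options. In the worst case, every session has a slow client and its outbound buffer fills to the limit. The arithmetic below is a rough upper bound under that assumption, not a measurement; the class name `SendBufferBudget` is hypothetical.

```java
public final class SendBufferBudget {
    private SendBufferBudget() {}

    /** Upper bound on memory held in outbound buffers if every session fills its buffer. */
    public static long worstCaseBytes(int maxSessions, long sendBufferLimitBytes) {
        return Math.multiplyExact((long) maxSessions, sendBufferLimitBytes);
    }

    public static void main(String[] args) {
        int maxSessions = 1000;          // max-sessions above
        long bufferLimit = 1_572_864L;   // send-buffer-size-limit above (1.5MB)

        long bytes = worstCaseBytes(maxSessions, bufferLimit);
        double mib = bytes / (1024.0 * 1024.0);
        System.out.printf("worst case: %d bytes (%.0f MiB)%n", bytes, mib);
        // prints "worst case: 1572864000 bytes (1500 MiB)"
    }
}
```

With -Xmx4g that worst case is a little over a third of the heap, so raising either the session cap or the buffer limit should be paired with a heap review.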
Security Checklist
- ✅ Enable HTTPS/WSS in production
- ✅ Configure CORS properly
- ✅ Implement authentication on WebSocket connections
- ✅ Use JWT or session-based auth
- ✅ Rate limit WebSocket connections
- ✅ Validate all incoming messages
- ✅ Sanitize output to prevent XSS
- ✅ Use CSP headers
- ✅ Enable security headers (HSTS, etc.)
- ✅ Regularly update dependencies
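For the rate-limiting item, a token bucket is one common approach: each client gets a budget that refills at a steady rate and allows short bursts. The sketch below is a minimal, framework-free version under stated assumptions. The class `TokenBucket`, its parameters, and the injected clock are hypothetical, and wiring it into the WebSocket handshake or inbound channel (one bucket per client) is left out.

```java
import java.util.function.LongSupplier;

public final class TokenBucket {
    private final long capacity;
    private final double refillPerSecond;
    private final LongSupplier nanoClock; // injected so the limiter can be tested deterministically
    private double tokens;
    private long lastRefillNanos;

    public TokenBucket(long capacity, double refillPerSecond, LongSupplier nanoClock) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.nanoClock = nanoClock;
        this.tokens = capacity; // start full so an initial burst is allowed
        this.lastRefillNanos = nanoClock.getAsLong();
    }

    /** Returns true and consumes one token if the caller is within its budget. */
    public synchronized boolean tryAcquire() {
        long now = nanoClock.getAsLong();
        double refill = (now - lastRefillNanos) * refillPerSecond / 1_000_000_000.0;
        tokens = Math.min(capacity, tokens + refill);
        lastRefillNanos = now;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // Example: bursts of up to 5 connection attempts, refilling 1 per second
        TokenBucket bucket = new TokenBucket(5, 1.0, System::nanoTime);
        for (int i = 1; i <= 7; i++) {
            System.out.println("attempt " + i + ": " + (bucket.tryAcquire() ? "allowed" : "rejected"));
        }
    }
}
```

In a multi-instance deployment each backend would hold its own buckets, so limits are per instance unless the state is shared.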
Troubleshooting
WebSocket Connection Failures
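# Test the WebSocket handshake (expect "HTTP/1.1 101 Switching Protocols")
# --http1.1: the Upgrade handshake does not exist in HTTP/2
# With SockJS enabled on /ws, the raw WebSocket transport is served at /ws/websocket
# The key must be a base64-encoded 16-byte value; this is the sample from RFC 6455
curl -i -N --http1.1 \
  -H "Connection: Upgrade" \
  -H "Upgrade: websocket" \
  -H "Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==" \
  -H "Sec-WebSocket-Version: 13" \
  https://api.valkyr.ai/ws/websocket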
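A successful handshake answers with status 101 and a Sec-WebSocket-Accept header that the server derives from the request's Sec-WebSocket-Key. Checking that value helps distinguish a real WebSocket endpoint from a proxy that merely echoes headers. The sketch below computes the expected value as defined in RFC 6455; the class name `WebSocketAccept` is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

public final class WebSocketAccept {
    // Fixed GUID defined by RFC 6455 for the opening handshake
    private static final String GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

    private WebSocketAccept() {}

    /** Returns the Sec-WebSocket-Accept value a compliant server sends for the given key. */
    public static String expectedAccept(String secWebSocketKey) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            byte[] digest = sha1.digest((secWebSocketKey + GUID).getBytes(StandardCharsets.US_ASCII));
            return Base64.getEncoder().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is not available", e);
        }
    }

    public static void main(String[] args) {
        // Sample key from RFC 6455
        System.out.println(expectedAccept("dGhlIHNhbXBsZSBub25jZQ=="));
        // prints "s3pPLMBiTxaQ9kYGzzhZRbK+xOo="
    }
}
```

If the response carries a different value, or no such header at all, something between the client and the application is not performing a WebSocket upgrade.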
High Memory Usage
# Monitor JVM memory (requires the JVM to be started with -XX:NativeMemoryTracking=summary)
jcmd <pid> VM.native_memory summary
# Heap dump
jmap -dump:format=b,file=heap.bin <pid>
Connection Leaks
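-- Count database connections whose application_name mentions WebSocket (PostgreSQL).
-- This reports database connections, not WebSocket sessions themselves.
SELECT COUNT(*)
FROM pg_stat_activity
WHERE application_name LIKE '%WebSocket%';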
Rollback Procedure
- Stop new traffic:
  kubectl scale deployment valkyrai-backend --replicas=0
- Deploy previous version:
  kubectl set image deployment/valkyrai-backend \
    backend=valkyrai:previous-version
- Verify health:
  kubectl get pods -l app=valkyrai-backend
- Resume traffic:
  kubectl scale deployment valkyrai-backend --replicas=3
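Pod status alone does not confirm that the application reports itself healthy. As a complement to the verification step, the sketch below polls an Actuator health endpoint once and checks for a top-level "UP" status. It is an illustration under assumptions: the class `HealthProbe`, the default URL, and the regex-based parsing (which takes the first "status" field in the body) are hypothetical simplifications, and a real check would use a JSON parser.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class HealthProbe {
    // Matches the first "status":"..." pair in the response body
    private static final Pattern STATUS = Pattern.compile("\"status\"\\s*:\\s*\"([A-Z_]+)\"");

    private HealthProbe() {}

    /** True only for HTTP 200 with a first "status" field equal to UP. */
    public static boolean isUp(int httpStatus, String body) {
        if (httpStatus != 200 || body == null) {
            return false;
        }
        Matcher m = STATUS.matcher(body);
        return m.find() && "UP".equals(m.group(1));
    }

    /** Sends one GET request to the given health URL and evaluates the response. */
    public static boolean probe(String url) throws Exception {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(3))
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(5))
                .GET()
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        return isUp(response.statusCode(), response.body());
    }

    public static void main(String[] args) throws Exception {
        // Assumed default: a port-forwarded pod, e.g. kubectl port-forward deploy/valkyrai-backend 8080:8080
        String url = args.length > 0 ? args[0] : "http://localhost:8080/actuator/health/readiness";
        System.out.println(probe(url) ? "UP" : "NOT READY");
    }
}
```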
Support & Resources
- GitHub Issues: https://github.com/valkyrlabs/valkyrai/issues
- Documentation: https://docs.valkyr.ai
- Slack Community: https://valkyr.slack.com
- Email Support: support@valkyr.ai
Last Updated: October 6, 2025
Version: 1.0.0