Workflow Monitor Deployment Guide

Complete guide for deploying the Workflow Monitor in production environments.


Prerequisites

Backend Requirements

  • Java: 17 or higher
  • Spring Boot: 3.x
  • Maven: 3.8+
  • WebSocket Support: Spring WebSocket + STOMP
  • Memory: Minimum 2GB RAM (4GB recommended)
  • CPU: 2 cores minimum (4 cores recommended)

Frontend Requirements

  • Node.js: 18.x or higher
  • npm: 9.x or higher
  • React: 18.x
  • TypeScript: 5.x
  • Build Tool: webpack/vite

Infrastructure

  • Load Balancer: Must support WebSocket upgrades, with sticky sessions so that SockJS fallback requests reach the same backend
  • Reverse Proxy: nginx/Apache with WebSocket support
  • Firewall: Allow WebSocket connections (ports 80/443)
  • SSL/TLS: Required for production (wss://)

Backend Deployment

1. Maven Dependencies

Add to pom.xml:

<dependencies>
  <!-- Spring WebSocket -->
  <dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-websocket</artifactId>
  </dependency>

  <!-- STOMP -->
  <dependency>
    <groupId>org.springframework</groupId>
    <artifactId>spring-messaging</artifactId>
  </dependency>

  <!-- SockJS (optional fallback) -->
  <dependency>
    <groupId>org.webjars</groupId>
    <artifactId>sockjs-client</artifactId>
    <version>1.5.1</version>
  </dependency>
</dependencies>

2. Application Properties

application.yml:

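# Note: the spring.websocket.* keys below do not appear to be built-in Spring Boot
# properties; bind them yourself or rely on the limits set in WebSocketConfig.java.
spring:
  websocket:
    message:
      max-text-size: 524288 # 512KB
      max-binary-size: 524288
    broker:
      relay:
        enabled: false # Use simple broker for small deployments
      simple:
        # The simple broker is in-memory and per instance: with several backend
        # replicas, messages are not shared between instances.
        enabled: true

server:
  port: 8080
  tomcat:
    threads:
      max: 200 # Increase for high concurrency
      min-spare: 10
    connection-timeout: 20000 # 20 seconds

logging:
  level:
    # DEBUG is verbose; lower to INFO once the deployment is stable
    org.springframework.messaging: DEBUG
    org.springframework.web.socket: DEBUG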
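
The 512 KB limit above applies to the encoded size of each text message, not to its character count. A minimal TypeScript sketch of a client-side pre-check follows. `utf8ByteLength` and `fitsMessageLimit` are hypothetical helper names, and the constant assumes the server really applies the 524288-byte value shown in this file.

```typescript
// Assumed to mirror max-text-size from application.yml (512 KB).
const MAX_TEXT_BYTES = 512 * 1024;

// Count UTF-8 bytes by hand so the helper runs in any JavaScript runtime.
function utf8ByteLength(text: string): number {
  let bytes = 0;
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    if (code <= 0x7f) {
      bytes += 1;
    } else if (code <= 0x7ff) {
      bytes += 2;
    } else if (code >= 0xd800 && code <= 0xdbff && i + 1 < text.length) {
      const next = text.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        bytes += 4; // valid surrogate pair encodes one 4-byte code point
        i++;
      } else {
        bytes += 3; // lone surrogate is encoded as a 3-byte replacement
      }
    } else {
      bytes += 3;
    }
  }
  return bytes;
}

// True when the payload is small enough to send under the configured limit.
function fitsMessageLimit(payload: string, limitBytes: number = MAX_TEXT_BYTES): boolean {
  return utf8ByteLength(payload) <= limitBytes;
}
```

A STOMP frame also carries a command line and headers, so leaving some headroom below the limit is prudent.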

3. WebSocket Configuration

WebSocketConfig.java:

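import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.messaging.simp.config.ChannelRegistration;
import org.springframework.messaging.simp.config.MessageBrokerRegistry;
import org.springframework.scheduling.TaskScheduler;
import org.springframework.scheduling.concurrent.ThreadPoolTaskScheduler;
import org.springframework.web.socket.config.annotation.EnableWebSocketMessageBroker;
import org.springframework.web.socket.config.annotation.StompEndpointRegistry;
import org.springframework.web.socket.config.annotation.WebSocketMessageBrokerConfigurer;
import org.springframework.web.socket.config.annotation.WebSocketTransportRegistration;

@Configuration
@EnableWebSocketMessageBroker
public class WebSocketConfig implements WebSocketMessageBrokerConfigurer {

    @Override
    public void configureMessageBroker(MessageBrokerRegistry registry) {
        // In-memory broker for /topic destinations, with 10 s heartbeats in both directions
        registry.enableSimpleBroker("/topic")
                .setHeartbeatValue(new long[]{10000, 10000})
                .setTaskScheduler(heartBeatScheduler());
        registry.setApplicationDestinationPrefixes("/app");
        registry.setPreservePublishOrder(true);
    }

    @Override
    public void registerStompEndpoints(StompEndpointRegistry registry) {
        registry.addEndpoint("/ws")
                .setAllowedOriginPatterns(getAllowedOrigins())
                .withSockJS()
                .setClientLibraryUrl("https://cdn.jsdelivr.net/npm/sockjs-client@1.5.1/dist/sockjs.min.js");
    }

    @Override
    public void configureWebSocketTransport(WebSocketTransportRegistration registration) {
        registration
                .setMessageSizeLimit(512 * 1024)         // 512KB
                .setSendTimeLimit(20000)                 // 20 seconds
                .setSendBufferSizeLimit(3 * 512 * 1024); // 1.5MB
    }

    @Override
    public void configureClientInboundChannel(ChannelRegistration registration) {
        registration.taskExecutor()
                .corePoolSize(4)
                .maxPoolSize(8)
                .queueCapacity(1000);
    }

    @Override
    public void configureClientOutboundChannel(ChannelRegistration registration) {
        registration.taskExecutor()
                .corePoolSize(4)
                .maxPoolSize(8)
                .queueCapacity(1000);
    }

    @Bean
    public TaskScheduler heartBeatScheduler() {
        ThreadPoolTaskScheduler scheduler = new ThreadPoolTaskScheduler();
        scheduler.setPoolSize(2);
        scheduler.setThreadNamePrefix("ws-heartbeat-");
        scheduler.initialize();
        return scheduler;
    }

    private String[] getAllowedOrigins() {
        String origins = System.getenv("ALLOWED_ORIGINS");
        // Falls back to "*" when ALLOWED_ORIGINS is unset; always set it in production.
        // Whitespace around commas is stripped so "a, b" matches both origins.
        return origins != null ? origins.trim().split("\\s*,\\s*") : new String[]{"*"};
    }
}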
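
The `setHeartbeatValue(new long[]{10000, 10000})` call above is only the server's offer; under STOMP 1.2 the effective interval in each direction is negotiated against what the client offers. A minimal TypeScript sketch of that negotiation rule follows. The type and function names are hypothetical, and the client values in the usage note are illustrative assumptions.

```typescript
// [canSendEveryMs, wantsToReceiveEveryMs], as carried in the STOMP heart-beat header.
type HeartbeatOffer = [number, number];

interface NegotiatedHeartbeat {
  clientToServerMs: number; // 0 means no heartbeats in this direction
  serverToClientMs: number; // 0 means no heartbeats in this direction
}

// STOMP 1.2 rule: a direction is disabled if either side offers 0 for it,
// otherwise the larger of the two values is used.
function negotiateHeartbeat(client: HeartbeatOffer, server: HeartbeatOffer): NegotiatedHeartbeat {
  const cx = client[0];
  const cy = client[1];
  const sx = server[0];
  const sy = server[1];
  return {
    clientToServerMs: cx === 0 || sy === 0 ? 0 : Math.max(cx, sy),
    serverToClientMs: sx === 0 || cy === 0 ? 0 : Math.max(sx, cy),
  };
}
```

For example, a client offering `[4000, 4000]` against the server's `[10000, 10000]` ends up with 10-second heartbeats both ways, while a client offering `[0, 20000]` sends none and receives one every 20 seconds.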

4. Build & Package

# Build with Maven
mvn clean package -DskipTests

# Build with tests
mvn clean package

# Create Docker image (optional)
docker build -t valkyrai:latest .

5. Environment Variables

export SPRING_PROFILES_ACTIVE=production
export ALLOWED_ORIGINS=https://valkyr.ai,https://app.valkyr.ai
export SERVER_PORT=8080
export JAVA_OPTS="-Xms2g -Xmx4g -XX:+UseG1GC"
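
Because the backend falls back to allowing every origin when `ALLOWED_ORIGINS` is missing, it can help to validate the value before deploying. The following TypeScript sketch shows one possible pre-deploy check; `parseAllowedOrigins` is a hypothetical helper, and the rules it enforces (HTTPS only, no wildcard) are assumptions about what a production policy would require.

```typescript
// Hypothetical pre-deploy check for the ALLOWED_ORIGINS value.
// Throws with a descriptive message when the value is unsafe for production.
function parseAllowedOrigins(raw: string | undefined): string[] {
  if (raw === undefined || raw.trim().length === 0) {
    throw new Error("ALLOWED_ORIGINS is not set");
  }
  const origins = raw
    .split(",")
    .map((origin) => origin.trim())
    .filter((origin) => origin.length > 0);
  if (origins.length === 0) {
    throw new Error("ALLOWED_ORIGINS contains no origins");
  }
  for (const origin of origins) {
    if (origin.indexOf("*") !== -1) {
      throw new Error("Wildcard origin is not allowed in production: " + origin);
    }
    if (origin.indexOf("https://") !== 0) {
      throw new Error("Origin must use https: " + origin);
    }
  }
  return origins;
}
```

Called with `"https://valkyr.ai,https://app.valkyr.ai"`, the helper returns both origins; called with `"*"` or `undefined`, it throws.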

Frontend Deployment

1. Install Dependencies

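cd web/typescript/valkyr_labs_com
# Install all dependencies, including devDependencies (the build tooling needs them).
# Use "npm install" instead if the project has no lockfile.
npm ci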

2. Environment Configuration

.env.production:

REACT_APP_API_URL=https://api.valkyr.ai
REACT_APP_WS_URL=wss://api.valkyr.ai/ws
REACT_APP_ENV=production
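
The two URLs above differ only in scheme and path, so the WebSocket URL can also be derived from the API URL instead of being configured twice. This is a minimal TypeScript sketch of that derivation; `toWebSocketUrl` is a hypothetical helper, and the `/ws` default is taken from the endpoint registered on the backend.

```typescript
// Derive the WebSocket URL from the REST base URL: https -> wss, http -> ws.
function toWebSocketUrl(apiUrl: string, path: string = "/ws"): string {
  const base = apiUrl.replace(/\/+$/, ""); // drop trailing slashes
  if (base.indexOf("https://") === 0) {
    return "wss://" + base.slice("https://".length) + path;
  }
  if (base.indexOf("http://") === 0) {
    return "ws://" + base.slice("http://".length) + path;
  }
  throw new Error("Unsupported API URL scheme: " + apiUrl);
}
```

With the production values above, `toWebSocketUrl("https://api.valkyr.ai")` yields `wss://api.valkyr.ai/ws`.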

3. Build

# Production build
npm run build

# Analyze bundle size (the extra "--" passes the flag to the build script,
# which must itself support --report)
npm run build -- --report

4. Optimize Bundle

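webpack.config.js (with Vite, the equivalent setting is build.rollupOptions.output.manualChunks in vite.config.ts):

module.exports = {
  optimization: {
    splitChunks: {
      chunks: 'all',
      cacheGroups: {
        // All third-party code goes into one shared chunk
        vendor: {
          test: /[\\/]node_modules[\\/]/,
          name: 'vendors',
          priority: 10,
        },
        // The STOMP client gets its own chunk; the higher priority wins over "vendor"
        stomp: {
          test: /[\\/]node_modules[\\/]@stomp[\\/]/,
          name: 'stomp',
          priority: 20,
        },
      },
    },
  },
};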

Load Balancer Configuration

nginx.conf:

upstream backend {
    # Sticky sessions for WebSocket
    ip_hash;
    server backend1:8080 max_fails=3 fail_timeout=30s;
    server backend2:8080 max_fails=3 fail_timeout=30s;
}

server {
    listen 443 ssl http2;
    server_name api.valkyr.ai;

    ssl_certificate /etc/ssl/certs/valkyr.ai.crt;
    ssl_certificate_key /etc/ssl/private/valkyr.ai.key;
    ssl_protocols TLSv1.2 TLSv1.3;

    # WebSocket upgrade
    location /ws {
        proxy_pass http://backend;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # WebSocket timeouts
        proxy_read_timeout 86400s;
        proxy_send_timeout 86400s;
        proxy_connect_timeout 10s;

        # Buffering (disable for WebSocket)
        proxy_buffering off;
    }

    # REST API
    location /api {
        proxy_pass http://backend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
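
The `ip_hash` directive above keys on the first three octets of an IPv4 client address, so every client in the same /24 network is sent to the same backend. The TypeScript sketch below illustrates that behaviour only; `pickBackend` is a hypothetical function, and its hash is a simple stand-in, not the function nginx actually uses.

```typescript
// Illustrative model of IP-hash stickiness (not nginx's real hash function).
function pickBackend(clientIp: string, backends: string[]): string {
  if (backends.length === 0) {
    throw new Error("No backends configured");
  }
  // Use only the first three octets as the key, as ip_hash does for IPv4.
  const key = clientIp.split(".").slice(0, 3).join(".");
  let hash = 0;
  for (let i = 0; i < key.length; i++) {
    hash = (hash * 31 + key.charCodeAt(i)) >>> 0; // keep within unsigned 32 bits
  }
  return backends[hash % backends.length];
}
```

One consequence is that many users behind a single corporate NAT all land on one backend, which is a reason to consider cookie-based stickiness where load must be spread evenly.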

AWS Application Load Balancer

TargetGroup:
  Protocol: HTTP
  Port: 8080
  HealthCheck:
    Path: /actuator/health
    Interval: 30
    Timeout: 5
    HealthyThreshold: 2
    UnhealthyThreshold: 3
  TargetGroupAttributes:
    - Key: stickiness.enabled
      Value: true
    - Key: stickiness.type
      Value: lb_cookie
    - Key: stickiness.lb_cookie.duration_seconds
      Value: 86400

Listener:
  Protocol: HTTPS
  Port: 443
  DefaultActions:
    - Type: forward
      TargetGroupArn: !Ref TargetGroup

Docker Deployment

Backend Dockerfile

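# The official openjdk images are deprecated; eclipse-temurin is a maintained replacement
FROM eclipse-temurin:17-jre-alpine

# Add app user
RUN addgroup -S spring && adduser -S spring -G spring

# Copy JAR
COPY target/valkyrai-*.jar app.jar

# Run as non-root
USER spring:spring

# Expose port
EXPOSE 8080

# Health check (Alpine ships BusyBox wget, not curl)
HEALTHCHECK --interval=30s --timeout=3s --start-period=40s --retries=3 \
  CMD wget -q --spider http://localhost:8080/actuator/health || exit 1

# Run through a shell so that JAVA_OPTS is expanded; "exec" keeps java as PID 1
ENTRYPOINT ["sh", "-c", "exec java $JAVA_OPTS -jar /app.jar"]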

Frontend Dockerfile

FROM node:18-alpine AS build

WORKDIR /app
COPY package*.json ./
# Install all dependencies; the build step needs devDependencies
RUN npm ci
COPY . .
RUN npm run build

FROM nginx:alpine
COPY --from=build /app/build /usr/share/nginx/html
COPY nginx.conf /etc/nginx/conf.d/default.conf
EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]

Docker Compose

version: '3.8'

services:
  backend:
    image: valkyrai:latest
    ports:
      - "8080:8080"
    environment:
      - SPRING_PROFILES_ACTIVE=production
      - ALLOWED_ORIGINS=https://valkyr.ai
    healthcheck:
      # wget instead of curl: the Alpine-based backend image does not include curl
      test: ["CMD", "wget", "-q", "--spider", "http://localhost:8080/actuator/health"]
      interval: 30s
      timeout: 3s
      retries: 3
    deploy:
      replicas: 2
      resources:
        limits:
          cpus: '2'
          memory: 4G
        reservations:
          cpus: '1'
          memory: 2G

  frontend:
    image: valkyrai-frontend:latest
    ports:
      - "80:80"
      - "443:443"
    depends_on:
      - backend
    volumes:
      - ./ssl:/etc/nginx/ssl:ro

Kubernetes Deployment

Backend Deployment

backend-deployment.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: valkyrai-backend
spec:
  replicas: 3
  selector:
    matchLabels:
      app: valkyrai-backend
  template:
    metadata:
      labels:
        app: valkyrai-backend
    spec:
      containers:
        - name: backend
          image: valkyrai:latest
          ports:
            - containerPort: 8080
          env:
            - name: SPRING_PROFILES_ACTIVE
              value: "production"
            - name: ALLOWED_ORIGINS
              valueFrom:
                configMapKeyRef:
                  name: valkyrai-config
                  key: allowed-origins
          resources:
            requests:
              memory: "2Gi"
              cpu: "1"
            limits:
              memory: "4Gi"
              cpu: "2"
          livenessProbe:
            httpGet:
              path: /actuator/health/liveness
              port: 8080
            initialDelaySeconds: 60
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /actuator/health/readiness
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: valkyrai-backend
spec:
  type: ClusterIP
  sessionAffinity: ClientIP # Sticky sessions for WebSocket
  ports:
    - port: 8080
      targetPort: 8080
  selector:
    app: valkyrai-backend
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: valkyrai-ingress
  annotations:
    nginx.ingress.kubernetes.io/websocket-services: "valkyrai-backend"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"
    nginx.ingress.kubernetes.io/upstream-hash-by: "$remote_addr" # Sticky sessions
spec:
  rules:
    - host: api.valkyr.ai
      http:
        paths:
          - path: /ws
            pathType: Prefix
            backend:
              service:
                name: valkyrai-backend
                port:
                  number: 8080
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: valkyrai-backend
                port:
                  number: 8080
  tls:
    - hosts:
        - api.valkyr.ai
      secretName: valkyr-tls

Monitoring & Observability

Metrics

Spring Boot Actuator:

management:
  endpoints:
    web:
      exposure:
        include: health,metrics,prometheus
  metrics:
    tags:
      application: valkyrai
  # Spring Boot 3.x key; 2.x used management.metrics.export.prometheus.enabled
  prometheus:
    metrics:
      export:
        enabled: true

Prometheus Scrape Config:

scrape_configs:
  - job_name: 'valkyrai'
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['backend:8080']

Logging

Structured Logging (JSON):

<configuration>
  <appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
    <encoder class="net.logstash.logback.encoder.LogstashEncoder">
      <includeMdcKeyName>traceId</includeMdcKeyName>
      <includeMdcKeyName>spanId</includeMdcKeyName>
    </encoder>
  </appender>

  <root level="INFO">
    <appender-ref ref="CONSOLE" />
  </root>
</configuration>

Distributed Tracing

Micrometer Tracing (Spring Cloud Sleuth does not support Spring Boot 3.x):

<dependency>
  <groupId>io.micrometer</groupId>
  <artifactId>micrometer-tracing-bridge-brave</artifactId>
</dependency>

Performance Tuning

JVM Options

JAVA_OPTS="
-Xms2g -Xmx4g
-XX:+UseG1GC
-XX:MaxGCPauseMillis=200
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/var/log/valkyrai/
-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=9010
-Dcom.sun.management.jmxremote.authenticate=false
"

With JMX authentication disabled, restrict port 9010 to a trusted network or an SSH tunnel.

WebSocket Tuning

spring:
  task:
    scheduling:
      pool:
        size: 10
  websocket:
    message:
      max-sessions: 1000
      send-time-limit: 20000
      send-buffer-size-limit: 1572864 # 1.5MB
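
Read together with the 4 GB heap from the JVM options, the session and buffer limits above imply an upper bound on memory held in send buffers. The TypeScript sketch below works through that arithmetic. The function names are hypothetical, and the bound assumes every one of the 1000 sessions fills its 1.5 MB buffer at the same time, which is a worst case rather than an expected load.

```typescript
const BYTES_PER_GIB = 1024 * 1024 * 1024;

// Upper bound: every session holds a completely full send buffer.
function worstCaseSendBufferBytes(maxSessions: number, sendBufferLimitBytes: number): number {
  return maxSessions * sendBufferLimitBytes;
}

// Fraction of the heap that the worst case would occupy.
function heapShare(bufferBytes: number, heapBytes: number): number {
  return bufferBytes / heapBytes;
}

const worstCaseBytes = worstCaseSendBufferBytes(1000, 1572864); // 1,572,864,000 bytes
const worstCaseGiB = worstCaseBytes / BYTES_PER_GIB;            // about 1.46 GiB
const shareOfHeap = heapShare(worstCaseBytes, 4 * BYTES_PER_GIB); // about 0.37 of a 4 GiB heap
```

If that share is too high for the workload, lowering either limit reduces the bound proportionally.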

Security Checklist

  • ✅ Enable HTTPS/WSS in production
  • ✅ Configure CORS properly
  • ✅ Implement authentication on WebSocket connections
  • ✅ Use JWT or session-based auth
  • ✅ Rate limit WebSocket connections
  • ✅ Validate all incoming messages
  • ✅ Sanitize output to prevent XSS
  • ✅ Use CSP headers
  • ✅ Enable security headers (HSTS, etc.)
  • ✅ Regularly update dependencies
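
For the "validate all incoming messages" item, one common approach is to parse each frame body into a narrow type and drop anything that does not match. The TypeScript sketch below shows the idea. The `WorkflowEvent` shape, its status values, and `parseWorkflowEvent` are all hypothetical, since this guide does not specify the real message format.

```typescript
// Hypothetical message shape; the real Workflow Monitor payload is not specified here.
type WorkflowStatus = "PENDING" | "RUNNING" | "COMPLETED" | "FAILED";

interface WorkflowEvent {
  workflowId: string;
  status: WorkflowStatus;
}

const KNOWN_STATUSES: WorkflowStatus[] = ["PENDING", "RUNNING", "COMPLETED", "FAILED"];

// Returns a typed event, or null when the body is not valid JSON of the expected shape.
function parseWorkflowEvent(body: string): WorkflowEvent | null {
  let data: unknown;
  try {
    data = JSON.parse(body);
  } catch (e) {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const record = data as { workflowId?: unknown; status?: unknown };
  if (typeof record.workflowId !== "string" || record.workflowId.length === 0) return null;
  if (typeof record.status !== "string") return null;
  const index = KNOWN_STATUSES.indexOf(record.status as WorkflowStatus);
  if (index === -1) return null;
  // Copy only the known fields so unexpected properties never reach the UI.
  return { workflowId: record.workflowId, status: KNOWN_STATUSES[index] };
}
```

Restricting `status` to a fixed list also covers part of the XSS item, because arbitrary strings from the wire never reach rendering code through that field.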

Troubleshooting

WebSocket Connection Failures

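# Test the WebSocket handshake (a healthy endpoint answers "101 Switching Protocols").
# With SockJS enabled on /ws, the raw WebSocket transport is served at /ws/websocket.
# The key must be a base64-encoded 16-byte value; this one is the RFC 6455 sample.
curl -i -N --http1.1 \
  -H "Connection: Upgrade" \
  -H "Upgrade: websocket" \
  -H "Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==" \
  -H "Sec-WebSocket-Version: 13" \
  -H "Origin: https://valkyr.ai" \
  https://api.valkyr.ai/ws/websocket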
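
When connections fail because a backend restarted, clients that all retry at the same instant can make recovery harder. A minimal TypeScript sketch of exponential backoff with jitter for the client follows. `reconnectDelayMs` and its default values (1 second base, 30 second cap) are assumptions for illustration, not settings taken from the Workflow Monitor.

```typescript
// Delay before reconnect attempt number `attempt` (0-based).
// The random source is injectable so the function can be tested deterministically.
function reconnectDelayMs(
  attempt: number,
  random: () => number = Math.random,
  baseMs: number = 1000,
  maxMs: number = 30000
): number {
  // Double the delay on every attempt, up to the cap.
  const capped = Math.min(maxMs, baseMs * Math.pow(2, Math.max(0, attempt)));
  // "Equal jitter": half fixed, half random, so clients spread out but never retry instantly.
  return Math.floor(capped / 2 + random() * (capped / 2));
}
```

The resulting delay always lies between half the capped value and the capped value, so the first retry waits between 0.5 and 1 second and later retries never wait more than 30 seconds.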

High Memory Usage

# Monitor JVM memory
jcmd <pid> VM.native_memory summary

# Heap dump
jmap -dump:format=b,file=heap.bin <pid>

Connection Leaks

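-- Count database connections whose application_name contains 'WebSocket' (PostgreSQL).
-- pg_stat_activity lists database sessions, not WebSocket sessions, so this only helps
-- to spot leaked database connections, and only if the application sets application_name.
SELECT COUNT(*) FROM pg_stat_activity
WHERE application_name LIKE '%WebSocket%';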

Rollback Procedure

  1. Stop new traffic:

    kubectl scale deployment valkyrai-backend --replicas=0

  2. Deploy previous version:

    kubectl set image deployment/valkyrai-backend \
      backend=valkyrai:previous-version

  3. Resume traffic:

    kubectl scale deployment valkyrai-backend --replicas=3

  4. Verify health (pods exist only after scaling back up):

    kubectl rollout status deployment/valkyrai-backend
    kubectl get pods -l app=valkyrai-backend
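
Beyond checking pod status, the health step can be scripted against the `/actuator/health` endpoint used elsewhere in this guide, which returns a JSON body with a top-level `status` field. The TypeScript sketch below covers only the interpretation of that body; `isActuatorUp` is a hypothetical helper, and fetching the response is left to whatever HTTP client the deployment scripts already use.

```typescript
// True only when the body is JSON with a top-level status of "UP".
function isActuatorUp(body: string): boolean {
  let data: unknown;
  try {
    data = JSON.parse(body);
  } catch (e) {
    return false; // for example an HTML error page from a proxy
  }
  if (typeof data !== "object" || data === null) return false;
  return (data as { status?: unknown }).status === "UP";
}
```

Treating anything other than an explicit `"UP"` as unhealthy keeps a rollback script from declaring success on an error page or a partial response.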

Last Updated: October 6, 2025 Version: 1.0.0