High-Concurrency Tuning

Operations guide for monitoring, tuning, and troubleshooting high-concurrency protoLabs deployments. See High-Concurrency Deployment for initial setup.

Monitoring & Observability

Real-Time Resource Monitoring

bash

# Container resource usage
docker stats automaker-server-staging --no-stream

# Memory details
docker exec automaker-server-staging free -h

# Process list with memory
docker exec automaker-server-staging ps aux --sort=-%mem | head -20

Agent Status Monitoring

bash

# Running agents
curl http://localhost:3008/api/agents/running \
  -H "X-API-Key: $AUTOMAKER_API_KEY"

# Board summary
curl http://localhost:3008/api/board/summary?projectPath=/projects/myproject \
  -H "X-API-Key: $AUTOMAKER_API_KEY"

# Auto-mode status
curl http://localhost:3008/api/auto-mode/status?projectPath=/projects/myproject \
  -H "X-API-Key: $AUTOMAKER_API_KEY"

Health Checks

bash

# API health
curl -f http://localhost:3008/api/health || echo "UNHEALTHY"

# Container health
docker inspect automaker-server-staging --format '{{.State.Health.Status}}'

# Health check logs
docker inspect automaker-server-staging --format '{{json .State.Health}}' | jq

Log Analysis

bash

# Follow logs with timestamp
docker compose -f docker-compose.staging.yml logs -f --timestamps

# Search for errors
docker compose -f docker-compose.staging.yml logs server | grep -i error

# Agent-specific logs
docker compose -f docker-compose.staging.yml logs server | grep "Agent\|feature"

# Memory warnings
docker compose -f docker-compose.staging.yml logs server | grep -i "memory\|heap"

Performance Tuning

Memory Pressure Indicators

Watch for these signs in logs:

[WARN] High memory usage: 42GB / 48GB (87%)
[ERROR] JavaScript heap out of memory
[WARN] Garbage collection taking longer than expected

If you see these, reduce maxConcurrency.

CPU Bottlenecks

bash

# Check CPU usage
docker stats automaker-server-staging --no-stream

# If CPU consistently > 80%
# - Reduce maxConcurrency
# - Upgrade to more CPU cores
# - Check for runaway agents (infinite loops)

Disk I/O

bash

# Disk usage
docker exec automaker-server-staging df -h

# If high I/O wait:
# - Use SSD storage
# - Check for excessive logging
# - Review worktree cleanup

Scaling

Vertical scaling (current approach): increase resources on single host.

48GB, 64GB, 96GB RAM
8, 12, 16 CPU cores
Adjust maxConcurrency proportionally

Rule of thumb: 6GB RAM per concurrent agent (mixed complexity).

Troubleshooting

Server Crashes Under Load

Symptoms: Container exits with code 137 (OOM kill), logs show "Killed" or heap errors, agents stuck in "running" state.

Solutions:

Check memory usage: docker stats automaker-server-staging
Reduce concurrency in settings
Increase memory limit in docker-compose
Review feature complexity distribution

Slow Agent Performance

Symptoms: Agents take much longer than expected, high CPU but low memory, logs show "waiting for API response".

Solutions:

Check network latency to Anthropic API
Review MCP tool usage (heavy Discord queries can slow agents)
Check for context file bloat (> 100KB files slow agents)

Docker Volume Issues

Symptoms: "No space left on device", slow file operations.

Solutions:

Check volume usage: docker system df -v
Clean up: docker system prune -a --volumes
Review data directory size

Environment Variables

Adding a New Env Var

Add to .env at /home/deploy/staging/automaker/.env
Add to docker-compose.staging.yml environment section
Recreate the container (restart does NOT re-read .env):

bash

cd /home/deploy/staging/automaker
docker stop automaker-server && docker rm automaker-server
docker compose -p automaker-staging -f docker-compose.staging.yml up -d server --no-deps

API Key Authentication

protoLabs uses X-API-Key header (NOT Authorization: Bearer):

bash

curl http://localhost:3008/api/health -H "X-API-Key: $AUTOMAKER_API_KEY"

Automated Deploys

Staging auto-deploys from main via a GitHub Actions self-hosted runner.

How It Works

Code merges to main
deploy-staging.yml triggers on the self-hosted runner
Workflow uses persistent deploy directory (/home/deploy/staging/automaker)
Drain step gracefully stops agents (up to 2 min timeout)
Rollback images tagged before build
setup-staging.sh --build builds and --start restarts
Health check + smoke tests verify the deploy
Rollback on failure, Discord notification posted

Zero-Downtime Deploy (Agent Drain)

push to main > drain API > agents finish/stop > build > restart > auto-resume

Drain endpoint: POST /api/deploy/drainStatus endpoint: GET /api/deploy/status

bash

# Manual drain
./scripts/setup-staging.sh --drain

# Via curl
curl -X POST http://localhost:3008/api/deploy/drain \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $AUTOMAKER_API_KEY"

Manual Deploy

bash

git pull origin main
./scripts/setup-staging.sh --build
./scripts/setup-staging.sh --start

CD Pipeline Troubleshooting

Workspace Deleted Mid-Build

Cause: Runner cleanup cron deletes _work/ during builds. Fix: Deploy workflow uses persistent directory, env vars sourced via set -a.

Docker Build Fails on `build:packages`

Cause: packages/ not in Docker build context. Fix: Dockerfile uses npm run build:libs (not build:packages).

Staging Hardening

Nginx: gzip, security headers, proxy buffers, static asset caching
Rate limiting: 300 req/min general, 20 req/15min auth
WebSocket: exponential backoff reconnect, shutdown broadcast
Smoke tests: API health, features, UI, docs, WebSocket auth
Deploy safety: disk pre-check, rollback images, crash loop protection (on-failure:5)

High-Concurrency Tuning ​

Monitoring & Observability ​

Real-Time Resource Monitoring ​

Agent Status Monitoring ​

Health Checks ​

Log Analysis ​

Performance Tuning ​

Memory Pressure Indicators ​

CPU Bottlenecks ​

Disk I/O ​

Scaling ​

Troubleshooting ​

Server Crashes Under Load ​

Slow Agent Performance ​

Docker Volume Issues ​

Environment Variables ​

Adding a New Env Var ​

API Key Authentication ​

Automated Deploys ​

How It Works ​

Zero-Downtime Deploy (Agent Drain) ​

Manual Deploy ​

CD Pipeline Troubleshooting ​

Workspace Deleted Mid-Build ​

Docker Build Fails on build:packages ​

Staging Hardening ​

High-Concurrency Tuning

Monitoring & Observability

Real-Time Resource Monitoring

Agent Status Monitoring

Health Checks

Log Analysis

Performance Tuning

Memory Pressure Indicators

CPU Bottlenecks

Disk I/O

Scaling

Troubleshooting

Server Crashes Under Load

Slow Agent Performance

Docker Volume Issues

Environment Variables

Adding a New Env Var

API Key Authentication

Automated Deploys

How It Works

Zero-Downtime Deploy (Agent Drain)

Manual Deploy

CD Pipeline Troubleshooting

Workspace Deleted Mid-Build

Docker Build Fails on `build:packages`

Staging Hardening