Why Sisyphus

Coolify, CapRover, and Dokku give you push-to-deploy on your own server. Sisyphus does that too — but it also watches what it deployed.

An AI-powered SRE agent monitors every service, diagnoses failures with the Anthropic API, and remediates automatically. External HTTP probes check every site every 60 seconds. Three-tier alert escalation pages you when something is actually wrong — not when a container restarts.

No Kubernetes. No cluster. No control plane. One VPS, Docker, Caddy, and an agent that knows when to restart a service and when to roll back a deploy.

Under 16 EUR/month for the entire stack.

What it does

Hosts static sites, containerized apps, and AI-powered products on a single VPS. Auto-provisioned TLS, centralized authentication, file-based multi-agent coordination, and structured SLO enforcement.

Currently runs 12 sites across 4 domains with a 99.5% availability target, automated backups, and a 24-role review panel for continuous improvement.

Deploy a project

Three steps to go from repo to live site:

  1. Add a CLAUDE.md to your repo with the project tech contract and PM role section.
  2. Add Dockerfile + validate.sh — the Dockerfile builds your app, validate.sh checks it before deploy.
  3. Push to main — CI/CD triggers the deploy webhook, the SRE agent pulls, builds, and verifies. Post-deploy health check confirms the site is live.

Static sites are even simpler: push HTML, the webhook pulls it, Caddy serves it. No build step.

Architecture

                     Internet
                        |
               [ Caddy v2 — auto-HTTPS ]
               /    |    |    |    \
           static  static  app   app   static
           sites   sites   apps  apps  sites
               \    |    |    |    /
               [ Docker — container runtime ]
                        |
          +-------------+-------------+-------------+
          |             |             |             |
    [ Consul ]    [ CrowdSec ]  [ SRE Agent ]  [ Headscale ]
    svc disc.     threat intel  monitor+heal   zero-trust VPN
                                     |
                              [ Claude API ]
                              AI diagnosis
                                     |
                           +--------+--------+
                           |                 |
                      [ Telegram ]    [ Incident API ]
                      operator         project owners
                      alerts           self-service

Infrastructure

OS            Flatcar Container Linux
Runtime       Docker
Proxy         Caddy v2 (ACME TLS)
VPN           Headscale (WireGuard)
Provisioning  Bash + Butane/Ignition

Operations

Monitoring  SRE Agent (Python)
AI          Claude API (capped)
Security    CrowdSec + ACME
Backups     restic (encrypted)
Secrets     env vars (not in git)

Deploy flow

Static sites (CI-built)

push to main → GitHub Actions builds → gh-pages branch → VPS pulls + serves

Static sites (local-build)

build locally → deploy.sh (rsync) → webhook validates HMAC → VPS unpacks + serves
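
The "webhook validates HMAC" step could look roughly like the sketch below. It assumes an HMAC-SHA256 signature sent hex-encoded alongside the payload; the function names and the signature format are hypothetical and are not taken from the Sisyphus source.

```python
import hashlib
import hmac


def sign(secret: bytes, payload: bytes) -> str:
    """Hex HMAC-SHA256 of the payload (what the sending side would attach)."""
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def is_valid(secret: bytes, payload: bytes, signature: str) -> bool:
    """Compare the received signature against the expected one in constant time."""
    expected = sign(secret, payload)
    # Encoding both sides lets compare_digest accept any input string
    # without raising on non-ASCII characters.
    return hmac.compare_digest(expected.encode(), signature.encode())
```

The constant-time comparison matters here: a plain `==` can leak how many leading characters of a forged signature were correct.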

App projects

push to main → CI tests → deploy webhook → VPS builds image → health check 60s
unhealthy? → auto-rollback to previous SHA
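
A minimal sketch of the health-check-then-rollback step, assuming "health check 60s" means the new build has 60 seconds to report healthy. The `deploy` and `is_healthy` callables, the 5-second retry interval, and the function name are illustrative assumptions, not the agent's actual interface.

```python
import time
from typing import Callable


def deploy_with_rollback(
    new_sha: str,
    previous_sha: str,
    deploy: Callable[[str], None],
    is_healthy: Callable[[], bool],
    window_s: float = 60.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Deploy new_sha and return the SHA left running after the health window."""
    deploy(new_sha)
    waited = 0.0
    while waited < window_s:
        if is_healthy():
            return new_sha
        sleep(interval_s)
        waited += interval_s
    # Never became healthy inside the window: redeploy the last known-good SHA.
    deploy(previous_sha)
    return previous_sha
```

Passing `sleep` as a parameter keeps the function testable without waiting a real minute.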

Self-healing

The SRE agent polls every 60 seconds. It detects anomalies, remediates automatically, verifies the fix, and notifies the project owner. The Claude API provides AI root-cause analysis on correlated failures (budget-capped).

poll (60s) → detect anomaly → remediate → verify → notify owner
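
One pass through that loop can be sketched as follows. The `Anomaly` type and the four injected callables are hypothetical names chosen for illustration; the real agent's structure is not documented here.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Anomaly:
    service: str
    reason: str


def run_cycle(
    detect: Callable[[], Optional[Anomaly]],
    remediate: Callable[[Anomaly], None],
    verify: Callable[[Anomaly], bool],
    notify: Callable[[str], None],
) -> Optional[bool]:
    """Run one poll cycle.

    Returns None when nothing was wrong, otherwise whether the fix held.
    """
    anomaly = detect()
    if anomaly is None:
        return None
    remediate(anomaly)
    fixed = verify(anomaly)
    status = "resolved" if fixed else "still failing"
    # The owner is told either way, so a failed fix is never silent.
    notify(f"{anomaly.service}: {anomaly.reason} ({status})")
    return fixed
```

A scheduler would call `run_cycle` once per 60-second poll; verification runs before notification so the message can report the outcome rather than just the attempt.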

Automated remediations

Trigger                     Action                         Mode
Container unhealthy >2 min  Restart (3/hr limit)           observe
Deploy unhealthy            Rollback to previous SHA       observe
Disk >85%                   Prune images + containers      observe
Disk >90%                   Full cleanup + log rotation    diagnose
Memory >90%                 Restart heaviest container     auto
Correlated failures         Claude AI root-cause analysis  diagnose
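
Two of these rules lend themselves to a short sketch: the tiered disk thresholds and the 3-per-hour restart limit. The code below assumes the restart limit is a sliding one-hour window (it could equally be a fixed clock hour); the names `disk_action` and `RestartLimiter` are hypothetical.

```python
from collections import deque
from typing import Deque, Optional


def disk_action(used_pct: float) -> Optional[str]:
    """Pick the disk remediation for a usage percentage, highest threshold first."""
    if used_pct > 90:
        return "full cleanup + log rotation"
    if used_pct > 85:
        return "prune images + containers"
    return None


class RestartLimiter:
    """Allow at most `limit` restarts per sliding window."""

    def __init__(self, limit: int = 3, window_s: float = 3600.0) -> None:
        self.limit = limit
        self.window_s = window_s
        self._stamps: Deque[float] = deque()

    def allow(self, now: float) -> bool:
        # Drop restarts that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.window_s:
            self._stamps.popleft()
        if len(self._stamps) >= self.limit:
            return False
        self._stamps.append(now)
        return True
```

Checking the higher threshold first ensures a disk at 95% gets the full cleanup rather than only the lighter prune.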

Multi-agent bus

Three AI agents coordinate via file-based tickets. Roles define safety boundaries (what you must not do unsupervised), not work boundaries. Sensing, reading, and QA flow freely to whoever has context.

Operational gravity

Free layer — any role, no ticket needed
read any file · git log/diff · headless chromium · curl · decide (with docs) · write tickets
Hard boundaries — designated role + intent file required
operator   SSH · secrets · containers · VPS rm
PM         git push project · edit source · deploy.sh
architect  git push bus/ · edit protocol · edit personas
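
The two layers can be expressed as a small permission check. This is a sketch under assumptions: the action strings mirror the lists above, while `may_perform`, `owner_of`, and the deny-by-default handling of unlisted actions are illustrative choices rather than documented behavior.

```python
from typing import Dict, FrozenSet, Optional

# Free layer: any role, no ticket needed.
FREE_LAYER: FrozenSet[str] = frozenset({
    "read any file", "git log/diff", "headless chromium",
    "curl", "decide (with docs)", "write tickets",
})

# Hard boundaries: designated role plus an intent file.
HARD_BOUNDARIES: Dict[str, FrozenSet[str]] = {
    "operator": frozenset({"SSH", "secrets", "containers", "VPS rm"}),
    "PM": frozenset({"git push project", "edit source", "deploy.sh"}),
    "architect": frozenset({"git push bus/", "edit protocol", "edit personas"}),
}


def owner_of(action: str) -> Optional[str]:
    """Return the role that owns a hard-boundary action, if any."""
    for role, actions in HARD_BOUNDARIES.items():
        if action in actions:
            return role
    return None


def may_perform(role: str, action: str, has_intent_file: bool = False) -> bool:
    """Free-layer actions always pass; hard ones need the owning role and an intent file."""
    if action in FREE_LAYER:
        return True
    # Actions listed nowhere have no owner, so they are denied (an assumption).
    return owner_of(action) == role and has_intent_file
```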

Hat-switch

When the round-trip cost of the chain exceeds the safety value of separation, an agent switches hats — a first-class protocol operation, not a violation.

agent wearing hat A → intent file (scope + boundaries) → work as hat B → intent file (return)
self-authorized if no hard boundary crossed · user override if hard boundary
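
The authorization rule for a hat-switch reduces to one check over the actions named in the intent file. The `Intent` record and its fields below are hypothetical; the actual intent file format is not shown in this document.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Intent:
    from_hat: str
    to_hat: str
    actions: Tuple[str, ...]  # the scope the agent declares before switching


def authorization(intent: Intent, hard_actions: FrozenSet[str]) -> str:
    """Self-authorize unless the declared scope touches a hard boundary."""
    crossed = [a for a in intent.actions if a in hard_actions]
    return "user-override" if crossed else "self-authorized"
```

Declaring the scope up front is what makes the decision mechanical: the check runs on the intent, before any work is done under the new hat.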

“Roles define safety boundaries, not work boundaries. The function with senses absorbs the function without them.” — Decision record, after Only-Ops: Operational Gravity in the AI Era

SLOs

Service levels

Availability      99.5% / 24h
Latency p95       <500 ms
Container health  100%
TLS expiry        >7 days
Backups           daily
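
To make the availability target concrete: 99.5% over 24 hours leaves an error budget of about 7.2 minutes of downtime per day. The sketch below computes that, and, assuming availability is measured from the 60-second external probes (an assumption, not stated above), how many failed probes fit inside the budget.

```python
import math


def error_budget_minutes(target: float, window_hours: float) -> float:
    """Downtime allowed by an availability target over a window, in minutes."""
    return (1.0 - target) * window_hours * 60.0


def allowed_failed_probes(
    target: float, window_hours: float, probe_interval_s: float = 60.0
) -> int:
    """Whole number of failed probes that still meets the target."""
    total_probes = window_hours * 3600.0 / probe_interval_s
    # The small epsilon guards against float error just below an integer.
    return math.floor(total_probes * (1.0 - target) + 1e-9)
```

With the numbers from the table, `allowed_failed_probes(0.995, 24)` gives 7 out of 1,440 daily probes.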

Resolution targets

P0  15 min
P1  1 hour
P2  half day
P3  1 day
P4  1 week

Cost

Hetzner CX22 (2 vCPU, 4 GB)  €4.00
Encrypted backup storage     €3.50
Domains                      ~€3.00
AI diagnosis API             €5.00 max
Total                        €15.50/mo
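
The "€5.00 max" line implies a guard that refuses AI diagnosis calls once the monthly cap is reached. A minimal sketch of such a guard, tracking spend in integer cents to avoid float rounding; the `BudgetCap` class and its method names are hypothetical, and how the real agent estimates per-call cost is not documented here.

```python
class BudgetCap:
    """Track spend against a fixed cap, in integer cents."""

    def __init__(self, cap_cents: int = 500) -> None:
        self.cap_cents = cap_cents
        self.spent_cents = 0

    def try_spend(self, estimated_cents: int) -> bool:
        """Reserve budget for a call; return False if it would exceed the cap."""
        if self.spent_cents + estimated_cents > self.cap_cents:
            return False
        self.spent_cents += estimated_cents
        return True

    def reset(self) -> None:
        """Start a new billing period."""
        self.spent_cents = 0
```

Checking before the call, rather than after, is what keeps the total at or below the stated maximum.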