Why Sisyphus
Coolify, CapRover, and Dokku give you push-to-deploy on your own server. Sisyphus does that too — but it also watches what it deployed.
An AI-powered SRE agent monitors every service, diagnoses failures with the Anthropic API, and remediates automatically. External HTTP probes check every site every 60 seconds. Three-tier alert escalation pages you when something is actually wrong — not when a container restarts.
No Kubernetes. No cluster. No control plane. One VPS, Docker, Caddy, and an agent that knows when to restart a service and when to roll back a deploy.
Under 16 EUR/month for the entire stack.
What it does
Hosts static sites, containerized apps, and AI-powered products on a single VPS. Auto-provisioned TLS, centralized authentication, file-based multi-agent coordination, and structured SLO enforcement.
Currently runs 12 sites across 4 domains with a 99.5% availability target, automated backups, and a 24-role review panel for continuous improvement.
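To make the 99.5% target concrete, the sketch below converts an availability target into an error budget. The 30-day window and the function names are assumptions for illustration. The source gives only the target and the 60-second probe interval.

```python
def error_budget_minutes(slo: float, days: int = 30) -> float:
    """Allowed downtime in minutes for an availability target over a window.

    The 30-day default window is an assumption, not stated in the source.
    """
    if not 0.0 < slo <= 1.0:
        raise ValueError("slo must be in (0, 1]")
    total_minutes = days * 24 * 60
    return (1.0 - slo) * total_minutes


def failed_probe_budget(slo: float, probe_interval_s: int = 60, days: int = 30) -> int:
    """Number of failed probes tolerated per window at a fixed probe interval."""
    total_probes = days * 24 * 3600 // probe_interval_s
    return round(total_probes * (1.0 - slo))


if __name__ == "__main__":
    # 99.5% over 30 days leaves roughly 216 minutes (3.6 hours) of downtime.
    print(f"{error_budget_minutes(0.995):.0f} minutes")
    print(f"{failed_probe_budget(0.995)} failed probes")
```

Under these assumptions, one probe per minute means each failed probe consumes about one minute of the budget.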
Deploy a project
Three steps to go from repo to live site:
- Add a CLAUDE.md to your repo with the project tech contract and PM role section.
- Add Dockerfile + validate.sh: the Dockerfile builds your app, validate.sh checks it before deploy.
- Push to main: CI/CD triggers the deploy webhook, the SRE agent pulls, builds, and verifies. Post-deploy health check confirms the site is live.
Static sites are even simpler: push HTML, the webhook pulls it, Caddy serves it. No build step.
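The post-deploy health check in the last step could look roughly like the sketch below. It is not Sisyphus's actual implementation: the retry count, the delay, the "any 2xx is healthy" rule, and the function names are all assumptions.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for url, or 0 if the request fails outright."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (4xx/5xx).
        return exc.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, and similar.
        return 0


def wait_until_healthy(
    url: str,
    attempts: int = 5,
    delay_s: float = 2.0,
    fetch: Callable[[str], int] = http_status,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll url until it returns a 2xx status or the attempts run out."""
    for attempt in range(attempts):
        if 200 <= fetch(url) < 300:
            return True
        if attempt < attempts - 1:
            sleep(delay_s)
    return False
```

A deploy script could call `wait_until_healthy("https://example.com/")` after the container starts and treat `False` as the signal to roll back. The `fetch` and `sleep` parameters exist only so the loop can be tested without a network.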
Architecture
                        Internet
                            |
                [ Caddy v2 - auto-HTTPS ]
              /      |      |      |      \
         static   static   app    app    static
         sites    sites    apps   apps   sites
              \      |      |      |      /
             [ Docker - container runtime ]
                            |
      +--------------+--------------+--------------+
      |              |              |              |
 [ Consul ]    [ CrowdSec ]   [ SRE Agent ]  [ Headscale ]
  svc disc.    threat intel   monitor+heal  zero-trust VPN
                                    |
                             [ Claude API ]
                              AI diagnosis
                                    |
                            +-------+-------+
                            |               |
                      [ Telegram ]  [ Incident API ]
                        operator     project owners
                         alerts       self-service
Infrastructure
Operations
Deploy flow
Static sites (CI-built)
Static sites (local-build)
App projects
Self-healing
The SRE agent polls every 60 seconds. It detects anomalies, remediates automatically, verifies the fix, and notifies the project owner. Claude API provides AI root-cause analysis on correlated failures (budget-capped).
Automated remediations
| Trigger | Action | Mode |
|---|---|---|
| Container unhealthy >2min | Restart (3/hr limit) | observe |
| Deploy unhealthy | Rollback to previous SHA | observe |
| Disk >85% | Prune images + containers | observe |
| Disk >90% | Full cleanup + log rotation | diagnose |
| Memory >90% | Restart heaviest container | auto |
| Correlated failures | Claude AI root-cause analysis | diagnose |
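A minimal sketch of how the threshold rules in the table might be evaluated, including the 3-per-hour restart limit. The action names, the `RestartLimiter` class, and the `restart_rate_limited` marker are hypothetical. The source does not say what the agent does once the restart limit is reached.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class RestartLimiter:
    """Sliding-window limiter: at most max_restarts per window_s seconds."""

    max_restarts: int = 3
    window_s: float = 3600.0
    _stamps: deque = field(default_factory=deque)

    def allow(self, now: float) -> bool:
        # Drop restart timestamps that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.window_s:
            self._stamps.popleft()
        if len(self._stamps) >= self.max_restarts:
            return False
        self._stamps.append(now)
        return True


def pick_actions(
    unhealthy_s: float,
    disk_pct: float,
    mem_pct: float,
    limiter: RestartLimiter,
    now: float,
) -> list[str]:
    """Map current metrics to remediation actions, following the table's thresholds."""
    actions: list[str] = []
    if unhealthy_s > 120:  # container unhealthy for more than 2 minutes
        # "restart_rate_limited" is a hypothetical marker for the limit-hit case.
        actions.append("restart" if limiter.allow(now) else "restart_rate_limited")
    if disk_pct > 90:
        actions.append("full_cleanup")  # full cleanup + log rotation
    elif disk_pct > 85:
        actions.append("prune")  # prune images + containers
    if mem_pct > 90:
        actions.append("restart_heaviest")
    return actions
```

Checking the 90% disk rule before the 85% rule keeps the two tiers mutually exclusive, so a nearly full disk triggers only the stronger cleanup.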
Multi-agent bus
Three AI agents coordinate via file-based tickets. Roles define safety boundaries (what you must not do unsupervised), not work boundaries. Sensing, reading, and QA flow freely to whoever has context.
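One way file-based tickets can work is sketched below: each ticket is a small JSON file, written atomically so a reader never sees a half-written ticket. The directory layout, field names, and role strings are assumptions. The source only states that agents coordinate through files.

```python
import json
import os
import tempfile
from pathlib import Path


def write_ticket(bus_dir: Path, ticket_id: str, to_role: str, body: str) -> Path:
    """Write a ticket atomically: temp file first, then rename into place."""
    bus_dir.mkdir(parents=True, exist_ok=True)
    target = bus_dir / f"{ticket_id}.json"
    # Hypothetical ticket schema, for illustration only.
    payload = {"id": ticket_id, "to": to_role, "body": body, "status": "open"}
    fd, tmp_name = tempfile.mkstemp(dir=bus_dir, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(payload, handle)
    # os.replace is atomic when source and target are on the same filesystem.
    os.replace(tmp_name, target)
    return target


def open_tickets(bus_dir: Path, role: str) -> list[dict]:
    """Return open tickets addressed to role, ordered by file name."""
    tickets = []
    for path in sorted(bus_dir.glob("*.json")):
        ticket = json.loads(path.read_text(encoding="utf-8"))
        if ticket["to"] == role and ticket["status"] == "open":
            tickets.append(ticket)
    return tickets
```

For example, one agent could call `write_ticket(Path("bus"), "0001", "architect", "review protocol change")` and another could poll `open_tickets(Path("bus"), "architect")`. Creating the temp file in the same directory as the target keeps the rename on one filesystem.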
Operational gravity
SSH · secrets · containers · VPS rm PM
git push project · edit source · deploy.sh architect
git push bus/ · edit protocol · edit personas
Hat-switch
When the round-trip cost of the chain exceeds the safety value of separation, an agent switches hats — a first-class protocol operation, not a violation.
scope + boundaries → work as hat B → intent file (return)
“Roles define safety boundaries, not work boundaries. The function with senses absorbs the function without them.” — Decision record, after Only-Ops: Operational Gravity in the AI Era
SLOs
Service levels
Resolution targets
Cost
| Item | Monthly cost |
|---|---|
| Hetzner CX22 (2 vCPU, 4 GB) | €4.00 |
| Encrypted backup storage | €3.50 |
| Domains | ~€3.00 |
| AI diagnosis API | €5.00 max |
| Total | €15.50/mo |