Disaster recovery: what to do when things go wrong
Something is already broken and you need to fix it. This page is the map of “what can go wrong” + “what still works when it does” + “how to get back from each situation.” If you’re reading this before an incident, the companion page is Disaster prevention — that’s where the off-laptop backups, off-site bucket, and proactive secret exports live.
The short version: your software suite is designed so that no single accidental click can lock you out. It takes a combination of events to lose access, and for every scenario there’s a recovery path.
Data-loss scenarios — quick FAQ
If you’re skimming for the one thing that matches your current situation, this is the index. Each entry links to the page or section that walks the recovery in detail.
| Situation | First move | Where to read more |
|---|---|---|
| I deleted a file by accident (one user, one file/folder) | Try the app’s own trash. If empty, contact your operator. | Trash first; if empty, contact your operator. |
| I lost my password or my 2FA (just me) | Self-service password reset; for 2FA, contact your operator. | Self-service reset via the portal; 2FA goes through your operator. |
| All admins are locked out at once (lost email, lost dashboard) | Contact your operator — they have a separate recovery path that doesn’t depend on email or the dashboard. | Out-of-band operator path (not client-side). |
| The whole VPS is encrypted by ransomware | Contact your operator immediately. Recovery is from the last clean backup, taken before the ransomware reached the disk. | Recovery map below (“Entire VPS disk”) |
| The VPS itself is compromised by malware / unauthorized access | Contact your operator. The path is wipe + restore from a pre-compromise snapshot + rotate every secret. Your operator owns this; you receive a status update at each phase. | Recovery map below (“Entire VPS disk”) |
| VPS provider’s datacenter burns down (or hardware failure) | Contact your operator. They restore your software suite to a fresh VPS at the same or different provider, using the off-site backup bucket. | Recovery map below (“Entire VPS disk”) + Restore to a fresh VPS |
| VPS provider gives 48h notice / suspends the account | Contact your operator. They migrate you to a new provider on a tight timeline; expect ~30-60 minutes of public-URL downtime during cutover. | Recovery map below (“VPS provider goes bankrupt”) |
| Backup provider gives 48h notice | Contact your operator. They re-target backups at a new bucket; existing data on the VPS is unaffected. | Recovery map below (“S3 backup provider goes bankrupt”) |
| I think someone else has my password / API token | Don’t wait — contact your operator and rotate the credential. | Recovery map below (per-credential rows) |
The recovery map below has the full table including infrastructure edges (Cloudflare token rotation, Tailscale account, etc.) — keep reading.
The one button you may need: “Export recovery secrets”
On actions.yourdomain.com, under the Ops section, there’s a button labelled “Export recovery secrets (encrypted)” (🔐). Click it when both of these are true:
- You have lost the master password that decrypts your secret file.
- You can still log into this dashboard.
It will prompt you for a passphrase (type a strong one — your password manager can generate one), reconstruct an encrypted copy of every recoverable secret from the live VPS, and drop a file on the server that you can download with your browser.
To download the file — no SSH or command line needed:
- Open recovery.yourdomain.com in your browser.
- Sign in as an administrator (same account you used to click the button above). Non-admin team members see a deny page here, which is intentional: recovery bundles should only reach people who can already trigger a restore.
- Click the newest `secrets-*.yml.gpg` file to save it to your laptop. Decrypt it locally with your passphrase: `gpg --pinentry-mode loopback -o vault-recovered.yml -d secrets-*.yml.gpg`
The file is useless without the passphrase — it’s safe to leave sitting on the VPS for a few days while you handle the rest of your recovery.
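
If you have downloaded more than one bundle, the `secrets-*.yml.gpg` wildcard in the command above matches several files at once. The following is a minimal sketch, not part of your operator's tooling, that picks the most recently modified bundle and passes only that one to the same gpg command. The function names `newest_bundle` and `decrypt_newest_bundle` are hypothetical, and the sketch assumes the files sit in one directory and have no newlines in their names.

```shell
# Hypothetical helper: print the most recently modified secrets-*.yml.gpg
# in a directory (default: the current directory). Prints nothing if none exist.
newest_bundle() {
    dir="${1:-.}"
    # ls -t sorts newest first; head keeps only the first entry.
    ls -t "$dir"/secrets-*.yml.gpg 2>/dev/null | head -n 1
}

# Hypothetical helper: decrypt the newest bundle to vault-recovered.yml
# with the same gpg flags as the command on this page.
decrypt_newest_bundle() {
    bundle="$(newest_bundle "${1:-.}")"
    if [ -z "$bundle" ]; then
        echo "no secrets-*.yml.gpg found in ${1:-.}" >&2
        return 1
    fi
    # Subshell so the restrictive umask applies to this one command only:
    # the decrypted file is created readable by your user alone.
    ( umask 077; gpg --pinentry-mode loopback -o vault-recovered.yml -d "$bundle" )
}
```

Run `decrypt_newest_bundle ~/Downloads` (or whichever folder your browser saved to). gpg asks for the passphrase you typed when you clicked the button.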
If your access to the dashboard is also broken, your operator can do the same thing remotely using the same tool over SSH.
The same button can be used before any incident, as a prevention step — see Disaster prevention.
What the button can and cannot recover
Everything that lives in a container’s environment or a file on disk is recoverable from a running server: database passwords, SSO keys, SMTP credentials, backup encryption passwords. Those make up the bulk of what you’d need to reassemble your secret file.
Three credentials cannot come from the server — they live in other companies’ admin consoles, not on your VPS:
- Tailscale OAuth client — regenerate at login.tailscale.com
- Cloudflare API token — regenerate at dash.cloudflare.com/profile/api-tokens
- Dokploy API key — regenerate in the Dokploy UI (Settings -> Profile -> API Keys)
The exported file lists these as clearly-marked placeholders so you know to re-mint them.
Recovery map — what breaks and what to do
| What you lose | What still works | How to recover |
|---|---|---|
| Master password (your laptop is fine) | Everything — your server, your apps, your SSH | Click the Export recovery secrets button above, save the file, restart your operator’s setup with the new file |
| Master password AND you locked yourself out of the dashboard | Your server, your apps, SSH | Your operator runs the same export tool over SSH and sends you the file |
| Laptop (with master password on it) | Your server, your apps, your backups | Your operator has a copy of the master password (if they saved it on hand-off) OR click the recovery button from any browser |
| SSH private key | Your server, your apps, the dashboard | Your operator re-adds a new public key via their own admin path; if they’re unavailable, see “Provider rescue mode” below |
| Dashboard access (SSO broken, Keycloak down) | Your apps (their own logins still work), your data | Operator SSHes in to fix; worst case, restart the Keycloak container |
| One app’s data (you deleted something) | Everything else | Try the app’s own trash first; if empty, contact your operator. |
| Entire VPS disk (corruption, accidental wipe) | Backups (in your S3 bucket) | Follow Restore to a fresh VPS with the same cloud provider |
| Cloudflare API token (accidentally rotated) | Your tunnel keeps running and public apps stay up. Only functionality that depends on the token is affected; backups are not. | Generate a new API token at dash.cloudflare.com/profile/api-tokens, then contact your operator with it. They install the new token and confirm the tunnel still picks up DNS changes after rotation. Apps stay reachable while you wait. |
| Cloudflare tunnel token (rotated or leaked) | Existing tunnel keeps running until cloudflared next reconnects, then drops. Public apps go dark until rotation completes. This affects functionality only; backups are not affected. | This is more disruptive than the API token: public traffic stops when cloudflared can’t reauthenticate. Contact your operator immediately so they can mint and install the replacement. The token is under dash.cloudflare.com -> your zone -> Zero Trust -> Networks -> Tunnels -> click your tunnel -> Configure -> reveal/rotate token, NOT on the “API Tokens” page. Expect 5-15 minutes of public-app downtime during install. |
| Tailscale OAuth client (accidentally rotated) | Your server’s tailnet access keeps working. Remote SSH stays up. | Generate a new OAuth client, update your secret file |
| Dokploy API key (accidentally rotated) | All your apps keep running | Generate a new key in Dokploy UI, update your secret file |
| Cloudflare account terminated | Your server, your apps (internally), your data | Create a new Cloudflare account, point your domain to it, re-run operator setup; your apps experience downtime only during DNS propagation |
| Tailscale account terminated | Your server, your apps, your public path (CF Tunnel) | Switch to a different ops-access method; Tailscale’s only the “admin back door,” not part of the public serving path |
| Your VPS provider goes bankrupt / shuts down | Your S3 backup bucket (different company) | Restore to a fresh VPS at a different provider using the backup |
| Your VPS provider’s datacenter burns down (OVH Strasbourg 2021) | Your S3 backup bucket (different region, different city) | Same as above — restore to a fresh VPS at the same or different provider, different region |
| Your S3 backup provider goes bankrupt / shuts down | Your VPS and its data | You still have the data — copy your production VPS to a new S3 bucket before the deadline the provider gives you. If you set up a secondary backup (see Prevention), it’s already safe |
| S3 bucket accidentally deleted | Your VPS and its data | Same — recreate the bucket and repoint backups. Some providers keep deleted objects for a retention window, which might buy you time |
| VPS provider AND S3 provider outage at the same time | Last weekly off-site copy (if you set one up — see Prevention) | Restore from the off-site copy to any fresh cloud |
| Master password AND SSH AND server dead | S3 backup bucket | You can still decrypt the restic repo if you saved the restic repo password AND the S3 access key + secret separately (see Prevention); restic needs all three to read the bucket. Then follow Restore to a fresh VPS |
| Master password AND restic password AND server dead | Your S3 bucket exists but every byte in it is ciphertext you can’t open | Data loss. This is why Disaster prevention says to save the restic password separately, even from your master password |
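
The last two rows turn on whether you hold all three pieces restic needs. As a rough sketch, you can check that before you start a restore. The environment variable names below are the ones restic reads for an S3 repository. The function name `restic_preflight`, the example repository URL, and the password file path are assumptions for illustration; your real values come from your hand-off kit.

```shell
# Hypothetical pre-flight check: report which pieces restic needs are
# missing from the environment. Returns 0 only when everything is set.
restic_preflight() {
    missing=""
    # Where the repository lives (not secret, but restic cannot guess it).
    [ -n "${RESTIC_REPOSITORY:-}" ] || missing="$missing RESTIC_REPOSITORY"
    # The repo password, supplied directly or through a file.
    [ -n "${RESTIC_PASSWORD:-}${RESTIC_PASSWORD_FILE:-}" ] || missing="$missing RESTIC_PASSWORD"
    # The S3 access key and secret.
    [ -n "${AWS_ACCESS_KEY_ID:-}" ] || missing="$missing AWS_ACCESS_KEY_ID"
    [ -n "${AWS_SECRET_ACCESS_KEY:-}" ] || missing="$missing AWS_SECRET_ACCESS_KEY"
    if [ -n "$missing" ]; then
        echo "missing:$missing" >&2
        return 1
    fi
}

# Usage (placeholder values, substitute your own):
#   export RESTIC_REPOSITORY="s3:https://s3.example.com/my-backup-bucket"
#   export RESTIC_PASSWORD_FILE="$HOME/restic-password.txt"
#   export AWS_ACCESS_KEY_ID="your-access-key"
#   export AWS_SECRET_ACCESS_KEY="your-secret-key"
#   restic_preflight && restic snapshots
```

If `restic snapshots` lists your backups, the bucket is readable and the restore can proceed. If the check reports a missing item, find it before touching anything else.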
Provider rescue mode — when you’ve lost SSH
Every serious VPS provider offers a “rescue mode” that lets you boot a temporary rescue image with your existing disk mounted, so you can add a new SSH key or recover files without re-installing. A few examples:
- OVH: Control Panel -> your VPS -> Rescue / rescue-customer. Reboot into rescue, mount your disk, add your new public key to `/home/ops/.ssh/authorized_keys`, reboot normally.
- Hetzner: Robot -> Rescue system -> enable and reboot.
- DigitalOcean / Linode / Vultr: each has a recovery console (sometimes a VNC web terminal) — look for “Recovery” / “Console” in the provider’s sidebar.
Your operator can walk you through this over a video call if needed; the steps are the same across providers, just different UI labels.
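
Once the rescue image is booted and your disk is mounted, the "add your new public key" step can be sketched as below. This is an illustration under assumptions, not a provider-specific procedure: the function name `install_pubkey` is hypothetical, and it assumes you have mounted the VPS disk yourself (the mount point, for example /mnt, and the device name vary by provider) and copied your new public key file into the rescue session.

```shell
# Hypothetical helper: append a public key to /home/ops/.ssh/authorized_keys
# on a disk mounted at $1, unless that exact key is already present.
#   $1 = mount point of the VPS disk, $2 = path to the new .pub file
install_pubkey() {
    root="$1"
    pubkey_file="$2"
    home_dir="$root/home/ops"
    ssh_dir="$home_dir/.ssh"
    auth="$ssh_dir/authorized_keys"

    [ -d "$home_dir" ] || { echo "no $home_dir; is the disk mounted?" >&2; return 1; }
    [ -s "$pubkey_file" ] || { echo "empty or missing key file: $pubkey_file" >&2; return 1; }

    mkdir -p "$ssh_dir" && touch "$auth" || return 1
    key="$(head -n 1 "$pubkey_file")"

    # If the existing file does not end in a newline, add one so the new
    # key does not get glued onto the previous line.
    if [ -n "$(tail -c 1 "$auth")" ]; then
        echo >> "$auth"
    fi

    # -F fixed string, -x whole line, -q quiet: append only if not present.
    grep -qxF -- "$key" "$auth" || printf '%s\n' "$key" >> "$auth"

    # sshd (with its default StrictModes) ignores keys in files that are
    # too permissive or owned by the wrong user.
    chmod 700 "$ssh_dir" && chmod 600 "$auth" || return 1
    # Use the home directory's numeric owner, since the rescue image may
    # not have an "ops" user of its own.
    owner="$(ls -ldn "$home_dir" | awk '{print $3 ":" $4}')"
    chown -R "$owner" "$ssh_dir"
}
```

For example, `install_pubkey /mnt ./new-key.pub`, then unmount and reboot normally. Running it twice is harmless because the key is only appended when it is not already in the file.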
Operator SSH and provider rescue console are equivalent for your purposes. If your operator is reachable, they SSH in and fix things. If they’re not — or if SSH itself is broken — the provider’s rescue console gives you the same root-level access to the disk. Either path gets you back to a working server; the rescue console is just the fallback when the normal one is unavailable. Don’t burn time waiting on one if the other is in front of you.
How this plays out in practice
Most “I lost X” situations are nowhere near as bad as they feel in the first five minutes. The site keeps serving traffic. Your database is fine. You have 24-72 hours to handle the recovery without pressure: everything but total disaster is survivable on a weekday morning with a coffee.
If something in the map above doesn’t match your situation, call your operator. The whole point of the hand-off kit + this page is to give you every path we can, but nothing replaces a second pair of eyes in a real incident.