Those four documents are essentially the minimum viable operational memory for an application.
They are what prevent:
- “How did I set this up again?”
- “What breaks if this VM dies?”
- “How do I rebuild this?”
- “What exactly do I back up?”
- “How do I restore fast?”
This becomes critically important in your architecture because:
- you are modular
- you are self-hosted
- you are intentionally avoiding giant SaaS abstractions
- you want rebuildability
- you want warm failover
- you want ephemeral dev environments
Without operational docs, infrastructure slowly becomes tribal knowledge trapped in your head.
That does not scale even for one person over time.
The Four Core Docs
Think of them as:
| Document | Purpose |
| --- | --- |
| setup.md | How to build the app from scratch |
| deploy.md | How code moves into production |
| backup.md | What must be preserved |
| restore.md | How to recover from disaster |
1. /docs/setup.md
This is:
“How do I create this app/server from zero?”
If the VM vanished tomorrow:
how do you rebuild it?
This doc should assume:
- blank Ubuntu install
- no memory
- no assumptions
What Goes Inside
Purpose of the app
Example:
LOD API backend for customer management system. Runs FastAPI with PostgreSQL backend.
VM specs
Example:
Ubuntu 24.04
2 CPU
4 GB RAM
50 GB disk
Required software
Example:
Python 3.12
PostgreSQL 16
Nginx
Git
Install steps
Example:
sudo apt update
sudo apt install python3.12 python3-venv git
Repo cloning
git clone :yourorg/lod-api.git
Environment variables
Example:
DATABASE_URL=
API_KEY=
SECRET_KEY=
Never store the secret values themselves in Git. Document only the variable names and what each one is for.
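Since the doc lists variable names without values, a startup check can catch a missing one before the service half-starts. Here is a minimal sketch in Python, assuming the three names from the example above. The function names are made up for illustration and are not part of any existing LOD code.

```python
import os

# Names taken from the example list above; the values live outside Git.
REQUIRED_VARS = ["DATABASE_URL", "API_KEY", "SECRET_KEY"]


def missing_env_vars(required, environ=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]


def check_environment(environ=None):
    """Raise one clear error that lists every missing variable at once."""
    missing = missing_env_vars(REQUIRED_VARS, environ)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
```

Calling `check_environment()` early in app startup is one option. Passing a plain dict as `environ` makes the check easy to try without touching the real environment.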
Directory structure
Example:
/lod
systemd service
Example:
/etc/systemd/system/lod-api.service
And include:
- full service file
- restart instructions
Reverse proxy config
Example:
Caddy route: lod.example.com -> :8000
Validation checklist
Example:
- API reachable
- DB connected
- Logs functional
- Backups running
Why setup.md Is Critical
Because eventually:
- you WILL forget details
- Ubuntu versions WILL change
- dependencies WILL drift
- a VM WILL die
- you WILL rebuild something after months
This document becomes:
- your infrastructure memory
- your reproducibility layer
2. /docs/deploy.md
This is:
“How do changes safely move to production?”
This is operational workflow.
What Goes Inside
Branch strategy
Example:
main = production
dev = active development
Deployment flow
Example:
Dev VM -> Git push -> Production git pull
Production deployment steps
Example:
cd
git pull
sudo systemctl restart lod-api
Pre-deploy checklist
Example:
- DB migrations tested
- API endpoints verified
- Backups confirmed
Rollback process
CRITICAL.
Example:
git checkout previous-tag
sudo systemctl restart lod-api
Version tagging
Example:
git tag v0.4.2
git push origin --tags
Downtime expectations
Example:
Expected restart interruption: 5-10 seconds
Why deploy.md Matters
Because deployment failures are where most operational stress happens.
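One way to take some of that stress out is to keep the deploy and rollback commands from this section in a small script, so they are never retyped from memory. This is only a sketch. The `lod-api` service name comes from the examples above, while `repo_dir` and the function names are placeholders you would fill in yourself.

```python
import subprocess


def deploy_commands(service="lod-api"):
    """Commands for a normal deploy: pull the latest code, restart the service."""
    return [
        ["git", "pull"],
        ["sudo", "systemctl", "restart", service],
    ]


def rollback_commands(tag, service="lod-api"):
    """Commands for a rollback: check out a known-good tag, restart the service."""
    return [
        ["git", "checkout", tag],
        ["sudo", "systemctl", "restart", service],
    ]


def run_all(commands, repo_dir, runner=subprocess.run):
    """Run each command from repo_dir; check=True stops at the first failure."""
    for cmd in commands:
        runner(cmd, cwd=repo_dir, check=True)
```

Keeping the command lists separate from `run_all` means you can print them as a dry run before anything touches production.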
This doc prevents:
- forgotten steps
- risky deployments
- panic during rollback
- “what changed?”
3. /docs/backup.md
This is:
“What data matters and how is it protected?”
Many people back up the wrong things.
You need to know:
- what is replaceable
- what is irreplaceable
What Goes Inside
What needs backup
Example:
PostgreSQL database
Uploaded files
.env file
SSL certs
NOT:
node_modules
Python cache
temporary containers
Backup frequency
Example:
Database:
- nightly full dump
- hourly WAL archive
Backup locations
Example:
Primary NAS
Secondary NAS
Offsite encrypted copy
Retention policy
Example:
Daily: 14 days
Weekly: 8 weeks
Monthly: 12 months
Backup commands
Example:
pg_dump lod >
Validation process
VERY important.
Example:
Monthly restore test required.
Backups that are never tested are fake backups.
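A full restore test needs a scratch database, but between those monthly tests a cheap automated check can at least confirm that the newest dump exists, is not empty, and is recent. The sketch below is one possible shape for that check. The `*.sql` pattern and the 26-hour window (a nightly dump plus some slack) are assumptions, and it does not replace an actual restore test.

```python
import time
from pathlib import Path


def newest_backup(backup_dir, pattern="*.sql"):
    """Return the most recently modified file matching pattern, or None."""
    files = [p for p in Path(backup_dir).glob(pattern) if p.is_file()]
    return max(files, key=lambda p: p.stat().st_mtime, default=None)


def backup_problems(backup_dir, max_age_hours=26, min_bytes=1, now=None):
    """Return a list of problems; an empty list means the check passed."""
    now = time.time() if now is None else now
    latest = newest_backup(backup_dir)
    if latest is None:
        return ["no backup files found in " + str(backup_dir)]
    problems = []
    stat = latest.stat()
    if stat.st_size < min_bytes:
        problems.append(f"{latest.name} is smaller than {min_bytes} bytes")
    age_hours = (now - stat.st_mtime) / 3600
    if age_hours > max_age_hours:
        problems.append(f"{latest.name} is {age_hours:.0f} hours old")
    return problems
```

Run from cron or a systemd timer, a non-empty result is the signal to alert before you actually need the backup.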
Why backup.md Matters
Because during a crisis, you do not want to THINK.
You want:
- exact commands
- exact locations
- exact priorities
4. /docs/restore.md
This is the most important doc of all.
This is:
“The server is dead. Now what?”
This document should let:
- future-you
- tired-you
- stressed-you
restore service rapidly.
What Goes Inside
Failure scenarios
Example:
- VM corruption
- accidental deletion
- disk failure
- ransomware
- bad deployment
Recovery priority
Example:
1. Restore database
2. Restore uploads
3. Restore API service
4. Re-enable proxy routing
Restore procedure
Example:
createdb lod
psql lod <
DNS / routing changes
Example:
Update Caddy upstream IP if failover activated.
Validation after restore
Example:
- login works
- uploads visible
- API healthy
- monitoring active
Estimated recovery time
Example:
Expected restore: 15-30 minutes
Why restore.md Is The Most Important
Because backups are useless without restore procedures.
Most organizations discover this too late.
You are designing toward:
- rapid rebuildability
- warm failover
- infrastructure resilience
This document becomes foundational.
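The "Validation after restore" checklist in this section can also be turned into something tired-you just runs. Below is a rough sketch of a checklist runner in Python. The real checks (login, uploads, monitoring) depend on your app, so only a generic HTTP check is shown, and any URL you pass to it is your own choice, not something defined here.

```python
import urllib.error
import urllib.request


def http_ok(url, timeout=5):
    """Return True if a GET to url answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        return False


def run_checklist(checks):
    """Run every (name, function) pair and return the names that failed.

    A check that raises counts as a failure instead of aborting the run,
    so one broken check cannot hide the results of the others.
    """
    failed = []
    for name, check in checks:
        try:
            ok = bool(check())
        except Exception:
            ok = False
        if not ok:
            failed.append(name)
    return failed
```

Each checklist line becomes one `(name, function)` pair, and an empty result means the restore passed every check you wrote down.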
The Hidden Benefit
These docs also become:
AI-readable operational intelligence
This is extremely important for your future direction.
Your future agents can:
- read setup docs
- analyze deployment flows
- validate configs
- detect drift
- automate rebuilds
- generate infrastructure maps
Because your infrastructure knowledge becomes structured.
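Even before any agent is involved, structured docs allow simple machine checks. As a small illustration, this sketch reports which of the four core docs are missing or empty in a docs directory. The file names come from the list earlier in this answer, and the function name is hypothetical.

```python
from pathlib import Path

# The four core documents described above.
CORE_DOCS = ["setup.md", "deploy.md", "backup.md", "restore.md"]


def missing_docs(docs_dir, required=CORE_DOCS):
    """Return the required doc filenames that are absent or empty in docs_dir."""
    docs = Path(docs_dir)
    missing = []
    for name in required:
        path = docs / name
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing
```

A check like this could run in a pre-commit hook or CI step for every app repo, so a new service cannot ship without its operational docs.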
You are effectively building:
- operational memory
- machine-readable infrastructure cognition
What I Would Add For You
You may also eventually want:
| File | Purpose |
| --- | --- |
| architecture.md | High-level design and dependencies |
| network.md | Ports, DNS, routing, Tailscale |
| security.md | Auth, firewall, secrets handling |
| monitoring.md | Metrics/logging/alerts |
| dependencies.md | External systems and APIs |
| dr.md | Full disaster recovery strategy |
The Most Important Principle
These docs should allow you to:
Rebuild the app from scratch without relying on memory.
That is the gold standard.
If future-you can:
- rebuild
- restore
- redeploy
- fail over
using only the repo and docs,
then your infrastructure is becoming professionally mature and operationally resilient.