Those four documents are essentially the minimum viable operational memory for an application.

They are what prevent:

- “How did I set this up again?”
- “What breaks if this VM dies?”
- “How do I rebuild this?”
- “What exactly do I back up?”
- “How do I restore fast?”

This becomes critically important in your architecture because:

- you are modular
- you are self-hosted
- you are intentionally avoiding giant SaaS abstractions
- you want rebuildability
- you want warm failover
- you want ephemeral dev environments

Without operational docs, infrastructure slowly becomes tribal knowledge trapped in your head.

That does not scale even for one person over time.

The Four Core Docs

Think of them as:

| Document | Purpose |
|---|---|
| setup.md | How to build the app from scratch |
| deploy.md | How code moves into production |
| backup.md | What must be preserved |
| restore.md | How to recover from disaster |

1. /docs/setup.md

This is:

“How do I create this app/server from zero?”

If the VM vanished tomorrow, how do you rebuild it?

This doc should assume:

- blank Ubuntu install
- no memory
- no assumptions

What Goes Inside

Purpose of the app

Example:

    LOD API backend for customer management system.
    Runs FastAPI with PostgreSQL backend.

VM specs

Example:

    Ubuntu 24.04
    2 CPU
    4GB RAM
    50GB disk

Required software

Example:

    Python 3.12
    PostgreSQL 16
    Nginx
    Git

Install steps

Example:

    sudo apt update
    sudo apt install python3.12 python3-venv git

Repo cloning

    git clone <account-email>:yourorg/lod-api.git

Environment variables

Example:

    DATABASE_URL=
    API_KEY=
    SECRET_KEY=

Never store the secret values themselves in Git. Document only the variable names and what each one is for.

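One way to keep that documented list honest is a small check that fails when a documented name is missing from the environment. This is only a sketch: `require_env` is a hypothetical helper (not part of the original setup), written for bash, and the three names in the usage line are just the ones from the example above.

```shell
# require_env: hypothetical helper. Reports every named variable that is
# missing from the environment (or empty) and returns non-zero if any are.
require_env() {
  local missing=0 name
  for name in "$@"; do
    # printenv prints nothing for names that are not exported.
    if [ -z "$(printenv "$name")" ]; then
      echo "missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A start script could then begin with `require_env DATABASE_URL API_KEY SECRET_KEY || exit 1`, so a half-configured VM fails loudly instead of starting with blank settings.
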
Directory structure

Example:

    <app-install-path>
    <app-data-root>/lod
    <app-log-path>

systemd service

Example:

    /etc/systemd/system/lod-api.service

And include:

- full service file
- restart instructions

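As a rough idea of what a "full service file" might look like, here is a sketch that prints one. Almost everything in it is an assumption rather than your actual setup: the `print_unit` helper, the service user, running the app with uvicorn from a `.venv` inside the install path, the `app.main:app` module path, and the `.env` location are all hypothetical. Only port 8000 comes from the proxy example in this doc.

```shell
# print_unit: hypothetical helper that prints a candidate unit file.
# Arguments: install path, service user, bind address.
# The uvicorn command, app.main:app and the .env location are assumptions.
print_unit() {
  local app_dir="$1" run_as="$2" bind_addr="$3"
  cat <<EOF
[Unit]
Description=LOD API (FastAPI)
After=network.target

[Service]
User=${run_as}
WorkingDirectory=${app_dir}
EnvironmentFile=${app_dir}/.env
ExecStart=${app_dir}/.venv/bin/uvicorn app.main:app --host ${bind_addr} --port 8000
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}
```

You would redirect the output into `/etc/systemd/system/lod-api.service`, then run `sudo systemctl daemon-reload` followed by `sudo systemctl enable --now lod-api`. Restart instructions are then just `sudo systemctl restart lod-api`.
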
Reverse proxy config

Example:

    Caddy route:
    lod.example.com -> <private-ip>:8000

Validation checklist

Example:

    - API reachable
    - DB connected
    - Logs functional
    - Backups running

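A checklist like this is easier to trust when each item is a command you can rerun. The sketch below assumes bash; `check` is a hypothetical helper, and whatever commands you pass to it (health URL, database probe) are yours to fill in.

```shell
# check: hypothetical helper. Runs a command quietly and prints PASS or
# FAIL next to a label; returns non-zero when the command fails.
check() {
  local label="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS  $label"
  else
    echo "FAIL  $label"
    return 1
  fi
}
```

For example, `check "API reachable" curl -fsS http://<private-ip>:8000/` and `check "DB connected" pg_isready` would cover the first two items, assuming the API answers on its root path and `pg_isready` is installed on the VM.
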
Why setup.md Is Critical

Because eventually:

- you WILL forget details
- Ubuntu versions WILL change
- dependencies WILL drift
- a VM WILL die
- you WILL rebuild something after months

This document becomes:

- your infrastructure memory
- your reproducibility layer

2. /docs/deploy.md

This is:

“How do changes safely move to production?”

This is operational workflow.

What Goes Inside

Branch strategy

Example:

    main = production
    dev = active development

Deployment flow

Example:

    Dev VM -> Git push -> Production git pull

Production deployment steps

Example:

    cd <app-install-path>
    git pull
    sudo systemctl restart lod-api

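Those three steps can be wrapped so that a failed pull never leads to a restart. This is a sketch under assumptions, not your actual deploy script: `run` and `deploy` are hypothetical names, `--ff-only` is an added safety choice (it refuses to merge if production has diverged from the remote), and the final `is-active` check is an extra step that is not in the example above.

```shell
# run: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# deploy: hypothetical wrapper around the documented steps.
# Arguments: install path, systemd service name.
deploy() {
  local app_dir="$1" service="$2"
  # Stop at the first failing step.
  run git -C "$app_dir" pull --ff-only || return 1
  run sudo systemctl restart "$service" || return 1
  # is-active exits non-zero if the service is not running after the restart.
  run systemctl is-active --quiet "$service"
}
```

Running `DRY_RUN=1 deploy <app-install-path> lod-api` prints the commands without touching anything, which is a cheap way to rehearse the procedure before writing it into deploy.md.
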
Pre-deploy checklist

Example:

    - DB migrations tested
    - API endpoints verified
    - Backups confirmed

Rollback process

CRITICAL.

Example:

    git checkout <previous-tag>
    sudo systemctl restart lod-api

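During a rollback you do not want to guess which tag came before the broken one. The sketch below picks it from a list of tags; `previous_tag` is a hypothetical helper, and it assumes tags follow the `v0.4.2` style from this doc and that GNU `sort -V` is available.

```shell
# previous_tag: hypothetical helper. Reads tag names on stdin, orders them
# by version with sort -V, and prints the tag just before the given one.
# Prints nothing and returns 1 if the tag is unknown or has no predecessor.
previous_tag() {
  sort -V | awk -v cur="$1" '
    $0 == cur && !seen { seen = 1; result = prev }
    { prev = $0 }
    END {
      if (result == "") exit 1
      print result
    }
  '
}
```

On the server, `git tag --list 'v*' | previous_tag v0.4.2` gives the tag to check out. Keep in mind that checking out a tag leaves the repo in a detached HEAD state, so switch back to `main` before the next `git pull`.
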
Version tagging

Example:

    git tag v0.4.2
    git push origin --tags

Downtime expectations

Example:

    Expected restart interruption: 5-10 seconds

Why deploy.md Matters

Because deployment failures are where most operational stress happens.

This doc prevents:

- forgotten steps
- risky deployments
- panic during rollback
- “what changed?”

3. /docs/backup.md

This is:

“What data matters and how is it protected?”

Many people back up the wrong things.

You need to know:

- what is replaceable
- what is irreplaceable

What Goes Inside

What needs backup

Example:

    PostgreSQL database
    Uploaded files
    .env file
    SSL certs

NOT:

    node_modules
    Python cache
    temporary containers

Backup frequency

Example:

    Database:
    - nightly full dump
    - hourly WAL archive

Backup locations

Example:

    Primary NAS
    Secondary NAS
    Offsite encrypted copy

Retention policy

Example:

    Daily: 14 days
    Weekly: 8 weeks
    Monthly: 12 months

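A retention policy only holds if something actually deletes old files. The sketch below covers just the daily tier; `prune_daily` is a hypothetical helper, the `*.sql` extension is an assumption about how your dumps are named, and weekly and monthly copies would need their own directories and limits.

```shell
# prune_daily: hypothetical helper. Deletes *.sql files under the given
# directory once find considers them more than keep_days days old
# (-mtime +N counts whole 24-hour periods since last modification).
prune_daily() {
  local dir="$1" keep_days="$2"
  find "$dir" -type f -name '*.sql' -mtime "+${keep_days}" -delete
}
```

For the example policy this would be `prune_daily <daily-backup-dir> 14`, where the directory is a placeholder for wherever your daily dumps land. Run it only after the newest backup has been confirmed, so a broken backup job never prunes the last good copies.
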
Backup commands

Example:

    pg_dump lod > <database-backup-file>

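A bare redirect like the one above leaves a truncated file behind if `pg_dump` fails halfway, and that file looks like a backup. Here is a sketch of one way around that, assuming bash: `backup_path` and `backup_db` are hypothetical helpers, and the UTC timestamp naming scheme is an assumption, not something your setup already uses.

```shell
# backup_path: hypothetical helper. Builds a file name of the form
# <dir>/<db>-<UTC timestamp>.sql so nightly dumps never overwrite each other.
backup_path() {
  local dir="$1" db="$2"
  echo "${dir}/${db}-$(date -u +%Y%m%dT%H%M%SZ).sql"
}

# backup_db: dumps to a .partial name first and renames only on success,
# so a failed pg_dump never leaves a file that looks like a finished backup.
backup_db() {
  local out
  out="$(backup_path "$1" "$2")"
  pg_dump --file="${out}.partial" "$2" && mv "${out}.partial" "$out"
}
```

Calling `backup_db <backup-dir> lod` from a nightly job would produce one dated file per run, where the directory is again a placeholder.
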
Validation process

VERY important.

Example:

    Monthly restore test required.

Backups that are never tested are fake backups.

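Between monthly restore tests, a cheap sanity check can at least catch empty or truncated dumps. The sketch below relies on the completion comment that `pg_dump` normally writes at the end of a plain-text dump; `dump_looks_complete` is a hypothetical helper, and it is a complement to the restore test, not a replacement for it.

```shell
# dump_looks_complete: hypothetical quick check, not a restore test.
# Passes only if the file is non-empty and its last lines contain the
# completion comment found at the end of a plain-text pg_dump.
dump_looks_complete() {
  local file="$1"
  [ -s "$file" ] || return 1
  tail -n 10 "$file" | grep -q 'PostgreSQL database dump complete'
}
```

A nightly job could run `dump_looks_complete <database-backup-file> || echo "backup looks truncated"` right after the dump finishes.
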
Why backup.md Matters

Because during a crisis, you do not want to THINK.

You want:

- exact commands
- exact locations
- exact priorities

4. /docs/restore.md

This is the most important doc of all.

This is:

“The server is dead. Now what?”

This document should let:

- future-you
- tired-you
- stressed-you

restore service rapidly.

What Goes Inside

Failure scenarios

Example:

    - VM corruption
    - accidental deletion
    - disk failure
    - ransomware
    - bad deployment

Recovery priority

Example:

    1. Restore database
    2. Restore uploads
    3. Restore API service
    4. Re-enable proxy routing

Restore procedure

Example:

    createdb lod
    psql lod < <database-backup-file>

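By default `psql` keeps going after an SQL error, so a restore can look finished while tables are missing. The sketch below wraps the two commands above with a guard against that; `restore_db` is a hypothetical helper, and the `DRY_RUN` switch is an added convenience for rehearsing the procedure.

```shell
# restore_db: hypothetical wrapper around createdb + psql.
# Arguments: database name, plain-text dump file.
# With DRY_RUN=1 it only prints what it would run.
restore_db() {
  local db="$1" dump="$2"
  # Refuse to start if the dump is missing or empty.
  [ -s "$dump" ] || { echo "missing or empty dump: $dump" >&2; return 1; }
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "createdb $db"
    echo "psql -v ON_ERROR_STOP=1 --dbname=$db --file=$dump"
    return 0
  fi
  createdb "$db" || return 1
  # ON_ERROR_STOP makes psql exit non-zero on the first SQL error instead
  # of continuing with a half-restored database.
  psql -v ON_ERROR_STOP=1 --dbname="$db" --file="$dump"
}
```

The same function works for the monthly restore test if you point it at a scratch database name instead of `lod`.
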
DNS / routing changes

Example:

    Update Caddy upstream IP if failover activated.

Validation after restore

Example:

    - login works
    - uploads visible
    - API healthy
    - monitoring active

Estimated recovery time

Example:

    Expected restore:
    15-30 minutes

Why restore.md Is The Most Important

Because backups are useless without restore procedures.

Most organizations discover this too late.

You are designing toward:

- rapid rebuildability
- warm failover
- infrastructure resilience

This document becomes foundational.

The Hidden Benefit

These docs also become:

AI-readable operational intelligence

This is extremely important for your future direction.

Your future agents can:

- read setup docs
- analyze deployment flows
- validate configs
- detect drift
- automate rebuilds
- generate infrastructure maps

Because your infrastructure knowledge becomes structured.

You are effectively building:

- operational memory
- machine-readable infrastructure cognition

What I Would Add For You

You may also eventually want:

| File | Purpose |
|---|---|
| architecture.md | High-level design and dependencies |
| network.md | Ports, DNS, routing, Tailscale |
| security.md | Auth, firewall, secrets handling |
| monitoring.md | Metrics/logging/alerts |
| dependencies.md | External systems and APIs |
| dr.md | Full disaster recovery strategy |

The Most Important Principle

These docs should allow you to:

Rebuild the app from scratch without relying on memory.

That is the gold standard.

If future-you can:

- rebuild
- restore
- redeploy
- fail over

using only the repo and docs, then your infrastructure is becoming professionally mature and operationally resilient.