edge-data-center-main-OPEN/core-docs-for-app-portability-across-primary-standby-machines.md

2026-06-06 22:36:52 +00:00
Four documents (setup.md, deploy.md, backup.md, and restore.md) are essentially the minimum viable operational memory for an application.
They are what prevent:
“How did I set this up again?”
“What breaks if this VM dies?”
“How do I rebuild this?”
“What exactly do I back up?”
“How do I restore fast?”
This becomes critically important in your architecture because:
you are modular
you are self-hosted
you are intentionally avoiding giant SaaS abstractions
you want rebuildability
you want warm failover
you want ephemeral dev environments
Without operational docs, infrastructure slowly becomes tribal knowledge trapped in your head.
That does not scale even for one person over time.
The Four Core Docs
Think of them as:
| Document | Purpose |
|---|---|
| setup.md | How to build the app from scratch |
| deploy.md | How code moves into production |
| backup.md | What must be preserved |
| restore.md | How to recover from disaster |
1. /docs/setup.md
This is:
“How do I create this app/server from zero?”
If the VM vanished tomorrow:
how do you rebuild it?
This doc should assume:
blank Ubuntu install
no memory
no assumptions
What Goes Inside
Purpose of the app
Example:
LOD API backend for customer management system.
Runs FastAPI with PostgreSQL backend.
VM specs
Example:
Ubuntu 24.04
2 CPU
4GB RAM
50GB disk
Required software
Example:
Python 3.12
PostgreSQL 16
Nginx
Git
Install steps
Example:
sudo apt update
sudo apt install python3.12 python3-venv git postgresql-16 nginx
Repo cloning
git clone <account-email>:yourorg/lod-api.git
Environment variables
Example:
DATABASE_URL=
API_KEY=
SECRET_KEY=
Never store the secret values themselves in Git.
Document only the variable names and where the real values are kept.
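A documented variable list is easy to turn into a startup check. The sketch below is one possible way to do it, assuming the variables are exported into the environment before the app starts; the function name `missing_env_vars` is made up for illustration.

```shell
# Print the names of required environment variables that are unset or empty.
# Returns 0 only when every name passed in has a non-empty value.
# Hypothetical helper: the names come from setup.md, never the values.
missing_env_vars() {
    missing=""
    for name in "$@"; do
        # printenv prints a value only for variables that are exported
        if [ -z "$(printenv "$name")" ]; then
            missing="$missing $name"
        fi
    done
    # Strip the leading space; prints an empty line when nothing is missing
    printf '%s\n' "${missing# }"
    [ -z "$missing" ]
}
```

For the example above you would call `missing_env_vars DATABASE_URL API_KEY SECRET_KEY` and refuse to start the service if it returns non-zero.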
Directory structure
Example:
<app-install-path>
<app-data-root>/lod
<app-log-path>
systemd service
Example:
/etc/systemd/system/lod-api.service
And include:
full service file
restart instructions
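As a rough illustration of what "full service file" could mean, here is a minimal sketch that prints a unit file from three parameters. Everything specific is an assumption: the `.env` location inside the install directory, the service user, and the start command are placeholders you would replace with your own.

```shell
# Print a minimal systemd unit for the API to stdout.
# $1 = install directory, $2 = service user, $3 = full ExecStart command line
# Assumption: the app reads its settings from $1/.env
render_unit() {
    cat <<EOF
[Unit]
Description=LOD API
After=network.target

[Service]
User=$2
WorkingDirectory=$1
EnvironmentFile=$1/.env
ExecStart=$3
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}
```

You could pipe the output through `sudo tee /etc/systemd/system/lod-api.service`, then run `sudo systemctl daemon-reload` and `sudo systemctl enable --now lod-api`. For a FastAPI app the third argument might look like `<app-install-path>/.venv/bin/uvicorn main:app --host 0.0.0.0 --port 8000`, where the virtualenv path and the `main:app` module name are guesses, not something this document specifies.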
Reverse proxy config
Example:
Caddy route:
lod.example.com -> <private-ip>:8000
Validation checklist
Example:
- API reachable
- DB connected
- Logs functional
- Backups running
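A checklist like this can be made executable. The helper below is a small sketch, not a monitoring system: it runs any command you give it and prints PASS or FAIL next to a label. The name `check` is invented here.

```shell
# Run one named check: $1 is the label, the rest is the command to run.
# Prints PASS or FAIL and returns non-zero when the command fails.
check() {
    label=$1
    shift
    if "$@" >/dev/null 2>&1; then
        printf 'PASS  %s\n' "$label"
    else
        printf 'FAIL  %s\n' "$label"
        return 1
    fi
}
```

Possible uses are `check "DB connected" pg_isready` and `check "Service active" systemctl is-active --quiet lod-api`. For "API reachable" you might run `check "API reachable" curl -fsS http://<private-ip>:8000/health`, assuming the app exposes a `/health` route, which is a hypothetical endpoint.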
Why setup.md Is Critical
Because eventually:
you WILL forget details
Ubuntu versions WILL change
dependencies WILL drift
a VM WILL die
you WILL rebuild something after months
This document becomes:
your infrastructure memory
your reproducibility layer
2. /docs/deploy.md
This is:
“How do changes safely move to production?”
This is operational workflow.
What Goes Inside
Branch strategy
Example:
main = production
dev = active development
Deployment flow
Example:
Dev VM -> Git push -> Production git pull
Production deployment steps
Example:
cd <app-install-path>
git pull
sudo systemctl restart lod-api
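The three commands above can be wrapped so that a failed pull never leads to a restart. This is a sketch under assumptions: the function name `deploy` is illustrative, and `--ff-only` is an added safety choice, not part of the original steps.

```shell
# Deploy by fast-forwarding the production checkout, then restarting the service.
# $1 = app install directory, $2 = systemd service name
# Runs in a subshell so the caller's working directory is left untouched.
deploy() (
    cd "$1" || exit 1
    # Remember the commit being replaced so a rollback target is always known
    previous=$(git rev-parse HEAD) || exit 1
    # --ff-only refuses to create a merge commit on the production machine
    git pull --ff-only || exit 1
    sudo systemctl restart "$2" || exit 1
    printf 'deployed %s (was %s)\n' "$(git rev-parse HEAD)" "$previous"
)
```

Called as `deploy <app-install-path> lod-api`, it stops at the first failing step and prints both commit hashes on success, which gives you a line worth pasting into a deploy log.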
Pre-deploy checklist
Example:
- DB migrations tested
- API endpoints verified
- Backups confirmed
Rollback process
CRITICAL.
Example:
git checkout <previous-tag>
sudo systemctl restart lod-api
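During a rollback you do not want to guess which tag came before the broken one. The sketch below finds it for you, assuming tags follow a `vMAJOR.MINOR.PATCH` pattern and that GNU `sort` (with `-V`) is available, as it is on Ubuntu. The function name `previous_tag` is made up.

```shell
# Print the tag that sorts immediately before $1 in version order.
# Tags are read from stdin, one per line.
# Returns non-zero if $1 is not in the list or has no predecessor.
previous_tag() {
    sort -V | awk -v cur="$1" '
        $0 == cur { if (prev == "") exit 1; print prev; found = 1; exit }
        { prev = $0 }
        END { if (!found) exit 1 }
    '
}
```

A rollback could then read `git checkout "$(git tag --list 'v*' | previous_tag v0.4.2)"` followed by the service restart. Version sorting matters here: a plain alphabetical sort would place v0.4.10 before v0.4.2.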
Version tagging
Example:
git tag v0.4.2
git push origin --tags
Downtime expectations
Example:
Expected restart interruption: 5-10 seconds
Why deploy.md Matters
Because deployment failures are where most operational stress happens.
This doc prevents:
forgotten steps
risky deployments
panic during rollback
“what changed?”
3. /docs/backup.md
This is:
“What data matters and how is it protected?”
Many people back up the wrong things.
You need to know:
what is replaceable
what is irreplaceable
What Goes Inside
What needs backup
Example:
PostgreSQL database
Uploaded files
.env file
SSL certs
NOT:
node_modules
Python cache
temporary containers
Backup frequency
Example:
Database:
- nightly full dump
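- WAL archiving, shipped at least hourly (this needs a pg_basebackup base backup; WAL cannot be replayed onto a pg_dump restore)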
Backup locations
Example:
Primary NAS
Secondary NAS
Offsite encrypted copy
Retention policy
Example:
Daily: 14 days
Weekly: 8 weeks
Monthly: 12 months
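A retention policy only holds if something enforces it. Here is a minimal sketch of the pruning side, assuming each tier (daily, weekly, monthly) lives in its own directory, which is a layout this document does not prescribe. The function name `prune_older_than` is illustrative.

```shell
# Delete regular files under $1 that were last modified more than $2 days ago.
# Prints each path it removes.
# Note: find counts whole 24-hour periods, so "+14" means at least 15 days old.
prune_older_than() {
    find "$1" -type f -mtime +"$2" -print -delete
}
```

With the policy above, the calls might be `prune_older_than <daily-dir> 14`, `prune_older_than <weekly-dir> 56`, and `prune_older_than <monthly-dir> 365`. Run it once with `-delete` removed before trusting it with real backups.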
Backup commands
Example:
pg_dump lod > <database-backup-file>
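A bare redirect leaves a truncated file behind if `pg_dump` fails halfway. The sketch below is one way to avoid that: it writes to a temporary name and renames only on success. The timestamped naming scheme and the function names are assumptions, not something this document defines.

```shell
# Build a timestamped dump path: <dir>/<db>-YYYYmmddTHHMMSSZ.sql (UTC)
dump_path() {
    printf '%s/%s-%s.sql\n' "$1" "$2" "$(date -u +%Y%m%dT%H%M%SZ)"
}

# Dump database $2 into directory $1 and print the final file path.
# The dump is written to a .partial file first, so a failed run never
# leaves something that looks like a complete backup.
backup_db() {
    out=$(dump_path "$1" "$2")
    if pg_dump "$2" > "$out.partial"; then
        mv "$out.partial" "$out"
        printf '%s\n' "$out"
    else
        rm -f "$out.partial"
        return 1
    fi
}
```

`backup_db <backup-dir> lod` would then produce one file per run, and the non-zero return on failure gives a cron job or systemd timer something to alert on.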
Validation process
VERY important.
Example:
Monthly restore test required.
Backups that are never tested are fake backups.
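A restore test can be as small as loading the dump into a throwaway database and checking that every statement succeeds. The sketch below assumes a plain SQL dump (the kind `pg_dump` writes by default) and a PostgreSQL role allowed to create and drop databases. The `_restore_test` suffix and the function name are invented for illustration.

```shell
# Load dump file $1 into a scratch database named <$2>_restore_test, then drop it.
# Returns non-zero if the dump is missing or empty, or if any SQL statement fails.
# The suffix keeps the scratch name distinct from the real database $2.
restore_test() {
    scratch="${2}_restore_test"
    [ -s "$1" ] || { echo "dump missing or empty: $1" >&2; return 1; }
    dropdb --if-exists "$scratch" || return 1
    createdb "$scratch" || return 1
    status=0
    # ON_ERROR_STOP makes psql exit non-zero on the first failed statement
    psql -q -v ON_ERROR_STOP=1 -d "$scratch" -f "$1" >/dev/null || status=1
    dropdb "$scratch"
    return "$status"
}
```

`restore_test <database-backup-file> lod` only proves the dump loads cleanly. A stronger test would also query a few known tables before the scratch database is dropped.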
Why backup.md Matters
Because during crisis:
you do not want to THINK.
You want:
exact commands
exact locations
exact priorities
4. /docs/restore.md
This is the most important doc of all.
This is:
“The server is dead. Now what?”
This document should let:
future-you
tired-you
stressed-you
restore service rapidly.
What Goes Inside
Failure scenarios
Example:
- VM corruption
- accidental deletion
- disk failure
- ransomware
- bad deployment
Recovery priority
Example:
1. Restore database
2. Restore uploads
3. Restore API service
4. Re-enable proxy routing
Restore procedure
Example:
createdb lod
psql lod < <database-backup-file>
DNS / routing changes
Example:
Update Caddy upstream IP if failover activated.
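If the upstream address lives in a Caddyfile, the switch can be scripted so it is not typed by hand under stress. This is a sketch only: it assumes the address appears literally in the file as `ip:port`, and the function name `switch_upstream` is hypothetical.

```shell
# Replace upstream address $2 with $3 in Caddyfile $1, keeping a .bak copy.
# Addresses are expected in ip:port form; values containing "/" or "&"
# would need extra escaping for sed.
switch_upstream() {
    grep -qF "$2" "$1" || { echo "upstream $2 not found in $1" >&2; return 1; }
    cp "$1" "$1.bak" || return 1
    # Escape dots so the address is matched literally, not as a regex
    old=$(printf '%s' "$2" | sed 's/\./\\./g')
    sed "s/$old/$3/g" "$1.bak" > "$1"
}
```

After `switch_upstream /etc/caddy/Caddyfile <primary-ip>:8000 <standby-ip>:8000` you would still need to reload Caddy (for example `sudo systemctl reload caddy`, if Caddy runs as a systemd service) for the change to take effect. The `.bak` copy is what you restore when failing back.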
Validation after restore
Example:
- login works
- uploads visible
- API healthy
- monitoring active
Estimated recovery time
Example:
Expected restore:
15-30 minutes
Why restore.md Is The Most Important
Because backups are useless without restore procedures.
Most organizations discover this too late.
You are designing toward:
rapid rebuildability
warm failover
infrastructure resilience
This document becomes foundational.
The Hidden Benefit
These docs also become:
AI-readable operational intelligence
This is extremely important for your future direction.
Your future agents can:
read setup docs
analyze deployment flows
validate configs
detect drift
automate rebuilds
generate infrastructure maps
Because your infrastructure knowledge becomes structured.
You are effectively building:
operational memory
machine-readable infrastructure cognition
What I Would Add For You
You may also eventually want:
| File | Purpose |
|---|---|
| architecture.md | High-level design and dependencies |
| network.md | Ports, DNS, routing, Tailscale |
| security.md | Auth, firewall, secrets handling |
| monitoring.md | Metrics/logging/alerts |
| dependencies.md | External systems and APIs |
| dr.md | Full disaster recovery strategy |
The Most Important Principle
These docs should allow you to:
Rebuild the app from scratch without relying on memory.
That is the gold standard.
If future-you can:
rebuild
restore
redeploy
fail over
using only the repo and docs,
then your infrastructure is becoming professionally mature and operationally resilient.