edge-data-center-main-OPEN/architecture.md
eddiesoehnel e36b87addf added
2026-06-06 16:36:52 -06:00

# **Long-Term Tech Stack**
This is designed to be:
* **Simple**
* **Durable**
* **Modular**
* **Vendor-independent**
* **AI-native**
* **Low-maintenance**
* **Future-proof**
## Core Backend Layer (Stays Python Forever)
### *FastAPI* {#fastapi}
Your main backend framework indefinitely.
* Lightweight
* Fast
* Async
* Great docs
* Perfect for APIs
* Perfect for JSON-store backends
### *Pydantic (v2)* {#pydantic-(v2)}
For:
* Validation
* Schema definition
* Type enforcement
* Parsing & serializing JSON
This gives you the “AI-Native Data Object” foundation.
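As a minimal sketch of what that looks like in practice, assuming Pydantic v2 is installed: the `Customer` model and its fields below are hypothetical, chosen only to show validation, JSON parsing, serialization, and schema export in one place.

```python
from pydantic import BaseModel, Field, ValidationError


class Customer(BaseModel):
    """Hypothetical data object; the field names are illustrative only."""
    id: str
    name: str = Field(min_length=1)
    tags: list[str] = Field(default_factory=list)


# Parse and validate raw JSON in one step.
raw = '{"id": "c-001", "name": "Acme", "tags": ["edge"]}'
customer = Customer.model_validate_json(raw)

# Serialize back to JSON, and export a JSON Schema that tools or LLMs can read.
round_trip = customer.model_dump_json()
schema = Customer.model_json_schema()

# Invalid input raises ValidationError instead of passing through silently.
try:
    Customer.model_validate_json('{"id": "c-002", "name": ""}')
    errors = []
except ValidationError as exc:
    errors = exc.errors()
```

The exported schema is the part that makes the object "AI-native": the same definition can drive validation, generated forms, and prompts.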
### *Python typing* {#python-typing}
Static typing \+ Pydantic gives you:
* Safety
* Predictable data models
* AI-generated UIs
* Clean code
### *JSON object datastore on disk* {#json-object-datastore-on-disk}
Your architecture:
`<app-data-root>/json-db/lod/customers/*.json`
`<app-data-root>/json-db/lod/files/`
`<app-data-root>/json-db/bc/hubs/*.json`
This becomes your **source of truth** — and it’s timeless:
* No SQL migration pain
* AI-friendly
* Human-readable
* Portable across servers
* Works with agents
* Simple backups
* Easy to replicate across clouds
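One way such a store could be read and written is sketched below, using only the standard library. The function names (`write_object`, `read_object`, `list_ids`) are hypothetical, and the write-to-temp-then-rename step is an assumption about how you might protect files from partial writes, not something the layout above requires.

```python
import json
import os
import tempfile
from pathlib import Path


def write_object(root: Path, collection: str, object_id: str, data: dict) -> Path:
    """Write one JSON object: temp file first, then rename over the target."""
    target_dir = root / collection
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / f"{object_id}.json"
    # The temp file lives in the same directory so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=target_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(data, handle, indent=2, sort_keys=True)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_name, target)  # readers see the old file or the new one
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return target


def read_object(root: Path, collection: str, object_id: str) -> dict:
    """Load one object by id."""
    with (root / collection / f"{object_id}.json").open(encoding="utf-8") as handle:
        return json.load(handle)


def list_ids(root: Path, collection: str) -> list[str]:
    """List object ids in a collection (file names without the .json suffix)."""
    return sorted(path.stem for path in (root / collection).glob("*.json"))
```

Calling `write_object(root, "lod/customers", "c-001", {...})` would create `<root>/lod/customers/c-001.json`, matching the directory layout shown above.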
### *Systemd services* {#systemd-services}
For:
* Backend uptime
* Simple restarts
* Zero Docker complexity
* Predictability
### *Caddy* {#caddy}
One global reverse proxy.
* SSL termination
* Routing
* Static file serving
* Stable
* Minimal overhead
This will last you a decade.
## Frontend Layer (Optional TS Later) {#frontend-layer-(optional-ts-later)}
For now:
* Simple JS
* Plain HTML \+ small modals
* Works fine for LOD
Long term (as projects expand):
* **React \+ TypeScript**
* **HTMX** for hybrid pages
* Shared component library
* Shared TypeScript types autogenerated from Pydantic
* Local static hosting (`<app-install-path>…`)
You only bring in TS when:
* The UIs become larger
* You need better autocompletion
* Indexing dashboards get complex
* You build cross-app shared components
There is *no immediate need*.
## AI Layer (Strong Long-Term Direction) {#ai-layer-(strong-long-term-direction)}
### *Python for AI operations* {#python-for-ai-operations}
Perfect for:
* Embeddings
* Chunking
* Vectorization
* STT
* LLM calls
* Agents
* Parsing inbound data
* Publishing pipelines
### *Agentic pipelines* {#agentic-pipelines-(future)}
* Chunking → JSONL files
* Vectorization (embeddings) → Python
    * OpenAI embeddings
    * Jina embeddings
    * Cohere
    * Local embeddings later (GGUF)
    * SentenceTransformers
* Python is the correct home for:
    * running the embedder
    * transforming the JSONL chunks
    * updating embeddings
    * building vector stores
* MCP
* OpenAI’s new agent tools
* Event-driven systems
* Scheduled analytical tasks (weeklies)
* Lightweight database for metadata \+ embeddings
    * SQLite \+ DuckDB
    * Qdrant
    * LanceDB
    * Weaviate (local mode)
    * ChromaDB
* The simplest long-term option I recommend for you:
    * DuckDB or SQLite for metadata
    * LanceDB or Qdrant for vectors
* Why?
    * Very fast
    * No server needed
    * Easy to copy/backup
    * Python-native
    * AI-friendly
    * Perfect with JSONL chunk pipelines
* Your JSONL holds the raw chunks; your small local DB holds:
    * chunk\_id
    * metadata (source, tags, time ranges)
    * vector embeddings
    * up-to-date indexes
* RAG layer \- Python
    * Consider agentic RAG sitting in this layer (see the index main product development document: https://docs.example.invalid/private-reference)
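The chunk-to-JSONL-to-local-DB flow could be sketched roughly as follows, using only the standard library. Several things here are assumptions for illustration: the fixed-size chunker, the record fields other than `chunk_id`, and above all `fake_embed`, which is a meaningless stand-in for a real embedder. The sketch also keeps vectors in a SQLite BLOB for brevity; in the setup recommended above they would live in LanceDB or Qdrant instead.

```python
import hashlib
import json
import sqlite3
import struct
from pathlib import Path


def chunk_text(text: str, size: int = 200) -> list[str]:
    """Split text into fixed-size character chunks (naive placeholder strategy)."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def write_jsonl(path: Path, source: str, chunks: list[str]) -> list[dict]:
    """Write one JSON record per line; the JSONL file holds the raw chunks."""
    records = [
        {"chunk_id": f"{source}-{n:04d}", "source": source, "text": chunk}
        for n, chunk in enumerate(chunks)
    ]
    with path.open("w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record) + "\n")
    return records


def fake_embed(text: str, dim: int = 8) -> list[float]:
    """Hypothetical stand-in for a real embedder: deterministic, not semantic."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [byte / 255.0 for byte in digest[:dim]]


def index_jsonl(db: sqlite3.Connection, path: Path) -> int:
    """Store chunk_id, source metadata, and a packed float vector per chunk."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS chunks ("
        "chunk_id TEXT PRIMARY KEY, source TEXT, embedding BLOB)"
    )
    count = 0
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            record = json.loads(line)
            vector = fake_embed(record["text"])
            blob = struct.pack(f"{len(vector)}f", *vector)
            # INSERT OR REPLACE makes re-indexing the same file idempotent.
            db.execute(
                "INSERT OR REPLACE INTO chunks VALUES (?, ?, ?)",
                (record["chunk_id"], record["source"], blob),
            )
            count += 1
    db.commit()
    return count
```

Because the JSONL file remains the source of truth, the database can be deleted and rebuilt from it at any time, for example after switching embedding models.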
### *Python scripts & CLIs* {#python-scripts-&-clis}
For:
* Import/export
* Data normalization
* Periodic JSON cleanup
* Building indexes
* Summaries
* AI-native publishing
These will keep adding value to your system over the years.
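A small index-building CLI of this kind might look like the sketch below. It uses only the standard library; the `name` field it reads and the `--out` flag are hypothetical choices for illustration.

```python
import argparse
import json
from pathlib import Path
from typing import List, Optional


def build_index(collection_dir: Path) -> dict:
    """Map each object's id (file stem) to its "name" field, if present."""
    index = {}
    for path in sorted(collection_dir.glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            obj = json.load(handle)
        index[path.stem] = obj.get("name", "")
    return index


def main(argv: Optional[List[str]] = None) -> int:
    """Parse arguments, build the index, and write it to a file or stdout."""
    parser = argparse.ArgumentParser(
        description="Build a name index for one JSON collection."
    )
    parser.add_argument("collection_dir", type=Path)
    parser.add_argument("--out", type=Path, default=None,
                        help="write the index to this file instead of stdout")
    args = parser.parse_args(argv)

    text = json.dumps(build_index(args.collection_dir), indent=2, sort_keys=True)
    if args.out:
        args.out.write_text(text + "\n", encoding="utf-8")
    else:
        print(text)
    return 0
```

Keeping the logic in `build_index` and the argument handling in `main` means the same function can be called from a scheduled job or an agent as well as from the command line. A script entry point would simply call `main()`.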
### *API-Driven* {#api-driven}
Even inside your system, have the pieces talk to each other through APIs rather than through tight coupling, so that modules can be upgraded, replaced, or outsourced independently.
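As a rough illustration of that boundary, the sketch below shows a caller that knows only an HTTP contract, not the module behind it. The `CustomerClient` class, the `/customers/{id}` route, and the injectable `get` transport are all hypothetical; the transport is injectable so the client can be exercised without a running service.

```python
import json
from typing import Callable
from urllib.request import urlopen


def http_get(url: str) -> str:
    """Default transport: a plain HTTP GET using the standard library."""
    with urlopen(url, timeout=5) as response:
        return response.read().decode("utf-8")


class CustomerClient:
    """Reaches a hypothetical customers service over HTTP only."""

    def __init__(self, base_url: str, get: Callable[[str], str] = http_get) -> None:
        self.base_url = base_url.rstrip("/")
        self._get = get

    def fetch(self, customer_id: str) -> dict:
        # The caller depends on the URL shape and JSON body, nothing else.
        return json.loads(self._get(f"{self.base_url}/customers/{customer_id}"))
```

If the customers module is later rewritten or moved to another host, only `base_url` changes for callers, provided the route and response shape stay the same.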
## Code
Stay Python-first for:
* AI pipelines
* Agents
* Orchestration
* APIs
* Content processing
* Ingestion
* Automation
* Research tooling

Add Go/Rust selectively for:
* High-performance services
* Distributed networking
* Edge infrastructure
* Heavy concurrency
* Secure execution sandboxes
* Streaming systems
* Future NetworkSIG infrastructure
* Low-memory edge compute