Long-Term Tech Stack
This is designed to be:
- Simple
- Durable
- Modular
- Vendor-independent
- AI-native
- Low-maintenance
- Future-proof
Core Backend Layer (Stays Python Forever)
FastAPI
Your main backend framework indefinitely.
- Lightweight
- Fast
- Async
- Great docs
- Perfect for APIs
- Perfect for JSON-store backends
Pydantic (v2) {#pydantic-(v2)}
For:
- Validation
- Schema definition
- Type enforcement
- Parsing & serializing JSON
This gives you the “AI-Native Data Object” foundation.
Python typing
Static typing + Pydantic gives you:
- Safety
- Predictable data models
- AI-generated UIs
- Clean code
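To make the validation, parsing, and schema points concrete, the following is a small sketch using Pydantic v2. The `Customer` model and its fields are assumptions made up for illustration; the real data objects may look quite different.

```python
from datetime import datetime

from pydantic import BaseModel, Field, ValidationError


class Customer(BaseModel):
    """Hypothetical data object; field names are illustrative only."""

    id: str
    name: str = Field(min_length=1)
    tags: list[str] = Field(default_factory=list)
    created_at: datetime


# Parsing: JSON text in, typed object out.
raw = '{"id": "c-001", "name": "Example Customer", "created_at": "2024-01-15T09:30:00"}'
customer = Customer.model_validate_json(raw)

# Serializing: typed object back to JSON text.
round_trip = customer.model_dump_json()

# Validation: bad input is rejected with a structured error report.
error_count = 0
try:
    Customer.model_validate_json('{"id": "c-002", "name": "", "created_at": "not a date"}')
except ValidationError as exc:
    error_count = exc.error_count()

# Schema definition: a JSON Schema that tools (or an LLM) can consume,
# for example to generate forms or TypeScript types.
schema_fields = sorted(Customer.model_json_schema()["properties"])
```

The JSON Schema from `model_json_schema()` is what would make generated UIs and shared frontend types possible, since it describes each object in a machine-readable way.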
JSON object datastore on disk
Your architecture:
/srv/data/json-db/lod/customers/*.json
/srv/data/json-db/lod/files/
/srv/data/json-db/bc/hubs/*.json
This becomes your source of truth — and it’s timeless:
- No SQL migration pain
- AI-friendly
- Human-readable
- Portable across servers
- Works with agents
- Simple backups
- Easy to replicate across clouds
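One way such a file-per-object store could be read and written is sketched below using only the standard library. The helper names (`save_object`, `load_object`, `list_ids`) are hypothetical, and the demo runs in a temporary directory rather than under `/srv/data/json-db`. Writing to a temporary file and renaming it is an assumption about how to keep readers from seeing half-written files, not a description of the current implementation.

```python
import json
import os
import tempfile
from pathlib import Path


def save_object(root: Path, collection: str, object_id: str, data: dict) -> Path:
    """Write <root>/<collection>/<object_id>.json atomically."""
    target_dir = root / collection
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / f"{object_id}.json"
    # Write to a temp file in the same directory, then rename it over the
    # target, so a crash never leaves a truncated JSON file behind.
    fd, tmp_name = tempfile.mkstemp(dir=target_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(data, handle, indent=2, ensure_ascii=False)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_name, target)
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return target


def load_object(root: Path, collection: str, object_id: str) -> dict:
    """Read one object back from disk."""
    path = root / collection / f"{object_id}.json"
    return json.loads(path.read_text(encoding="utf-8"))


def list_ids(root: Path, collection: str) -> list[str]:
    """List object IDs in a collection, derived from the file names."""
    return sorted(path.stem for path in (root / collection).glob("*.json"))


# Demo in a throwaway directory that mirrors the layout above.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "json-db"
    save_object(root, "lod/customers", "c-001", {"id": "c-001", "name": "Example Customer"})
    loaded = load_object(root, "lod/customers", "c-001")
    ids = list_ids(root, "lod/customers")
```

Because every object is an ordinary file, backups and replication reduce to copying the directory tree.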
Systemd services
For:
- Backend uptime
- Simple restarts
- Zero Docker complexity
- Predictability
Caddy
One global reverse proxy.
- SSL termination
- Routing
- Static file serving
- Stable
- Minimal overhead
This will last you a decade.
Frontend Layer (Optional TS Later) {#frontend-layer-(optional-ts-later)}
For now:
- Simple JS
- Plain HTML + small modals
- Works fine for LOD
Long term (as projects expand):
- React + TypeScript
- HTMX for hybrid pages
- Shared component library
- Shared TypeScript types autogenerated from Pydantic
- Local static hosting (/srv/apps/ui/…)
You only bring in TS when:
- The UIs become larger
- You need better autocompletion
- Indexing dashboards get complex
- You build cross-app shared components
There is no immediate need.
AI Layer (Strong Long-Term Direction) {#ai-layer-(strong-long-term-direction)}
Python for AI operations
Perfect for:
- Embeddings
- Chunking
- Vectorization
- STT
- LLM calls
- Agents
- Parsing inbound data
- Publishing pipelines
Agentic pipelines {#agentic-pipelines-(future)}
- Chunking → JSONL files
- Vectorization (Embeddings) → Python
  - OpenAI embeddings
  - Jina embeddings
  - Cohere
  - Local embeddings later (GGUF)
  - SentenceTransformers
  - Python is the correct home for:
    - running the embedder
    - transforming the JSONL chunks
    - updating embeddings
    - building vector stores
- MCP
- OpenAI’s new agent tools
- Event-driven systems
- Scheduled analytical tasks (weeklies)
- Lightweight database for metadata + embeddings
  - SQLite + DuckDB
  - Qdrant
  - LanceDB
  - Weaviate (local mode)
  - ChromaDB
  - The simplest long-term option I recommend for you:
    - DuckDB or SQLite for metadata
    - LanceDB or Qdrant for vectors
  - Why?
    - Very fast
    - No server needed
    - Easy to copy/backup
    - Python-native
    - AI-friendly
    - Perfect with JSONL chunk pipelines
  - Your JSONL holds the raw chunks; your small local DB holds:
    - chunk_id
    - metadata (source, tags, time ranges)
    - vector embeddings
    - up-to-date indexes
- RAG layer → Python
  - Consider agentic RAG sitting in this layer; see the index main product development document: https://docs.google.com/document/d/1GedfgKY78INGREJ5lgNSOv0OIAMjm0Lj268I40u0JuY/edit?tab=t.0#heading=h.ksb8pdnacvtr
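The chunking, embedding, and local-index steps above can be sketched end to end with the standard library. Several things here are assumptions for illustration only: `toy_embed` is a hypothetical stand-in for a real embedder (OpenAI, Jina, SentenceTransformers, and so on), the word-count chunker is deliberately naive, and vectors are stored as JSON text in SQLite purely to keep the example dependency-free, where the recommendation above would use LanceDB or Qdrant.

```python
import json
import math
import sqlite3
import tempfile
from pathlib import Path


def chunk_text(text: str, max_words: int = 50) -> list[str]:
    """Split text into chunks of at most max_words words (naive strategy)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def toy_embed(text: str, dims: int = 8) -> list[float]:
    """Hypothetical stand-in for a real embedder: hashed bag of words, L2-normalized."""
    vec = [0.0] * dims
    for word in text.lower().split():
        vec[sum(word.encode("utf-8")) % dims] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def write_jsonl(path: Path, source: str, chunks: list[str]) -> None:
    """The raw chunks live in JSONL: one JSON object per line."""
    with path.open("w", encoding="utf-8") as handle:
        for n, chunk in enumerate(chunks):
            row = {"chunk_id": f"{source}-{n:04d}", "source": source, "text": chunk}
            handle.write(json.dumps(row, ensure_ascii=False) + "\n")


def index_jsonl(db: sqlite3.Connection, path: Path) -> int:
    """Load chunk_id, metadata, and an embedding per chunk into the local DB."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS chunks ("
        "chunk_id TEXT PRIMARY KEY, source TEXT, embedding TEXT)"
    )
    count = 0
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            row = json.loads(line)
            db.execute(
                "INSERT OR REPLACE INTO chunks VALUES (?, ?, ?)",
                (row["chunk_id"], row["source"], json.dumps(toy_embed(row["text"]))),
            )
            count += 1
    db.commit()
    return count


def search(db: sqlite3.Connection, query: str, top_k: int = 3) -> list[tuple[str, float]]:
    """Brute-force cosine similarity; vectors are unit length, so a dot product suffices."""
    q = toy_embed(query)
    scored = []
    for chunk_id, emb_json in db.execute("SELECT chunk_id, embedding FROM chunks"):
        emb = json.loads(emb_json)
        scored.append((chunk_id, sum(a * b for a, b in zip(q, emb))))
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_k]


with tempfile.TemporaryDirectory() as tmp:
    jsonl_path = Path(tmp) / "chunks.jsonl"
    text = "alpha beta gamma " * 40  # 120 words, so three chunks of at most 50 words
    write_jsonl(jsonl_path, "demo", chunk_text(text))
    db = sqlite3.connect(":memory:")
    indexed = index_jsonl(db, jsonl_path)
    hits = search(db, "alpha beta gamma", top_k=2)
    db.close()
```

The split of responsibilities is the point: the JSONL file can be regenerated or re-embedded at any time, and the small database is a rebuildable index over it rather than a second source of truth.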
Python scripts & CLIs {#python-scripts-&-clis}
For:
- Import/export
- Data normalization
- Periodic JSON cleanup
- Building indexes
- Summaries
- AI-native publishing
These will accumulate value in your system over the years.
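As one example of what a small maintenance script might look like, here is a sketch of a JSON cleanup CLI built on `argparse`. The function names and the `--dry-run` flag are hypothetical, and "cleanup" is assumed here to mean rewriting files with sorted keys and consistent indentation; the actual cleanup rules are not specified above.

```python
import argparse
import json
import tempfile
from pathlib import Path
from typing import Optional


def normalize_file(path: Path, write: bool = True) -> bool:
    """Return True if the file is not in normalized form; rewrite it when write=True."""
    original = path.read_text(encoding="utf-8")
    normalized = json.dumps(
        json.loads(original), indent=2, sort_keys=True, ensure_ascii=False
    ) + "\n"
    if normalized == original:
        return False
    if write:
        path.write_text(normalized, encoding="utf-8")
    return True


def run_cli(argv: Optional[list] = None) -> int:
    """Parse arguments, process every *.json file under root, return the change count."""
    parser = argparse.ArgumentParser(description="Normalize JSON files under a directory.")
    parser.add_argument("root", type=Path, help="directory to scan recursively")
    parser.add_argument("--dry-run", action="store_true", help="report without rewriting")
    args = parser.parse_args(argv)
    changed = sum(
        normalize_file(path, write=not args.dry_run)
        for path in sorted(args.root.rglob("*.json"))
    )
    print(f"{changed} file(s) {'would be ' if args.dry_run else ''}normalized")
    return changed


# Demo against a throwaway directory instead of the real datastore.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "customer.json"
    sample.write_text('{"name": "Example", "id": "c-001"}', encoding="utf-8")
    dry = run_cli([tmp, "--dry-run"])  # reports the file, changes nothing
    first = run_cli([tmp])             # rewrites the file
    second = run_cli([tmp])            # already normalized, nothing to do
    cleaned = sample.read_text(encoding="utf-8")
```

Making the script idempotent, so that a second run changes nothing, is what makes it safe to schedule periodically.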
API-Driven
Even inside your system, have the pieces talk to each other via APIs rather than through tight coupling, so that modules can be upgraded, replaced, or outsourced independently.
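A minimal sketch of this idea follows: one module exposes a small HTTP endpoint and another calls it knowing only the URL and the JSON shape. The `summaries` module name and the `/health` route are hypothetical, and the standard-library `http.server` is used only to keep the example dependency-free; in this stack the service side would presumably be a FastAPI app.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.request import ProxyHandler, build_opener


class SummaryHandler(BaseHTTPRequestHandler):
    """Hypothetical 'summaries' module exposed over HTTP."""

    def do_GET(self) -> None:
        if self.path == "/health":
            body = json.dumps({"status": "ok", "module": "summaries"}).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format: str, *args) -> None:
        pass  # keep the demo quiet


# Bypass any system proxy settings so the call stays on localhost.
_opener = build_opener(ProxyHandler({}))


def call_module(base_url: str, path: str) -> dict:
    """The caller depends on the URL and the JSON contract, not the implementation."""
    with _opener.open(f"{base_url}{path}", timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))


# Port 0 asks the OS for any free port, which avoids collisions in a demo.
server = ThreadingHTTPServer(("127.0.0.1", 0), SummaryHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    base_url = f"http://127.0.0.1:{server.server_port}"
    health = call_module(base_url, "/health")
finally:
    server.shutdown()
    server.server_close()
```

Since `call_module` only needs a base URL, the module behind it could later be rewritten in another language or moved to another host without touching the caller.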
Code
Stay Python-First For:
- AI pipelines
- Agents
- Orchestration
- APIs
- Content processing
- Ingestion
- Automation
- Research tooling
Add Go/Rust Selectively For:
- High-performance services
- Distributed networking
- Edge infrastructure
- Heavy concurrency
- Secure execution sandboxes
- Streaming systems
- Future NetworkSIG infrastructure
- Low-memory edge compute