# Operations (Ops)

This directory contains all infrastructure-as-code, deployment automation, and monitoring configuration.

## Structure

```
ops/
├── terraform/           # Cloud infrastructure definitions
│   ├── modules/
│   ├── environments/
│   │   ├── staging/
│   │   └── production/
│   └── global/
├── ansible/             # Server provisioning and configuration
│   ├── playbooks/
│   ├── roles/
│   └── inventory/
└── monitoring/          # Observability stack
    ├── prometheus/
    ├── grafana/
    ├── loki/
    └── alertmanager/
```

## Terraform

Defines the cloud infrastructure on the chosen provider (Hetzner, AWS, or DigitalOcean recommended for cost efficiency).

**Resources**:

- Kubernetes cluster or Docker Swarm hosts
- PostgreSQL managed database (or self-hosted)
- TimescaleDB instance
- RabbitMQ / Redis managed service
- Object storage (S3-compatible) for backups and kit assets
- Load balancers and DNS records
- VPN / WireGuard for secure bridge-to-cloud communication

## Ansible

Playbooks for:

- Installing Docker and dependencies on bare metal
- Configuring infrastructure nodes (Raspberry Pi OS setup, bridge daemon deployment)
- Rotating TLS certificates
- Security hardening (fail2ban, firewall rules)

## Monitoring

Stack: Prometheus + Grafana + Loki + Alertmanager

**Metrics**:

- Node uptime and health
- Message throughput (inbound/outbound)
- API request rates and error rates
- Database performance
- Bridge daemon connectivity

**Alerts**:

- Node offline > 6 hours
- Bridge daemon disconnected > 15 minutes
- API error rate > 1%
- Disk usage > 85%
- Spike in subscription payment failures
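
The thresholds above can be sketched as a small evaluation function. This is a minimal illustration of the alert logic, not the Alertmanager configuration itself; the `Snapshot` fields and alert names are hypothetical. The payment-failure alert is left out because no numeric threshold is given for what counts as a spike.

```python
from dataclasses import dataclass
from datetime import timedelta

# Thresholds taken from the alert list above.
NODE_OFFLINE_LIMIT = timedelta(hours=6)
BRIDGE_DISCONNECT_LIMIT = timedelta(minutes=15)
API_ERROR_RATE_LIMIT = 0.01  # 1%
DISK_USAGE_LIMIT = 0.85      # 85%


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical point-in-time view of the monitored values."""
    node_offline_for: timedelta
    bridge_disconnected_for: timedelta
    api_requests: int
    api_errors: int
    disk_used_fraction: float


def firing_alerts(s: Snapshot) -> list[str]:
    """Return the names of the alerts whose thresholds are exceeded."""
    alerts = []
    if s.node_offline_for > NODE_OFFLINE_LIMIT:
        alerts.append("node_offline")
    if s.bridge_disconnected_for > BRIDGE_DISCONNECT_LIMIT:
        alerts.append("bridge_disconnected")
    # Skip the ratio when there were no requests, to avoid dividing by zero.
    if s.api_requests > 0 and s.api_errors / s.api_requests > API_ERROR_RATE_LIMIT:
        alerts.append("api_error_rate")
    if s.disk_used_fraction > DISK_USAGE_LIMIT:
        alerts.append("disk_usage")
    return alerts


snapshot = Snapshot(
    node_offline_for=timedelta(hours=7),
    bridge_disconnected_for=timedelta(minutes=5),
    api_requests=1000,
    api_errors=20,
    disk_used_fraction=0.5,
)
print(firing_alerts(snapshot))  # ['node_offline', 'api_error_rate']
```

All comparisons are strict, matching the `>` in the list: a node offline for exactly 6 hours does not fire.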