# Operations (Ops)

This directory contains all infrastructure-as-code, deployment automation, and monitoring configuration.

## Structure

```
ops/
├── terraform/          # Cloud infrastructure definitions
│   ├── modules/
│   ├── environments/
│   │   ├── staging/
│   │   └── production/
│   └── global/
├── ansible/            # Server provisioning and configuration
│   ├── playbooks/
│   ├── roles/
│   └── inventory/
└── monitoring/         # Observability stack
    ├── prometheus/
    ├── grafana/
    ├── loki/
    └── alertmanager/
```

## Terraform

Defines the cloud infrastructure on the chosen provider (Hetzner, AWS, or DigitalOcean recommended for cost efficiency).

**Resources**:
- Kubernetes cluster or Docker Swarm hosts
- PostgreSQL managed database (or self-hosted)
- TimescaleDB instance
- RabbitMQ / Redis managed service
- Object storage (S3-compatible) for backups and kit assets
- Load balancers and DNS records
- VPN / WireGuard for secure bridge-to-cloud communication

## Ansible

Playbooks for:
- Installing Docker and dependencies on bare metal
- Configuring infrastructure nodes (Raspberry Pi OS setup, bridge daemon deployment)
- Rotating TLS certificates
- Security hardening (fail2ban, firewall rules)

## Monitoring

Stack: Prometheus + Grafana + Loki + Alertmanager

**Metrics**:
- Node uptime and health
- Message throughput (inbound/outbound)
- API request rates and error rates
- Database performance
- Bridge daemon connectivity

**Alerts**:
- Node offline > 6 hours
- Bridge daemon disconnected > 15 minutes
- API error rate > 1%
- Disk usage > 85%
- Spike in subscription payment failures
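
The alert thresholds listed above could be expressed as Prometheus alerting rules along the following lines. This is a minimal sketch, not the project's actual rule file: the group name, the `job="node"` label, the `bridge_daemon_connected` gauge, the `http_requests_total` counter with its `status` label, and the short `for` windows on the last two rules are all assumptions made for illustration. Only `up` and the `node_filesystem_*` metrics are standard (the latter come from node_exporter). The payment-failure alert is left out because no numeric threshold is given for what counts as a spike.

```yaml
groups:
  - name: ops-alerts  # hypothetical group name
    rules:
      # Node offline > 6 hours. Assumes nodes are scraped under job="node".
      - alert: NodeOffline
        expr: up{job="node"} == 0
        for: 6h
        labels:
          severity: critical
        annotations:
          summary: "Node {{ $labels.instance }} has been offline for more than 6 hours"

      # Bridge daemon disconnected > 15 minutes.
      # bridge_daemon_connected is a hypothetical gauge (1 = connected, 0 = not).
      - alert: BridgeDaemonDisconnected
        expr: bridge_daemon_connected == 0
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Bridge daemon on {{ $labels.instance }} disconnected for more than 15 minutes"

      # API error rate > 1%. http_requests_total and its status label are assumed names.
      - alert: ApiErrorRateHigh
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m]))
            / sum(rate(http_requests_total[5m])) > 0.01
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "API error rate is above 1%"

      # Disk usage > 85%, computed from standard node_exporter filesystem metrics.
      - alert: DiskUsageHigh
        expr: |
          (1 - node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"}
             / node_filesystem_size_bytes{fstype!~"tmpfs|overlay"}) > 0.85
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Disk usage on {{ $labels.instance }} ({{ $labels.mountpoint }}) is above 85%"
```

A file like this would typically live under `monitoring/prometheus/` and be referenced from the `rule_files` section of the Prometheus configuration. Running `promtool check rules <file>` validates the syntax before deployment.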