# Operations (Ops)

This directory contains all infrastructure-as-code, deployment automation, and monitoring configuration.

## Structure

```
ops/
├── terraform/            # Cloud infrastructure definitions
│   ├── modules/
│   ├── environments/
│   │   ├── staging/
│   │   └── production/
│   └── global/
├── ansible/              # Server provisioning and configuration
│   ├── playbooks/
│   ├── roles/
│   └── inventory/
└── monitoring/           # Observability stack
    ├── prometheus/
    ├── grafana/
    ├── loki/
    └── alertmanager/
```

## Terraform

Defines the cloud infrastructure on the chosen provider. Hetzner, AWS, and DigitalOcean are recommended for cost efficiency.

**Resources**:
- Kubernetes cluster or Docker Swarm hosts
- PostgreSQL managed database (or self-hosted)
- TimescaleDB instance
- RabbitMQ / Redis managed service
- Object storage (S3-compatible) for backups and kit assets
- Load balancers and DNS records
- VPN / WireGuard for secure bridge-to-cloud communication

## Ansible

Playbooks for:
- Installing Docker and dependencies on bare metal
- Configuring infrastructure nodes (Raspberry Pi OS setup, bridge daemon deployment)
- Rotating TLS certificates
- Security hardening (fail2ban, firewall rules)
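
As a rough illustration of the first and last items above, a playbook might look like the sketch below. It is not taken from this repository: the file name, the `bare_metal` inventory group, the Debian-style package names, and the choice of `ufw` as the firewall front end are all assumptions, and the `community.general` collection must be installed for the `ufw` module.

```yaml
# Hypothetical playbook, e.g. ansible/playbooks/base.yml (file name is an assumption)
- name: Install Docker and apply basic hardening
  hosts: bare_metal          # assumed inventory group name
  become: true
  tasks:
    - name: Install Docker, fail2ban, and ufw from distro packages
      ansible.builtin.apt:
        name:
          - docker.io        # Debian/Ubuntu package name; differs on other distros
          - fail2ban
          - ufw
        state: present
        update_cache: true

    - name: Ensure Docker and fail2ban are running and enabled at boot
      ansible.builtin.service:
        name: "{{ item }}"
        state: started
        enabled: true
      loop:
        - docker
        - fail2ban

    - name: Allow SSH before the firewall is enabled
      community.general.ufw:
        rule: allow
        port: "22"
        proto: tcp

    - name: Enable ufw with default-deny for incoming traffic
      community.general.ufw:
        state: enabled
        policy: deny
        direction: incoming
```

The SSH rule is applied before the firewall is enabled so that the play does not lock itself out of the host.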

## Monitoring

Stack: Prometheus + Grafana + Loki + Alertmanager

**Metrics**:
- Node uptime and health
- Message throughput (inbound/outbound)
- API request rates and error rates
- Database performance
- Bridge daemon connectivity

**Alerts**:
- Node offline > 6 hours
- Bridge daemon disconnected > 15 minutes
- API error rate > 1%
- Disk usage > 85%
- Spike in subscription payment failures
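
The thresholds above could be expressed as Prometheus alerting rules along the following lines. This is a sketch under stated assumptions, not the rule file shipped in `monitoring/`: the `node` job label, the `bridge_daemon_connected` gauge, and the `http_requests_total` counter with a `status` label are hypothetical names, while `up`, `node_filesystem_avail_bytes`, and `node_filesystem_size_bytes` are standard Prometheus and node_exporter metrics. The disk threshold is interpreted as usage above 85%. The payment-failure alert is left out because the list gives no threshold for it.

```yaml
# Hypothetical rule file, e.g. monitoring/prometheus/alerts.yml (file name is an assumption)
groups:
  - name: ops-alerts
    rules:
      - alert: NodeOffline
        expr: up{job="node"} == 0            # "node" job label is an assumption
        for: 6h
        labels:
          severity: critical
        annotations:
          summary: "Node {{ $labels.instance }} has been offline for more than 6 hours"

      - alert: BridgeDaemonDisconnected
        expr: bridge_daemon_connected == 0   # hypothetical gauge: 1 = connected, 0 = not
        for: 15m
        labels:
          severity: warning

      - alert: ApiErrorRateHigh
        # Share of 5xx responses over the last 5 minutes (metric name is an assumption)
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m]))
            / sum(rate(http_requests_total[5m])) > 0.01
        labels:
          severity: warning

      - alert: DiskUsageHigh
        # Used fraction of each real filesystem, from node_exporter metrics
        expr: |
          (1 - node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"}
             / node_filesystem_size_bytes{fstype!~"tmpfs|overlay"}) > 0.85
        labels:
          severity: warning
```

The `for` clause carries the duration part of each threshold: the expression must stay true for that long before Alertmanager is notified.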