feat: intent-aware querying + smart sampling for large audit datasets

- Add keyword-based intent extraction: 'device' → Intune, 'user' → Directory, etc.
- Broad questions without intent auto-exclude noisy services (Exchange, SharePoint)
- Smart stratified sampling: failures always included, high-value services prioritised
- Fetch up to 1000 events from MongoDB, then curate best 200 for the LLM
- Excluded services noted in LLM prompt and query_info so the admin knows the scope
ci: also tag and push 'latest' on every release
10 changed files with 305 additions and 17 deletions
--- a/.gitea/workflows/release.yml
+++ b/.gitea/workflows/release.yml
@@ -16,7 +16,13 @@ jobs:
        run: echo "${{ secrets.REGISTRY_TOKEN }}" | docker login git.cqre.net -u ${{ github.actor }} --password-stdin 2>&1 | grep -v "WARNING! Your credentials are stored unencrypted"

      - name: Build Docker image
-        run: docker build ./backend --tag git.cqre.net/cqrenet/aoc-backend:${{ gitea.ref_name }}
+        run: docker build ./backend --build-arg VERSION=${{ gitea.ref_name }} --tag git.cqre.net/cqrenet/aoc-backend:${{ gitea.ref_name }}

-      - name: Push Docker image
+      - name: Tag as latest
+        run: docker tag git.cqre.net/cqrenet/aoc-backend:${{ gitea.ref_name }} git.cqre.net/cqrenet/aoc-backend:latest
+
+      - name: Push version tag
        run: docker push git.cqre.net/cqrenet/aoc-backend:${{ gitea.ref_name }}
+
+      - name: Push latest tag
+        run: docker push git.cqre.net/cqrenet/aoc-backend:latest
--- a/RELEASE_NOTES_v1.2.5.md
+++ b/RELEASE_NOTES_v1.2.5.md
@@ -0,0 +1,78 @@
+# AOC v1.2.5 Release Notes
+
+**Release date:** 2026-04-20
+
+---
+
+## What's new
+
+### Natural language query (`/api/ask`)
+Ask questions in plain English and get AI-generated answers backed by your audit logs.
+
+- **Regex-based parsing** extracts time ranges (`last 3 days`, `yesterday`, `today`) and entities (`device ABC123`, `user bob@example.com`) without calling an LLM.
+- **AI narrative summarisation** via any OpenAI-compatible API (OpenAI, Azure OpenAI, MS Foundry, Ollama).
+- **Graceful fallback** when no LLM is configured — returns a structured bullet list with a clear error banner.
+- **Cited evidence** — every answer includes the raw events that back it up.
+
+### Filter-aware queries
+The ask endpoint now respects the filter panel. When you set **Service = Exchange**, **Result = failure** and ask *"What happened to device X?"*, the LLM only sees failed Exchange events for that device.
+
+### Scales to thousands of events
+For large result sets (>50 events), the LLM receives an **aggregated overview** instead of a raw dump:
+- Counts by service, action, result, and actor
+- Failure highlights
+- The 50 most recent raw events as samples
+
+This keeps token usage low while preserving accuracy.
+
+### Azure OpenAI / MS Foundry support
+- Automatic `api-key` header detection for Azure endpoints.
+- `LLM_API_VERSION` config for Azure `api-version` query parameters.
+- `max_completion_tokens` support for newer model deployments.
+
+### Version display
+- `GET /api/version` endpoint reads the `VERSION` file.
+- Frontend shows a version badge in the header (e.g., **1.2.5**).
+
+### Production hardening (from v1.1.0)
+- Dockerfile runs as non-root user with Gunicorn + Uvicorn workers.
+- `docker-compose.prod.yml` with internal-only MongoDB, health checks, and nginx reverse proxy.
+- Security headers (`X-Frame-Options`, `X-Content-Type-Options`, etc.).
+
+---
+
+## Configuration
+
+Add to your `.env`:
+
+```bash
+# Required for AI narrative summarisation
+LLM_API_KEY=your-key
+LLM_BASE_URL=https://api.openai.com/v1
+LLM_MODEL=gpt-4o-mini
+LLM_MAX_EVENTS=200
+LLM_TIMEOUT_SECONDS=30
+LLM_API_VERSION=                 # set for Azure OpenAI, e.g. 2024-12-01-preview
+```
+
+For Azure OpenAI / MS Foundry:
+```bash
+LLM_BASE_URL=https://your-resource.openai.azure.com/openai/deployments/your-deployment
+LLM_API_KEY=your-azure-key
+LLM_API_VERSION=2024-12-01-preview
+LLM_MODEL=your-deployment-name
+```
+
+---
+
+## Upgrade notes
+
+No breaking changes. Existing `/api/events`, filters, pagination, tags, and comments work unchanged.
+
+---
+
+## Docker image
+
+```
+git.cqre.net/cqrenet/aoc-backend:v1.2.5
+```
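The release notes above mention regex-based parsing of time ranges (`last 3 days`, `yesterday`, `today`). The actual `_extract_time_range` helper is not part of this diff, so the following is only a minimal sketch of how such parsing could look. The function name `parse_time_range`, the patterns, and the choice to treat `yesterday` as the previous UTC calendar day are all assumptions for illustration.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional


def parse_time_range(question: str, now: Optional[datetime] = None):
    """Return (start, end) as aware UTC datetimes, or (None, None) if nothing matches.

    Hypothetical helper; not the project's _extract_time_range.
    """
    now = now or datetime.now(timezone.utc)
    q = question.lower()
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)

    # "last 3 days" / "last 1 day": a rolling window ending now
    m = re.search(r"\blast\s+(\d+)\s+days?\b", q)
    if m:
        return now - timedelta(days=int(m.group(1))), now
    # "yesterday": the previous UTC calendar day (assumption)
    if re.search(r"\byesterday\b", q):
        return midnight - timedelta(days=1), midnight
    # "today": from UTC midnight until now (assumption)
    if re.search(r"\btoday\b", q):
        return midnight, now
    return None, None
```

When nothing matches, the caller can fall back to a default window, as `ask_question` does with its 7-day default.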
--- a/2
+++ b/2
@@ -1 +1 @@
-1.2.2
+1.2.7
--- a/backend/Dockerfile
+++ b/backend/Dockerfile
@@ -1,5 +1,9 @@
 FROM python:3.11-slim

+# Bake the version into the image at build time
+ARG VERSION=unknown
+ENV VERSION=${VERSION}
+
 # Security: run as non-root
 RUN groupadd -r aoc && useradd -r -g aoc aoc

--- a/backend/frontend/index.html
+++ b/backend/frontend/index.html
@@ -12,7 +12,7 @@
  <div class="page" x-data="aocApp()" x-init="initApp()">
    <header class="hero">
      <div>
-        <p class="eyebrow">Admin Operations Center</p>
+        <p class="eyebrow">Admin Operations Center <span class="version-badge" x-text="appVersion"></span></p>
        <h1>Directory Audit Explorer</h1>
        <p class="lede">Filter Microsoft Entra audit events by user, app, time, action, and action type.</p>
      </div>
@@ -243,6 +243,7 @@
          actor: '', selectedServices: [], search: '', operation: '', result: '', start: '', end: '', limit: 100, includeTags: '', excludeTags: '',
        },
        options: { actors: [], services: [], operations: [], results: [] },
+        appVersion: '',
        askQuestionText: '',
        askLoading: false,
        askAnswer: '',
@@ -252,6 +253,7 @@
        askLlmError: '',

        async initApp() {
+          await this.loadVersion();
          await this.initAuth();
          if (!this.authConfig?.auth_enabled || this.accessToken) {
            await this.loadFilterOptions();
@@ -260,6 +262,16 @@
          }
        },

+        async loadVersion() {
+          try {
+            const res = await fetch('/api/version');
+            if (res.ok) {
+              const body = await res.json();
+              this.appVersion = body.version || '';
+            }
+          } catch {}
+        },
+
        authHeader() {
          return this.accessToken ? { Authorization: `Bearer ${this.accessToken}` } : {};
        },
--- a/backend/frontend/style.css
+++ b/backend/frontend/style.css
@@ -433,6 +433,20 @@ input {
  color: var(--muted);
 }

+.version-badge {
+  display: inline-block;
+  margin-left: 8px;
+  padding: 2px 8px;
+  border-radius: 999px;
+  background: rgba(125, 211, 252, 0.15);
+  border: 1px solid rgba(125, 211, 252, 0.3);
+  color: var(--accent-strong);
+  font-size: 11px;
+  font-weight: 600;
+  letter-spacing: 0.05em;
+  vertical-align: middle;
+}
+
 .ask-events {
  margin-bottom: 14px;
 }
--- a/backend/main.py
+++ b/backend/main.py
@@ -134,6 +134,13 @@ async def metrics():
    return Response(content=prometheus_metrics(), media_type="text/plain")


+@app.get("/api/version")
+async def version():
+    import os
+
+    return {"version": os.environ.get("VERSION", "unknown")}
+
+
 frontend_dir = Path(__file__).parent / "frontend"
 app.mount("/", StaticFiles(directory=frontend_dir, html=True), name="frontend")

--- a/backend/routes/ask.py
+++ b/backend/routes/ask.py
@@ -13,6 +13,129 @@ from models.api import AskRequest, AskResponse
 router = APIRouter(dependencies=[Depends(require_auth)])
 logger = structlog.get_logger("aoc.ask")

+# ---------------------------------------------------------------------------
+# Intent extraction — map question keywords to relevant audit services
+# ---------------------------------------------------------------------------
+
+_SERVICE_INTENTS = {
+    "intune": ["Intune"],
+    "device": ["Intune", "Device"],
+    "laptop": ["Intune", "Device"],
+    "mobile": ["Intune", "Device"],
+    "phone": ["Intune", "Device"],
+    "ipad": ["Intune", "Device"],
+    "app": ["Intune", "ApplicationManagement"],
+    "application": ["Intune", "ApplicationManagement"],
+    "policy": ["Intune", "Policy"],
+    "compliance": ["Intune", "Policy"],
+    "user": ["Directory", "UserManagement"],
+    "group": ["Directory", "GroupManagement"],
+    "role": ["Directory", "RoleManagement"],
+    "permission": ["Directory", "RoleManagement"],
+    "license": ["Directory", "License"],
+    "email": ["Exchange"],
+    "mailbox": ["Exchange"],
+    "mail": ["Exchange"],
+    "message": ["Exchange", "Teams"],
+    "file": ["SharePoint"],
+    "sharepoint": ["SharePoint"],
+    "site": ["SharePoint"],
+    "document": ["SharePoint"],
+    "team": ["Teams"],
+    "channel": ["Teams"],
+    "meeting": ["Teams"],
+    "call": ["Teams"],
+}
+
+# Services that are extremely noisy for typical admin questions.
+# We exclude them by default on broad questions unless the user explicitly mentions them.
+_NOISY_SERVICES = {"Exchange", "SharePoint"}
+
+# Services that are generally admin-relevant and kept by default.
+_DEFAULT_ADMIN_SERVICES = {
+    "Directory",
+    "UserManagement",
+    "GroupManagement",
+    "RoleManagement",
+    "ApplicationManagement",
+    "Intune",
+    "Device",
+    "Policy",
+    "Teams",
+    "License",
+}
+
+
+def _extract_intent_services(question: str) -> tuple[list[str] | None, bool]:
+    """
+    Extract relevant services from the question.
+
+    Returns:
+        (services, is_explicit):
+        - services: list of service names to query, or None for default admin set
+        - is_explicit: True if the user explicitly mentioned a noisy service
+    """
+    q_lower = question.lower()
+    tokens = set(re.findall(r"\b[a-z]+\b", q_lower))
+
+    matched_services = set()
+    for token, services in _SERVICE_INTENTS.items():
+        if token in tokens:
+            matched_services.update(services)
+
+    if matched_services:
+        # User asked something specific — return exactly what they asked for
+        is_explicit = not matched_services.isdisjoint(_NOISY_SERVICES)
+        return sorted(matched_services), is_explicit
+
+    # Broad question with no clear intent — default to admin-relevant services only
+    return None, False
+
+
+# ---------------------------------------------------------------------------
+# Smart sampling — stratified by importance so the LLM sees signal, not noise
+# ---------------------------------------------------------------------------
+
+
+def _smart_sample(events: list[dict], max_events: int = 200) -> list[dict]:
+    """
+    Return a curated subset that preserves diversity and prioritises signal.
+
+    Tiers:
+      1. Failures (always valuable)
+      2. High-admin-value services (Intune, Device, Directory, etc.)
+      3. Everything else
+    """
+    if len(events) <= max_events:
+        return events
+
+    high_value = {
+        "Directory",
+        "UserManagement",
+        "GroupManagement",
+        "RoleManagement",
+        "Intune",
+        "Device",
+        "Policy",
+        "ApplicationManagement",
+    }
+
+    failures = [e for e in events if str(e.get("result") or "").lower() in ("failure", "failed")]
+    high_val = [e for e in events if e.get("service") in high_value and e not in failures]
+    rest = [e for e in events if e not in failures and e not in high_val]
+
+    # Allocate slots: failures and high-value events each get up to a quarter (with small minimums); rest fills what remains
+    slots = max_events
+    failure_cap = min(len(failures), max(10, slots // 4))
+    high_cap = min(len(high_val), max(20, slots // 4))
+    rest_cap = slots - failure_cap - high_cap
+
+    sampled = failures[:failure_cap] + high_val[:high_cap] + rest[:rest_cap]
+    # Sort newest first (reverse chronological), matching the query's sort order
+    sampled.sort(key=lambda e: e.get("timestamp") or "", reverse=True)
+    return sampled
+
+
 # ---------------------------------------------------------------------------
 # Time-range extraction
 # ---------------------------------------------------------------------------
@@ -203,12 +326,16 @@ def _aggregate_counts(events: list[dict]) -> dict:
    }


-def _format_events_for_llm(events: list[dict], total: int | None = None) -> str:
+def _format_events_for_llm(
+    events: list[dict], total: int | None = None, excluded_services: list[str] | None = None
+) -> str:
    lines = []

    # If we have a large result set, send aggregation + samples instead of raw dump
    if total is not None and total > len(events) and len(events) >= 50:
-        lines.append(f"Result set overview: {total} total events (showing the {len(events)} most recent).\n")
+        lines.append(f"Result set overview: {total} total events (showing a curated sample of {len(events)}).\n")
+        if excluded_services:
+            lines.append(f"Note: high-volume services excluded by default: {', '.join(excluded_services)}.\n")
        agg = _aggregate_counts(events)
        lines.append("Breakdown by service:")
        for svc, cnt in agg["services"]:
@@ -267,11 +394,16 @@ def _build_chat_url(base_url: str, api_version: str) -> str:
    return url


-async def _call_llm(question: str, events: list[dict], total: int | None = None) -> str:
+async def _call_llm(
+    question: str,
+    events: list[dict],
+    total: int | None = None,
+    excluded_services: list[str] | None = None,
+) -> str:
    if not LLM_API_KEY:
        raise RuntimeError("LLM_API_KEY not configured")

-    context = _format_events_for_llm(events, total=total)
+    context = _format_events_for_llm(events, total=total, excluded_services=excluded_services)
    messages = [
        {"role": "system", "content": _SYSTEM_PROMPT},
        {
@@ -332,6 +464,7 @@ async def ask_question(body: AskRequest, user: dict = Depends(require_auth)):

    start, end = _extract_time_range(question)
    entity = _extract_entity(question)
+    intent_services, explicit_noisy = _extract_intent_services(question)

    # Default to last 7 days if no time range detected
    if not start:
@@ -339,11 +472,29 @@ async def ask_question(body: AskRequest, user: dict = Depends(require_auth)):
        start = (now - timedelta(days=7)).isoformat().replace("+00:00", "Z")
        end = now.isoformat().replace("+00:00", "Z")

+    # -----------------------------------------------------------------------
+    # Decide which services to query
+    # -----------------------------------------------------------------------
+    excluded_services: list[str] = []
+    if body.services:
+        # User explicitly filtered via UI — respect that exactly
+        query_services = body.services
+    elif intent_services is not None:
+        # NL question implies specific services
+        query_services = intent_services
+    else:
+        # Broad question with no intent — exclude noisy services by default
+        query_services = sorted(_DEFAULT_ADMIN_SERVICES)
+        excluded_services = sorted(_NOISY_SERVICES)
+
+    # -----------------------------------------------------------------------
+    # Build and run query
+    # -----------------------------------------------------------------------
    query = _build_event_query(
        entity,
        start,
        end,
-        services=body.services,
+        services=query_services,
        actor=body.actor,
        operation=body.operation,
        result=body.result,
@@ -353,21 +504,33 @@ async def ask_question(body: AskRequest, user: dict = Depends(require_auth)):

    try:
        total = events_collection.count_documents(query)
-        cursor = events_collection.find(query).sort([("timestamp", -1)]).limit(LLM_MAX_EVENTS)
-        events = list(cursor)
+        # Fetch a generous window so we can apply smart sampling in Python
+        cursor = events_collection.find(query).sort([("timestamp", -1)]).limit(1000)
+        raw_events = list(cursor)
    except Exception as exc:
        logger.error("Failed to query events for ask", error=str(exc))
        raise HTTPException(status_code=500, detail=f"Database query failed: {exc}") from exc

-    for e in events:
+    for e in raw_events:
        e["_id"] = str(e.get("_id", ""))

+    # Apply smart sampling (preserves failures, prioritises admin-relevant services)
+    events = _smart_sample(raw_events, max_events=LLM_MAX_EVENTS)
+
    # If no events, return early
    if not events:
        return AskResponse(
            answer="I couldn't find any audit events matching your question. Try broadening the time range or checking the spelling of the device/user name.",
            events=[],
-            query_info={"entity": entity, "start": start, "end": end, "event_count": 0},
+            query_info={
+                "entity": entity,
+                "start": start,
+                "end": end,
+                "event_count": 0,
+                "total_matched": total,
+                "services_queried": query_services,
+                "excluded_services": excluded_services,
+            },
            llm_used=False,
            llm_error="LLM not used — no events found." if not LLM_API_KEY else None,
        )
@@ -380,7 +543,7 @@ async def ask_question(body: AskRequest, user: dict = Depends(require_auth)):
        llm_error = "LLM_API_KEY is not configured. Set it in your .env to enable AI narrative summarisation."
    else:
        try:
-            answer = await _call_llm(question, events, total=total)
+            answer = await _call_llm(question, events, total=total, excluded_services=excluded_services)
            llm_used = True
        except Exception as exc:
            llm_error = f"LLM call failed: {exc}"
@@ -388,9 +551,11 @@ async def ask_question(body: AskRequest, user: dict = Depends(require_auth)):

    # Fallback: structured summary if LLM unavailable or failed
    if not answer:
-        parts = [f"Found {len(events)} event(s)"]
+        parts = [f"Found {total} event(s)"]
        if entity:
            parts.append(f"related to **{entity}**")
+        if excluded_services:
+            parts.append(f"(excluding {', '.join(excluded_services)})")
        parts.append(f"between {start[:10]} and {end[:10]}.\n")

        for i, e in enumerate(events[:10], 1):
@@ -415,6 +580,8 @@ async def ask_question(body: AskRequest, user: dict = Depends(require_auth)):
            "end": end,
            "event_count": len(events),
            "total_matched": total,
+            "services_queried": query_services,
+            "excluded_services": excluded_services,
            "mongo_query": json.dumps(query, default=str),
        },
        llm_used=llm_used,
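The tiered sampling in `_smart_sample` can also be written as a single pass with clamped quotas. The sketch below is a standalone variant, not the committed code: the name `sample_events` is hypothetical, and the clamping is an added assumption so that the result never exceeds `max_events`, even when `max_events` is smaller than the tier minimums (10 and 20).

```python
def sample_events(events: list, max_events: int = 200) -> list:
    """Tiered sample: failures first, then high-value services, then the rest."""
    if len(events) <= max_events:
        return events

    high_value = {
        "Directory", "UserManagement", "GroupManagement", "RoleManagement",
        "Intune", "Device", "Policy", "ApplicationManagement",
    }

    # Single pass: each event lands in exactly one tier.
    failures, high_val, rest = [], [], []
    for e in events:
        if str(e.get("result") or "").lower() in ("failure", "failed"):
            failures.append(e)
        elif e.get("service") in high_value:
            high_val.append(e)
        else:
            rest.append(e)

    # Same quotas as the diff, clamped so the caps can never sum past max_events.
    failure_cap = min(len(failures), max(10, max_events // 4), max_events)
    high_cap = min(len(high_val), max(20, max_events // 4), max_events - failure_cap)
    rest_cap = max_events - failure_cap - high_cap  # non-negative by construction

    sampled = failures[:failure_cap] + high_val[:high_cap] + rest[:rest_cap]
    sampled.sort(key=lambda e: e.get("timestamp") or "", reverse=True)  # newest first
    return sampled
```

Partitioning in one loop avoids the repeated `e not in failures` list scans, which grow quadratically with the 1000-event fetch window. The clamp matters for small limits: with `max_events=20`, an unclamped `rest_cap` would be negative, and a negative slice bound keeps most of `rest` instead of none of it.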
--- a/backend/tests/test_ask.py
+++ b/backend/tests/test_ask.py
@@ -236,7 +236,7 @@ class TestAskEndpoint:
            }
        )

-        async def fake_llm(question, events, total=None):
+        async def fake_llm(question, events, total=None, excluded_services=None):
            return "The device had a failed wipe attempt."

        monkeypatch.setattr("routes.ask.LLM_API_KEY", "fake-key")
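The service-selection precedence introduced in `routes/ask.py` (UI filter first, then question keywords, then the default admin set) is easy to exercise in isolation. The sketch below is self-contained and uses a condensed keyword map; `choose_services` and the shortened constants are hypothetical stand-ins for illustration, not the project's functions.

```python
import re
from typing import Optional

# Condensed, hypothetical subset of the keyword map in routes/ask.py
SERVICE_INTENTS = {
    "device": ["Intune", "Device"],
    "user": ["Directory", "UserManagement"],
    "mailbox": ["Exchange"],
}
NOISY_SERVICES = {"Exchange", "SharePoint"}
DEFAULT_ADMIN_SERVICES = {"Directory", "Intune", "Teams"}  # shortened for the sketch


def choose_services(question: str, ui_services: Optional[list] = None):
    """Return (services_to_query, excluded_services)."""
    if ui_services:
        # 1. An explicit UI filter is respected exactly.
        return list(ui_services), []

    tokens = set(re.findall(r"\b[a-z]+\b", question.lower()))
    matched = set()
    for keyword, services in SERVICE_INTENTS.items():
        if keyword in tokens:
            matched.update(services)
    if matched:
        # 2. Keywords in the question select specific services.
        return sorted(matched), []

    # 3. Broad question: default admin set, noisy services reported as excluded.
    return sorted(DEFAULT_ADMIN_SERVICES), sorted(NOISY_SERVICES)


print(choose_services("What happened to device ABC123?"))
print(choose_services("Anything unusual this week?"))
```

Matching is on whole lowercase tokens, so in this sketch a plural such as `devices` does not match the `device` keyword and the question falls through to the default set.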
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -14,7 +14,7 @@ services:
  backend:
    build: ./backend
    # For production, use the pre-built image instead:
-    # image: git.cqre.net/cqrenet/aoc-backend:v1.1.0
+    # image: git.cqre.net/cqrenet/aoc-backend:v1.2.5
    container_name: aoc-backend
    restart: always
    env_file:
Author	SHA1	Message	Date
Tomas Kracmar	b4e504a87b	feat: intent-aware querying + smart sampling for large audit datasets - Add keyword-based intent extraction: 'device' → Intune, 'user' → Directory, etc. - Broad questions without intent auto-exclude noisy services (Exchange, SharePoint) - Smart stratified sampling: failures always included, high-value services prioritised - Fetch up to 1000 events from MongoDB, then curate best 200 for the LLM - Excluded services noted in LLM prompt and query_info so the admin knows the scope	2026-04-20 17:41:21 +02:00
Tomas Kracmar	b728abb5ee	ci: also tag and push 'latest' on every release	2026-04-20 17:31:27 +02:00
Tomas Kracmar	d100388c7d	chore(release): bump version to 1.2.6	2026-04-20 17:29:10 +02:00
Tomas Kracmar	11fd87411d	fix: bake version into Docker image at build time - Add VERSION build arg to Dockerfile - Pass --build-arg VERSION in release workflow - Remove VERSION env override from docker-compose files - Version is now immutable inside the image, no runtime env var needed	2026-04-20 17:24:20 +02:00
Tomas Kracmar	6a80bf4eb9	fix: read version from env var so it works inside Docker	2026-04-20 17:15:55 +02:00
Tomas Kracmar	5e02f5a402	docs: add v1.2.5 release notes	2026-04-20 17:12:43 +02:00
Tomas Kracmar	0c3e5ec57b	feat: add version display to frontend and /api/version endpoint (v1.2.5) - Add GET /api/version endpoint that reads VERSION file - Frontend fetches version on init and displays it as a badge in the header - Add version-badge CSS styling - Update docker-compose.yml comment to v1.2.5	2026-04-20 17:09:02 +02:00