# PodX - Offline Library with OpenWebUI export

## Repo-friendly secrets
- Secrets live in **.env** at the repo root (NOT committed).
- Commit **.env.example**. Users copy it to `.env` and fill in their values.
- We also include **.gitignore** to keep `.env` and data paths out of git.
- `.env.example` now includes variables for Meilisearch **and** OpenWebUI, including `OPENWEBUI_URL`, `OPENWEBUI_API_KEY`, and `OPENWEBUI_KB_ID`.

## Quick start
```bash
cp .env.example .env   # edit values (MEILI_MASTER_KEY, OPENWEBUI_API_KEY, etc.)
docker compose up -d --build
# UI:   http://<host>:8088
# Meili: http://<host>:7700
```
The worker reaches OpenWebUI at `$OPENWEBUI_URL` (default: `http://host.docker.internal:3003` on macOS/Windows, or `http://openwebui:3003` on Linux Docker networks).

Note: `.env.example` includes placeholders for both **Meilisearch** and **OpenWebUI** configuration. Set `OPENWEBUI_URL` so that it points to your OpenWebUI container.

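The platform-dependent default can be sketched as a small shell helper. This is only an illustration of the rule stated above, not PodX's own code; the function names `default_openwebui_url` and `resolve_openwebui_url` are hypothetical.

```bash
# Hypothetical helper: pick a default OPENWEBUI_URL from the host OS name.
# $1 = output of `uname -s` (Linux, Darwin, MINGW64_NT-..., etc.)
default_openwebui_url() {
  case "$1" in
    # Assumes OpenWebUI runs as a container named "openwebui" on the same Docker network
    Linux) echo "http://openwebui:3003" ;;
    # macOS and Windows (Docker Desktop) reach the host via host.docker.internal
    *)     echo "http://host.docker.internal:3003" ;;
  esac
}

# An explicit OPENWEBUI_URL from the environment always wins over the default.
resolve_openwebui_url() {
  echo "${OPENWEBUI_URL:-$(default_openwebui_url "$1")}"
}
```

For example, `resolve_openwebui_url "$(uname -s)"` prints the URL the sketch would use on the current machine.
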
## GPU (CUDA) Setup

To run Whisper on an NVIDIA GPU:

- Install the NVIDIA driver and the NVIDIA Container Toolkit on the host.
- Copy `.env.example` to `.env` and set:
  - `DOCKER_GPU_RUNTIME=nvidia`
  - `NVIDIA_VISIBLE_DEVICES=all` (or a specific GPU index)
  - `WHISPER_DEVICE=cuda` (or `auto`)
  - `WHISPER_PRECISION=float16` (recommended for GPU)
  - Optional: set a GPU base image for builds (typically amd64):
    - `GPU_BASE_IMAGE=nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04`
- Rebuild and start: `docker compose up -d --build`
- Check the logs for `device='cuda'` when the transcribe worker loads the model.

By default we build from `python:3.11-slim`. You can override the base image at build time by setting `GPU_BASE_IMAGE` to a CUDA runtime tag that exists for your architecture. If you leave it unset, or you are on a non-GPU host, the containers run on CPU.

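Before rebuilding, it can help to sanity-check that the GPU variables agree with each other. The sketch below is a hypothetical pre-flight check (the name `check_gpu_env` is not part of PodX) and assumes the values from `.env` are already present in the environment. It only inspects the `cuda` case and leaves `auto` alone.

```bash
# Hypothetical pre-flight check for the GPU settings listed above.
# Prints one line per problem and returns non-zero if any were found.
check_gpu_env() {
  problems=0
  if [ "${WHISPER_DEVICE:-cpu}" = "cuda" ]; then
    if [ "${DOCKER_GPU_RUNTIME:-}" != "nvidia" ]; then
      echo "WHISPER_DEVICE=cuda but DOCKER_GPU_RUNTIME is not 'nvidia'"
      problems=$((problems + 1))
    fi
    if [ -z "${NVIDIA_VISIBLE_DEVICES:-}" ]; then
      echo "WHISPER_DEVICE=cuda but NVIDIA_VISIBLE_DEVICES is empty"
      problems=$((problems + 1))
    fi
  fi
  [ "$problems" -eq 0 ]
}
```

A typical use would be `check_gpu_env && docker compose up -d --build`, so the rebuild is skipped when the settings are inconsistent.
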
## Components Overview

- **scanner**: Scans your media folders (`library` and `transcripts`) for new or updated files, triggering ingestion and processing workflows.
- **worker**: Handles general background tasks such as metadata fetching, thumbnail generation, and indexing.
- **rss_ingest**: Periodically reads an RSS feed list, downloads new podcast episodes, and adds them to your library for processing.

## Environment Variables

- `REFRESH_EXISTING` (default `false`): If set to `true`, forces re-download of metadata, captions, and thumbnails for existing files during scanning.
- `REFRESH_TTL` (default `604800` seconds, i.e., 7 days): Time-to-live before metadata and related info are refreshed.
- `REFRESH_FAILURE_TTL` (default `86400` seconds, i.e., 1 day): Time-to-live before retrying failed refresh attempts.
- `LIBRARY_HOST_DIR`: Path on the host machine where your source media files reside (mounted into the container).
- `TRANSCRIPTS_HOST_DIR`: Path on the host machine where processed transcripts, subtitles, and metadata are stored.
- `WHISPER_MODEL`: Whisper model variant to use for transcription (e.g., `small`, `medium`, `large`).
- `WHISPER_PRECISION`: Precision setting for Whisper inference (`float32` or `float16`).
- `WHISPER_LANGUAGE`: Language code for Whisper to use during transcription (e.g., `en` for English).
- `WHISPER_DEVICE`: Device selection for faster-whisper (`cpu`, `cuda`, or `auto`). The docker-compose default is `cpu`, which avoids GPU library issues on non-GPU hosts.
- `WHISPER_CPU_THREADS`: CPU threads used for Whisper when `WHISPER_DEVICE=cpu` (default `4`).
- `TRANSCRIBE_BACKEND` (default `local`): Set to `openai` to offload Whisper transcription to the OpenAI API instead of running locally.
- `OPENAI_API_KEY`: Required when `TRANSCRIBE_BACKEND=openai`; API key used for authenticated requests.
- `OPENAI_BASE_URL`, `OPENAI_TRANSCRIBE_MODEL`, `OPENAI_TRANSCRIBE_TIMEOUT`: Optional overrides for the OpenAI transcription endpoint, model, and request timeout.
- `YTDLP_COOKIES`: Path to a yt-dlp cookies file for accessing age-restricted or private videos.
- `OPENWEBUI_URL`: Base URL of the OpenWebUI API (default depends on platform).
- `OPENWEBUI_API_KEY`: API key for authenticating PodX workers with OpenWebUI.
- `OPENWEBUI_KB_NAME`: Human-readable Knowledge Base name to attach documents to.
- `OPENWEBUI_KB_ID`: Fixed UUID of the Knowledge Base (avoids duplicate KBs on restart).
- `OPENWEBUI_AUTO_FIX_METADATA` (default `1`): When enabled, PodX clears/overrides the Knowledge Base metadata template before uploads to prevent ingestion crashes from invalid templates.
- `OPENWEBUI_METADATA_TEMPLATE_JSON`: Optional JSON applied when the auto-fix runs (defaults to `{}`, i.e., no custom metadata template).
- `MEDIA_NORMALIZE` (default `1`): Automatically transcode downloaded media into Plex-friendly formats (by default, HEVC MP4 for video and MP3 for audio).
- `MEDIA_NORMALIZE_KEEP_ORIGINAL` (default `0`): Preserve the source file alongside the normalized copy (appends `.orig*`).
- `VIDEO_NORMALIZE_*`: Fine-tune video conversion (`VIDEO_NORMALIZE_CODEC`, `VIDEO_NORMALIZE_EXTENSION`, `VIDEO_NORMALIZE_CRF`, `VIDEO_NORMALIZE_PRESET`, `VIDEO_NORMALIZE_AUDIO_CODEC`, `VIDEO_NORMALIZE_AUDIO_BITRATE`).
- `AUDIO_NORMALIZE_*`: Control audio conversion (`AUDIO_NORMALIZE_CODEC`, `AUDIO_NORMALIZE_EXTENSION`, `AUDIO_NORMALIZE_BITRATE`, `AUDIO_NORMALIZE_CHANNELS`).
- `PODCASTS_ROOT` (default `/library`): Target directory inside the container where RSS podcast audio is saved.
- `PODCASTS_PER_SHOW` (default `true`): Organize episodes into per-show subfolders under `PODCASTS_ROOT`.

## RSS Ingestion

PodX supports automated podcast ingestion via RSS feeds:

- Add your podcast RSS feed URLs to a `feeds.txt` file, one URL per line.
- The `rss_ingest` component reads this list periodically, downloads new episodes, and places them into `PODCASTS_ROOT` (default `/library`), optionally into per-show subfolders if `PODCASTS_PER_SHOW=true`.
- Downloaded podcasts are then processed by the scanner and worker to generate transcripts, metadata, and thumbnails.

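How `PODCASTS_ROOT` and `PODCASTS_PER_SHOW` combine into a destination folder can be sketched as follows. The helper `episode_dir` is hypothetical and only mirrors the rule described above; how PodX actually derives or sanitizes the show folder name is not specified here.

```bash
# Hypothetical sketch of where an episode lands; not PodX's actual code.
# $1 = show name. PODCASTS_ROOT and PODCASTS_PER_SHOW come from the environment.
episode_dir() {
  root="${PODCASTS_ROOT:-/library}"
  if [ "${PODCASTS_PER_SHOW:-true}" = "true" ]; then
    echo "$root/$1"      # one subfolder per show
  else
    echo "$root"         # all episodes directly under the root
  fi
}
```

With the defaults, `episode_dir "My Show"` prints `/library/My Show`.
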
## Refresh Mechanism

PodX periodically refreshes metadata, captions, and thumbnails for media files based on the TTL settings:

- Files whose metadata is older than `REFRESH_TTL` are re-processed to keep metadata up to date.
- Failed refresh attempts are retried after `REFRESH_FAILURE_TTL`.
- Setting `REFRESH_EXISTING=true` in `.env` forces a refresh on every scan cycle, regardless of the TTLs.

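The three rules above can be read as a single decision function. The sketch below is an illustration under stated assumptions, not PodX's implementation: the name `needs_refresh`, its arguments, and the choice of `>=` at the TTL boundary are all hypothetical.

```bash
# Sketch of the TTL rule described above.
# $1 = epoch seconds of the last refresh attempt
# $2 = "ok" or "failed" (outcome of that attempt)
# $3 = current epoch seconds, e.g. $(date +%s)
needs_refresh() {
  if [ "${REFRESH_EXISTING:-false}" = "true" ]; then
    return 0                                   # forced refresh on every scan
  fi
  age=$(( $3 - $1 ))
  if [ "$2" = "failed" ]; then
    [ "$age" -ge "${REFRESH_FAILURE_TTL:-86400}" ]   # retry failures after 1 day
  else
    [ "$age" -ge "${REFRESH_TTL:-604800}" ]          # normal refresh after 7 days
  fi
}
```

For example, `needs_refresh "$last" ok "$(date +%s)" && echo "refresh due"` reports whether a successfully refreshed file has passed `REFRESH_TTL`.
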
## Multi-Worker Setup

For improved performance and scalability, PodX supports running multiple workers with specialized roles:

- `podx-worker`: Handles general tasks such as scanning, metadata fetching, and indexing.
- `podx-worker-transcribe`: Dedicated to heavy Whisper transcription jobs, isolating resource-intensive audio processing.

This separation helps optimize resource usage and allows parallel processing of different workloads.

## Plex Integration

- The **library** folder contains your source media and can be mounted directly into Plex or other media managers.
- PodX automatically generates NFO files and `.srt` subtitle sidecars per show and episode, enabling rich metadata and transcripts in Plex.
- This setup lets you browse, search, and play your media with synchronized transcripts and metadata.

## Ingest helpers
```bash
# $MEILI_MASTER_KEY is expanded by your shell here, so it must be set in the current session.
MEILI_URL=http://localhost:7700 MEILI_KEY=$MEILI_MASTER_KEY ./ingest/ingest_pdfs.sh /path/*.pdf
MEILI_URL=http://localhost:7700 MEILI_KEY=$MEILI_MASTER_KEY ./ingest/ingest_epub.py /path/*.epub
MEILI_URL=http://localhost:7700 MEILI_KEY=$MEILI_MASTER_KEY ./ingest/ingest_kiwix.sh /path/wiki.zim
```

## Backfill existing files into OpenWebUI
```bash
# From repo root:
./tools/backfill_openwebui.sh
# Or include extra folders to scan:
./tools/backfill_openwebui.sh /some/other/folder /another/folder
```
- Reads `.env` for `OPENWEBUI_URL`, `OPENWEBUI_API_KEY`, `OPENWEBUI_KB_NAME`.
- Uploads `*.txt`, `*.md`, `*.html` it finds in `./transcripts` and `./library/web` by default.

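To preview what a backfill would pick up without uploading anything, the file selection described above can be approximated with `find`. This is a dry-run sketch; the function `list_backfill_candidates` is hypothetical and is not part of `backfill_openwebui.sh`, whose exact matching rules may differ.

```bash
# Dry-run sketch: list the files a backfill would consider, without uploading.
# Arguments are directories; with none given, the two default locations are used.
list_backfill_candidates() {
  [ "$#" -gt 0 ] || set -- ./transcripts ./library/web
  for dir in "$@"; do
    [ -d "$dir" ] || continue      # skip folders that do not exist
    find "$dir" -type f \( -name '*.txt' -o -name '*.md' -o -name '*.html' \)
  done
}
```

Run from the repo root, `list_backfill_candidates | wc -l` gives a rough count of candidate documents.
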
## Difference between `library` and `transcripts` folders

The **library** folder contains the downloaded source media such as videos, podcasts, web snapshots, and other original files. This folder is the one you can mount to Plex or other media managers to access and play your media content.

The **transcripts** folder, on the other hand, contains processed text data including transcripts, subtitles, and JSON metadata. This folder is mainly used for search and ingestion into OpenWebUI and usually does not need to be mounted in Plex or other media players.

## Generating required secrets

### 1. Meilisearch master key
Meilisearch needs a strong master key (like a root password). Generate one locally:

```bash
# On Linux or macOS with OpenSSL installed
openssl rand -hex 32

# Example output (keep it secret, do not reuse this exact value):
92e4d0d2e4c6f489a91dfc30b6fd6c985f6780ad827f1e7ce1bb3c6dc81d562b
```

Then put it in your `.env`:

```dotenv
MEILI_MASTER_KEY=92e4d0d2e4c6f489a91dfc30b6fd6c985f6780ad827f1e7ce1bb3c6dc81d562b
MEILI_KEY=${MEILI_MASTER_KEY}
```

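On hosts without OpenSSL, the same kind of key can be produced from `/dev/urandom`. The helpers below are a hypothetical sketch (`generate_meili_key` and `is_hex_key` are not part of PodX); the fallback assumes `head -c` and `od` are available, which is the case on typical Linux and macOS systems.

```bash
# Hypothetical helper: print a 64-character hex key (32 random bytes).
generate_meili_key() {
  if command -v openssl >/dev/null 2>&1; then
    openssl rand -hex 32
  else
    # Fallback: read 32 bytes from the kernel RNG and hex-encode them
    head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
    echo
  fi
}

# Check that a value looks like such a key: exactly 64 lowercase hex characters.
is_hex_key() {
  printf '%s\n' "$1" | grep -Eq '^[0-9a-f]{64}$'
}
```

For example, `key=$(generate_meili_key); is_hex_key "$key" && echo "MEILI_MASTER_KEY=$key"` prints a line ready to paste into `.env`.
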
### 2. OpenWebUI API key
To allow PodX to push documents into your OpenWebUI Knowledge Base, create an API key:

1. Go to your running OpenWebUI (e.g. [http://localhost:3003](http://localhost:3003)).
2. Log in with your admin account.
3. Navigate to **Settings → API Keys**.
4. Click **Generate new API key**, give it a name like `podx-worker`.
5. Copy the generated key and add it to `.env`:

```dotenv
OPENWEBUI_API_KEY=sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```

If the key is ever leaked, revoke it in OpenWebUI and generate a new one.

### 3. Knowledge Base ID
To avoid creating duplicate Knowledge Bases on every PodX restart, use a fixed Knowledge Base ID:

- Run the command below to list your OpenWebUI Knowledge Bases and their IDs:

```bash
./scripts/podx-tools.sh owui-kbs
```

- Copy the UUID of the desired Knowledge Base and set it in your `.env`:

```dotenv
OPENWEBUI_KB_ID=your-knowledge-base-uuid-here
```

This ensures PodX attaches documents consistently to the same Knowledge Base.

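A common slip is leaving the placeholder in place or pasting the KB name instead of its ID. Since the ID is a UUID, a quick shape check can catch that. The helper `is_uuid` below is a hypothetical sketch; it only checks the 8-4-4-4-12 hex layout, not whether the Knowledge Base exists.

```bash
# Sketch: succeed only if $1 has the canonical 8-4-4-4-12 UUID shape.
is_uuid() {
  printf '%s\n' "$1" | grep -Eq \
    '^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$'
}
```

For example, `is_uuid "$OPENWEBUI_KB_ID" || echo "OPENWEBUI_KB_ID does not look like a UUID"`.
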
## Using podx-tools

PodX includes a helper script `./scripts/podx-tools.sh` for interacting with OpenWebUI.

### Common commands

- **Check OpenWebUI connectivity**
  ```bash
  ./scripts/podx-tools.sh owui-health
  ```
  Verifies that OpenWebUI is reachable at `$OPENWEBUI_URL`.

- **List Knowledge Bases**
  ```bash
  ./scripts/podx-tools.sh owui-kbs
  ```
  Lists available Knowledge Bases with their UUIDs.

- **Resolve Knowledge Base ID by name**
  ```bash
  ./scripts/podx-tools.sh owui-kb-resolve "Homelab Library"
  ```
  Resolves the fixed UUID for a KB by its human-readable name.

- **Debug KB info**
  ```bash
  ./scripts/podx-tools.sh owui-kb-debug "Homelab Library"
  ```

- **Attach a file**
  ```bash
  ./scripts/podx-tools.sh owui-attach "Homelab Library" /path/to/file.txt
  ```
  Uploads a transcript or document to a KB. Supports `.txt`, `.md`, `.json`, and `.html`.

- **List files in a KB**
  ```bash
  ./scripts/podx-tools.sh owui-kb-files "Homelab Library"
  ```

### Notes

- `OPENWEBUI_URL`, `OPENWEBUI_API_KEY`, and `OPENWEBUI_KB_ID` must be set in your `.env`.
- JSON files are optional. Only attach them if you want their contents searchable.
- Duplicate or empty content may be rejected by OpenWebUI with a `400` error.

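A small pre-flight check can report which of the required variables are still missing before you run any `podx-tools` command. This is a hypothetical sketch (`missing_owui_vars` is not part of PodX) and assumes the values from `.env` have already been loaded into the environment.

```bash
# Hypothetical pre-flight check: print the required variables that are unset
# or empty, and return non-zero if there are any.
missing_owui_vars() {
  missing=""
  for name in OPENWEBUI_URL OPENWEBUI_API_KEY OPENWEBUI_KB_ID; do
    eval "value=\${$name:-}"               # indirect lookup of the variable named in $name
    [ -n "$value" ] || missing="$missing $name"
  done
  echo "${missing# }"                      # strip the leading space
  [ -z "$missing" ]
}
```

For a simple `KEY=value` file, one way to load it first is `set -a; . ./.env; set +a`, followed by `missing_owui_vars`.
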
## Troubleshooting

### Common errors and fixes

- **`400: The content provided is empty`**  
  This usually means the transcript file was empty, binary, or mis-encoded. Verify that the `.txt` files actually contain text and are not corrupted.

- **Duplicate Knowledge Base creation**  
  Fix this by setting `OPENWEBUI_KB_ID` in your `.env` after running `./scripts/podx-tools.sh owui-kbs` to get the fixed KB ID.

- **Worker cannot connect to OpenWebUI (`curl: Failed to connect to localhost:3003`)**  
  Inside a container, `localhost` refers to the container itself, not the host. Set `OPENWEBUI_URL` to `http://host.docker.internal:3003` on macOS/Windows, or to `http://openwebui:3003` when OpenWebUI runs on the same Docker network on Linux.

- **Attaching files silently fails or shows `pending` forever**  
  Check the `podx-worker` logs for errors, make sure `podx-worker-transcribe` is running for audio transcription tasks, and verify that `OPENWEBUI_API_KEY` is valid.

- **Multiple Knowledge Bases with the same name**  
  Resolve this explicitly using the `owui-kb-resolve` command to get the fixed Knowledge Base ID.

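For the `400: The content provided is empty` case, a local check can flag suspicious files before they are attached. The sketch below is hypothetical (`looks_like_text` is not part of PodX) and relies on `grep -I`, which GNU and BSD grep provide; it is a heuristic and will not catch every encoding problem.

```bash
# Sketch of a local pre-check before attaching a file.
# Succeeds only if the file is non-empty, contains at least one
# non-whitespace character, and grep does not classify it as binary.
looks_like_text() {
  [ -s "$1" ] && grep -Iq '[^[:space:]]' "$1"
}
```

For example, `looks_like_text /path/to/file.txt || echo "skip: empty or binary"`.
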
### Worker separation

PodX runs two types of workers for better resource management:

- `podx-worker`: Handles general tasks such as scanning, metadata fetching, and indexing.
- `podx-worker-transcribe`: Dedicated to Whisper transcription jobs, isolating resource-intensive audio processing to optimize performance.