PodX - Offline Library with OpenWebUI export

Repo-friendly secrets

  • Secrets live in .env at the repo root (NOT committed).
  • Commit .env.example. Users copy it to .env and fill in their values.
  • We also include .gitignore to keep .env and data paths out of git.
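
Because .env is never committed, it can quietly drift out of sync with .env.example as new settings are added. The sketch below is a hypothetical helper (it does not ship with PodX) that compares the two files and lists keys that are missing or empty in .env. It assumes both files use plain KEY=VALUE lines with # comments.

```python
from pathlib import Path

# Hypothetical helper: not part of the PodX repo.


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments.

    Deliberately minimal: no quoting rules, no `export` prefix, and no
    ${VAR} interpolation (Docker Compose handles that itself).
    """
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(example_path: Path, env_path: Path) -> list[str]:
    """Return keys declared in .env.example that are absent or empty in .env."""
    example = parse_env(example_path.read_text())
    actual = parse_env(env_path.read_text()) if env_path.exists() else {}
    return [key for key in example if not actual.get(key)]
```

Called as missing_keys(Path(".env.example"), Path(".env")) from the repo root, it returns an empty list once every key has a value.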

Quick start

cp .env.example .env   # edit values (MEILI_MASTER_KEY, OPENWEBUI_API_KEY, etc.)
docker compose up -d --build
# UI:   http://<host>:8088
# Meili: http://<host>:7700

The worker reaches OpenWebUI at $OPENWEBUI_URL (default: http://host.docker.internal:3003).

Components Overview

  • scanner: Scans your media folders (library and transcripts) for new or updated files, triggering ingestion and processing workflows.
  • worker: Handles general background tasks such as metadata fetching, thumbnail generation, and indexing.
  • rss_ingest: Periodically reads an RSS feed list, downloads new podcast episodes, and adds them to your library for processing.
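
One common way for a scanner to spot new or updated files is to compare each file's modification time with the value seen on the previous pass. This is a minimal sketch of that idea only: the function name and the in-memory seen dictionary are assumptions, and PodX's actual scanner may persist its state or detect changes differently.

```python
from pathlib import Path

# Assumption: change detection by mtime, with state kept in a plain dict.


def scan_for_changes(root: Path, seen: dict[str, float]) -> list[Path]:
    """Return files under `root` that are new or whose mtime changed.

    `seen` maps a file path (as a string) to the mtime recorded on the last
    scan; it is updated in place so the next call reports only fresh changes.
    """
    changed: list[Path] = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        mtime = path.stat().st_mtime
        key = str(path)
        if seen.get(key) != mtime:
            seen[key] = mtime
            changed.append(path)
    return changed
```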

Environment Variables

  • REFRESH_EXISTING (default false): If set to true, forces re-download of metadata, captions, and thumbnails for existing files during scanning.
  • REFRESH_TTL (default 604800 seconds, i.e., 7 days): Time-to-live before metadata and related info are refreshed.
  • REFRESH_FAILURE_TTL (default 86400 seconds, i.e., 1 day): Time-to-live before retrying failed refresh attempts.
  • LIBRARY_HOST_DIR: Path on the host machine where your source media files reside (mounted into the container).
  • TRANSCRIPTS_HOST_DIR: Path on the host machine where processed transcripts, subtitles, and metadata are stored.
  • WHISPER_MODEL: Whisper model variant to use for transcription (e.g., small, medium, large).
  • WHISPER_PRECISION: Precision setting for Whisper inference (float32 or float16).
  • WHISPER_LANGUAGE: Language code for Whisper to use during transcription (e.g., en for English).
  • YTDLP_COOKIES: Path to a cookies file (Netscape format) that yt-dlp uses to access age-restricted or private videos.
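
A sketch of reading the refresh variables with the defaults listed above. The set of truthy spellings accepted for REFRESH_EXISTING (1, true, yes, on) is an assumption; check PodX's own parsing before relying on anything other than true/false.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class RefreshSettings:
    refresh_existing: bool
    refresh_ttl: int          # seconds
    refresh_failure_ttl: int  # seconds


def _as_bool(value: str) -> bool:
    # Assumption: these spellings count as true; anything else is false.
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_refresh_settings(env: Mapping[str, str] = os.environ) -> RefreshSettings:
    """Read the refresh-related variables, falling back to the documented defaults."""
    return RefreshSettings(
        refresh_existing=_as_bool(env.get("REFRESH_EXISTING", "false")),
        refresh_ttl=int(env.get("REFRESH_TTL", "604800")),                 # 7 days
        refresh_failure_ttl=int(env.get("REFRESH_FAILURE_TTL", "86400")),  # 1 day
    )
```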

RSS Ingestion

PodX supports automated podcast ingestion via RSS feeds:

  • Add your podcast RSS feed URLs to a feeds.txt file, one URL per line.
  • The rss_ingest component reads this list periodically, downloads new episodes, and places them into the library/podcasts folder.
  • Downloaded episodes are then processed by the scanner and worker to generate transcripts, metadata, and thumbnails.
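
The steps above can be sketched with Python's standard library: parse feeds.txt, then pull episode audio URLs from each feed's <enclosure> elements (standard RSS 2.0). Ignoring # lines in feeds.txt and using each item's <guid> to decide what has already been downloaded are assumptions made for this example; fetching feeds and downloading files are left out.

```python
import xml.etree.ElementTree as ET


def read_feed_list(text: str) -> list[str]:
    """Parse feeds.txt content: one URL per line, blank lines ignored.

    Assumption: lines starting with '#' are treated as comments.
    """
    urls = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            urls.append(line)
    return urls


def episode_enclosures(rss_xml: str) -> list[tuple[str, str]]:
    """Return (guid, enclosure URL) pairs for each <item> in an RSS 2.0 feed."""
    root = ET.fromstring(rss_xml)
    episodes = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is None or not enclosure.get("url"):
            continue  # items without media (e.g. announcements) are skipped
        url = enclosure.get("url")
        guid = (item.findtext("guid") or "").strip() or url  # fall back to the URL
        episodes.append((guid, url))
    return episodes


def new_episodes(rss_xml: str, already_downloaded: set[str]) -> list[str]:
    """Assumption: episodes are tracked by guid; return URLs not yet downloaded."""
    return [url for guid, url in episode_enclosures(rss_xml) if guid not in already_downloaded]
```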

Refresh Mechanism

PodX periodically refreshes metadata, captions, and thumbnails for media files based on the TTL settings:

  • Files whose metadata was last refreshed more than REFRESH_TTL seconds ago are re-processed to keep it up to date.
  • Failed refresh attempts are retried after REFRESH_FAILURE_TTL.
  • Setting REFRESH_EXISTING=true in .env forces a refresh on every scan cycle.
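
The three rules above can be combined into a single per-file decision. In this sketch (an illustration, not PodX's code), each file is assumed to carry the Unix timestamps of its last successful and last failed refresh; a failure more recent than the last success puts the file on the shorter REFRESH_FAILURE_TTL retry clock.

```python
from typing import Optional


def needs_refresh(
    last_success: Optional[float],
    last_failure: Optional[float],
    *,
    now: float,
    refresh_ttl: int = 604800,   # REFRESH_TTL default (7 days)
    failure_ttl: int = 86400,    # REFRESH_FAILURE_TTL default (1 day)
    refresh_existing: bool = False,
) -> bool:
    """Decide whether a file should be refreshed on this scan.

    Assumption: per-file success/failure timestamps are tracked somewhere;
    None means "never happened".
    """
    if refresh_existing:
        return True  # REFRESH_EXISTING=true: refresh on every scan cycle
    if last_failure is not None and (last_success is None or last_failure > last_success):
        # Most recent attempt failed: wait failure_ttl before retrying.
        return now - last_failure >= failure_ttl
    if last_success is None:
        return True  # never refreshed
    return now - last_success >= refresh_ttl
```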

Multi-Worker Setup

For improved performance and scalability, PodX supports running multiple workers with specialized roles:

  • podx-worker: Handles general tasks such as scanning, metadata fetching, and indexing.
  • podx-worker-transcribe: Dedicated to heavy Whisper transcription jobs, isolating resource-intensive audio processing.

This separation helps optimize resource usage and allows parallel processing of different workloads.
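
The README does not say which queueing system sits between the two workers, so the stdlib sketch below only illustrates the routing idea: heavy transcription jobs go to one queue, everything else to another, and each worker container would consume only its own queue. The queue names and job types are hypothetical.

```python
from queue import Queue

# Hypothetical queue names and job types; PodX's real job system may differ.
GENERAL_QUEUE = "default"        # consumed by podx-worker
TRANSCRIBE_QUEUE = "transcribe"  # consumed by podx-worker-transcribe
HEAVY_JOB_TYPES = {"transcribe"}


def queue_for(job_type: str) -> str:
    """Pick the queue a job should go to based on its type."""
    return TRANSCRIBE_QUEUE if job_type in HEAVY_JOB_TYPES else GENERAL_QUEUE


def dispatch(jobs: list[dict], queues: dict[str, Queue]) -> None:
    """Put each job on the queue matching its type."""
    for job in jobs:
        queues[queue_for(job["type"])].put(job)
```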

Plex Integration

  • The library folder contains your source media and can be mounted directly into Plex or other media managers.
  • PodX automatically generates NFO files and .srt subtitle sidecars per show and episode, enabling rich metadata and transcripts in Plex.
  • This setup lets you browse, search, and play your media in Plex with synchronized transcripts and metadata.
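
An .srt sidecar is plain text: numbered cues, each with a HH:MM:SS,mmm --> HH:MM:SS,mmm time range and the cue text, saved next to the media file under a matching name. The sketch renders (start, end, text) segments, the shape Whisper-style transcribers produce, into that format. The .en language tag in the file name is an assumption; PodX's actual naming may differ.

```python
from pathlib import Path


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[tuple[float, float, str]]) -> str:
    """Render (start, end, text) segments as numbered SRT cues separated by blank lines."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n")
    return "\n".join(blocks)


def sidecar_path(media: Path, language: str = "en") -> Path:
    """Assumed naming: Episode 1.mp3 -> Episode 1.en.srt in the same folder."""
    return media.with_name(f"{media.stem}.{language}.srt")
```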

Ingest helpers

MEILI_URL=http://localhost:7700 MEILI_KEY=$MEILI_MASTER_KEY ./ingest/ingest_pdfs.sh /path/*.pdf
MEILI_URL=http://localhost:7700 MEILI_KEY=$MEILI_MASTER_KEY ./ingest/ingest_epub.py /path/*.epub
MEILI_URL=http://localhost:7700 MEILI_KEY=$MEILI_MASTER_KEY ./ingest/ingest_kiwix.sh /path/wiki.zim
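
The helper scripts above push documents into Meilisearch over its HTTP API. As an illustration of what such a request looks like (not the scripts' actual code), this sketch builds an "add or replace documents" call: POST /indexes/{index_uid}/documents with a JSON array body and an Authorization: Bearer header. The index name and document fields are hypothetical; each document needs a primary-key field (id here).

```python
import json
import os
import urllib.request
from typing import Optional


def meili_add_documents_request(
    documents: list[dict],
    index_uid: str,  # hypothetical, e.g. "library"
    meili_url: Optional[str] = None,
    meili_key: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a Meilisearch add-documents request."""
    meili_url = meili_url or os.environ.get("MEILI_URL", "http://localhost:7700")
    meili_key = meili_key or os.environ.get("MEILI_KEY", "")
    return urllib.request.Request(
        f"{meili_url.rstrip('/')}/indexes/{index_uid}/documents",
        data=json.dumps(documents).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {meili_key}"},
        method="POST",
    )
```

Sending it with urllib.request.urlopen(req) should return a 202 response describing an enqueued indexing task.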

Backfill existing files into OpenWebUI

# From repo root:
./tools/backfill_openwebui.sh
# Or include extra folders to scan:
./tools/backfill_openwebui.sh /some/other/folder /another/folder

  • Reads .env for OPENWEBUI_URL, OPENWEBUI_API_KEY, and OPENWEBUI_KB_NAME.
  • By default, uploads every *.txt, *.md, and *.html file it finds in ./transcripts and ./library/web.
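
The file selection described above can be sketched in Python. This mirrors only the stated behavior (three extensions, two default folders, optional extra folders); whether the real shell script recurses into subfolders, matches extensions case-insensitively, or skips missing folders are assumptions of this example.

```python
from pathlib import Path

# Extensions and default folders as described for backfill_openwebui.sh.
UPLOAD_SUFFIXES = {".txt", ".md", ".html"}
DEFAULT_DIRS = ("transcripts", "library/web")


def files_to_backfill(repo_root: Path, extra_dirs: tuple[str, ...] = ()) -> list[Path]:
    """List uploadable files under the default folders plus any extra folders.

    Assumptions: recursive search, case-insensitive suffix match,
    missing folders silently skipped.
    """
    found: list[Path] = []
    for folder in (*(repo_root / d for d in DEFAULT_DIRS), *map(Path, extra_dirs)):
        if not folder.is_dir():
            continue
        found.extend(
            p for p in sorted(folder.rglob("*"))
            if p.is_file() and p.suffix.lower() in UPLOAD_SUFFIXES
        )
    return found
```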

Difference between library and transcripts folders

The library folder contains the downloaded source media, such as videos, podcasts, web snapshots, and other original files. This is the folder to mount in Plex or other media managers to access and play your media.

The transcripts folder, on the other hand, contains processed text data including transcripts, subtitles, and JSON metadata. This folder is mainly used for search and ingestion into OpenWebUI and usually does not need to be mounted in Plex or other media players.

Generating required secrets

1. Meilisearch master key

Meilisearch needs a strong master key (like a root password). Generate one locally:

# On Linux or macOS with OpenSSL installed
openssl rand -hex 32

# Example output (keep it secret; do not reuse this exact value):
92e4d0d2e4c6f489a91dfc30b6fd6c985f6780ad827f1e7ce1bb3c6dc81d562b

Then put it in your .env:

MEILI_MASTER_KEY=92e4d0d2e4c6f489a91dfc30b6fd6c985f6780ad827f1e7ce1bb3c6dc81d562b
MEILI_KEY=${MEILI_MASTER_KEY}
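
If OpenSSL is not installed (for example on Windows), Python's standard-library secrets module produces a key of the same form: 32 random bytes written as 64 hexadecimal characters.

```python
import secrets

# 32 cryptographically random bytes as 64 lowercase hex characters,
# the same shape of output as `openssl rand -hex 32`.
key = secrets.token_hex(32)
print(f"MEILI_MASTER_KEY={key}")
```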

2. OpenWebUI API key

To allow PodX to push documents into your OpenWebUI Knowledge Base, create an API key:

  1. Go to your running OpenWebUI (e.g. http://localhost:3003).
  2. Log in with your admin account.
  3. Navigate to Settings → API Keys.
  4. Click Generate new API key and give it a name such as podx-worker.
  5. Copy the generated key and add it to .env:
OPENWEBUI_API_KEY=sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

If the key is ever leaked, revoke it in OpenWebUI and generate a new one.
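
To confirm a new key works before handing it to PodX, one option is an authenticated call to OpenWebUI's GET /api/models endpoint, which OpenWebUI's API documentation lists as accepting an Authorization: Bearer header. This is a sketch; verify the endpoint against your OpenWebUI version. The http://localhost:3003 fallback is the example address from step 1.

```python
import os
import urllib.request
from typing import Optional


def openwebui_models_request(
    base_url: Optional[str] = None, api_key: Optional[str] = None
) -> urllib.request.Request:
    """Build a GET /api/models request authenticated with an OpenWebUI API key.

    Assumption: this endpoint exists in your OpenWebUI version.
    """
    base_url = base_url or os.environ.get("OPENWEBUI_URL", "http://localhost:3003")
    api_key = api_key or os.environ.get("OPENWEBUI_API_KEY", "")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
```

Passing the result to urllib.request.urlopen(req, timeout=10) should succeed with a valid key; urlopen raises urllib.error.HTTPError if OpenWebUI rejects the request.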

License: MIT