PodX - Offline Library with OpenWebUI export
Repo-friendly secrets
- Secrets live in .env at the repo root (NOT committed).
- Commit .env.example. Users copy it to .env and fill in their values.
- We also include .gitignore to keep .env and data paths out of git.
.env.example now includes variables for Meilisearch and OpenWebUI, including OPENWEBUI_URL, OPENWEBUI_API_KEY, and OPENWEBUI_KB_ID.
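Once .env is filled in, a quick check for missing values can save a failed start. The helper below is a minimal sketch, not part of PodX: the function name missing_env_vars is hypothetical, and it assumes plain KEY=value lines without quoting or export prefixes.

```shell
# Print the names of variables that are absent or empty in an env file.
# Usage: missing_env_vars FILE VAR [VAR...]
# Assumption: the file uses plain KEY=value lines (no "export", no quoting).
missing_env_vars() (
  file=$1
  shift
  for name in "$@"; do
    # A variable counts as set only if its line has a non-empty value.
    grep -Eq "^${name}=.+" "$file" 2>/dev/null || printf '%s\n' "$name"
  done
)
```

For example, `missing_env_vars .env MEILI_MASTER_KEY OPENWEBUI_URL OPENWEBUI_API_KEY OPENWEBUI_KB_ID` prints one name per line for anything still unset, and prints nothing when all four have values.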
Quick start
cp .env.example .env # edit values (MEILI_MASTER_KEY, OPENWEBUI_API_KEY, etc.)
docker compose up -d --build
# UI: http://<host>:8088
# Meili: http://<host>:7700
The worker reaches OpenWebUI at $OPENWEBUI_URL (default: http://host.docker.internal:3003 on macOS/Windows, or http://openwebui:3003 on Linux Docker networks).
Note: .env.example includes placeholders for both Meili and OpenWebUI configuration. Be sure to set OPENWEBUI_URL to point to your OpenWebUI container accordingly.
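The platform-dependent default above can be sketched as a small lookup on the kernel name. This is an illustration only: default_openwebui_url is a hypothetical helper, and it assumes that anything other than Linux (macOS, or Windows shells such as Git Bash) should use host.docker.internal, and that on Linux a container named openwebui shares a Docker network with the worker.

```shell
# Suggest an OPENWEBUI_URL for a kernel name as printed by `uname -s`.
# Assumption: non-Linux hosts use Docker Desktop's host.docker.internal alias.
default_openwebui_url() {
  case "$1" in
    Linux) echo "http://openwebui:3003" ;;
    *)     echo "http://host.docker.internal:3003" ;;
  esac
}

# Print the suggestion for the current machine.
default_openwebui_url "$(uname -s)"
```

Treat the output as a starting point; if OpenWebUI runs somewhere else entirely, set OPENWEBUI_URL by hand.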
Components Overview
- scanner: Scans your media folders (library and transcripts) for new or updated files, triggering ingestion and processing workflows.
- worker: Handles general background tasks such as metadata fetching, thumbnail generation, and indexing.
- rss_ingest: Periodically reads an RSS feed list, downloads new podcast episodes, and adds them to your library for processing.
Environment Variables
- REFRESH_EXISTING (default false): If set to true, forces re-download of metadata, captions, and thumbnails for existing files during scanning.
- REFRESH_TTL (default 604800 seconds, i.e., 7 days): Time-to-live before metadata and related info are refreshed.
- REFRESH_FAILURE_TTL (default 86400 seconds, i.e., 1 day): Time-to-live before retrying failed refresh attempts.
- LIBRARY_HOST_DIR: Path on the host machine where your source media files reside (mounted into the container).
- TRANSCRIPTS_HOST_DIR: Path on the host machine where processed transcripts, subtitles, and metadata are stored.
- WHISPER_MODEL: Whisper model variant to use for transcription (e.g., small, medium, large).
- WHISPER_PRECISION: Precision setting for Whisper inference (float32 or float16).
- WHISPER_LANGUAGE: Language code for Whisper to use during transcription (e.g., en for English).
- TRANSCRIBE_BACKEND (default local): Set to openai to offload Whisper transcription to the OpenAI API instead of running locally.
- OPENAI_API_KEY: Required when TRANSCRIBE_BACKEND=openai; API key used for authenticated requests.
- OPENAI_BASE_URL, OPENAI_TRANSCRIBE_MODEL, OPENAI_TRANSCRIBE_TIMEOUT: Optional overrides for the OpenAI transcription endpoint, model and request timeout.
- YTDLP_COOKIES: Path to YouTube-DL cookies file for accessing age-restricted or private videos.
- OPENWEBUI_URL: Base URL of the OpenWebUI API (default depends on platform).
- OPENWEBUI_API_KEY: API key for authenticating PodX workers with OpenWebUI.
- OPENWEBUI_KB_NAME: Human-readable Knowledge Base name to attach documents to.
- OPENWEBUI_KB_ID: Fixed UUID of the Knowledge Base (avoids duplicate KBs on restart).
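As an illustration, a local-transcription setup might use a fragment like the one below. The refresh values are the documented defaults and the Whisper values come from the examples above; the two host paths are hypothetical placeholders, and the KB name is only the example name used elsewhere in this README.

```shell
# Example .env fragment (illustrative values; adjust to your setup).
# Refresh settings: the documented defaults (7 days and 1 day, in seconds).
REFRESH_EXISTING=false
REFRESH_TTL=604800
REFRESH_FAILURE_TTL=86400
# Host paths: hypothetical locations, replace with your own.
LIBRARY_HOST_DIR=/srv/podx/library
TRANSCRIPTS_HOST_DIR=/srv/podx/transcripts
# Local Whisper transcription using values from the examples above.
TRANSCRIBE_BACKEND=local
WHISPER_MODEL=small
WHISPER_PRECISION=float16
WHISPER_LANGUAGE=en
# Knowledge Base name as used in the podx-tools examples.
OPENWEBUI_KB_NAME="Homelab Library"
```

Secrets such as MEILI_MASTER_KEY and OPENWEBUI_API_KEY are left out here on purpose.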
RSS Ingestion
PodX supports automated podcast ingestion via RSS feeds:
- Add your podcast RSS feed URLs to a feeds.txt file, one URL per line.
- The rss_ingest component reads this list periodically, downloads new episodes, and places them into the library/podcasts folder.
- Downloaded podcasts are then processed by the scanner and worker to generate transcripts, metadata, and thumbnails.
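To preview what a one-URL-per-line feeds.txt contains, a small reader like the following can help. It is a sketch under assumptions: list_feeds is a hypothetical helper, and the handling of blank lines, surrounding whitespace, and lines starting with # is a convenience added here, not documented rss_ingest behavior.

```shell
# Print the feed URLs found in a feeds file, one per line.
# Assumption: blank lines and lines starting with '#' are skipped.
list_feeds() {
  # Drop carriage returns, trim surrounding whitespace, then filter out
  # empty lines and comment lines.
  tr -d '\r' < "$1" \
    | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
    | grep -Ev '^(#|$)'
}
```

Running `list_feeds feeds.txt | wc -l` gives a quick count of feeds that would be polled under these assumptions.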
Refresh Mechanism
PodX periodically refreshes metadata, captions, and thumbnails for media files based on the TTL settings:
- Files older than REFRESH_TTL are re-processed to keep metadata up-to-date.
- Failed refresh attempts are retried after REFRESH_FAILURE_TTL.
- Setting REFRESH_EXISTING=true in .env forces a refresh on every scan cycle.
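The three rules above amount to a small decision function. The sketch below is one reading of them, not the worker's actual code: needs_refresh is a hypothetical name, and the idea that each item records the epoch time of its last attempt and whether that attempt failed is an assumption.

```shell
# Decide whether an item is due for a refresh (exit status 0 means "refresh").
# Args: LAST_ATTEMPT_EPOCH NOW_EPOCH [LAST_ATTEMPT_FAILED=true|false]
# Reads REFRESH_EXISTING, REFRESH_TTL and REFRESH_FAILURE_TTL from the
# environment, falling back to the documented defaults.
needs_refresh() {
  nr_last=$1
  nr_now=$2
  nr_failed=${3:-false}
  # REFRESH_EXISTING=true forces a refresh regardless of age.
  if [ "${REFRESH_EXISTING:-false}" = "true" ]; then
    return 0
  fi
  # Failed attempts use the shorter retry TTL.
  if [ "$nr_failed" = "true" ]; then
    nr_ttl=${REFRESH_FAILURE_TTL:-86400}
  else
    nr_ttl=${REFRESH_TTL:-604800}
  fi
  # Due only when the item is strictly older than the TTL.
  [ $((nr_now - nr_last)) -gt "$nr_ttl" ]
}
```

For instance, `needs_refresh "$last" "$(date +%s)" true && echo retry` would retry a failed item once more than a day has passed with the default settings.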
Multi-Worker Setup
For improved performance and scalability, PodX supports running multiple workers with specialized roles:
- podx-worker: Handles general tasks such as scanning, metadata fetching, and indexing.
- podx-worker-transcribe: Dedicated to heavy Whisper transcription jobs, isolating resource-intensive audio processing.
This separation helps optimize resource usage and allows parallel processing of different workloads.
Plex Integration
- The library folder contains your source media and can be mounted directly into Plex or other media managers.
- PodX automatically generates NFO files and .srt subtitle sidecars per show and episode, enabling rich metadata and transcripts in Plex.
- This setup lets you browse, search, and play your media with synchronized transcripts and metadata seamlessly.
Ingest helpers
MEILI_URL=http://localhost:7700 MEILI_KEY=$MEILI_MASTER_KEY ./ingest/ingest_pdfs.sh /path/*.pdf
MEILI_URL=http://localhost:7700 MEILI_KEY=$MEILI_MASTER_KEY ./ingest/ingest_epub.py /path/*.epub
MEILI_URL=http://localhost:7700 MEILI_KEY=$MEILI_MASTER_KEY ./ingest/ingest_kiwix.sh /path/wiki.zim
Backfill existing files into OpenWebUI
# From repo root:
./tools/backfill_openwebui.sh
# Or include extra folders to scan:
./tools/backfill_openwebui.sh /some/other/folder /another/folder
- Reads .env for OPENWEBUI_URL, OPENWEBUI_API_KEY, OPENWEBUI_KB_NAME.
- Uploads *.txt, *.md, *.html it finds in ./transcripts and ./library/web by default.
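Before running a backfill, it can be useful to preview which files match. The following is a rough approximation of the selection described above, not the script's actual logic: list_backfill_candidates is a hypothetical helper that scans the two default folders plus any extra folders given as arguments.

```shell
# List files a backfill would likely pick up, sorted for stable output.
# Usage: list_backfill_candidates [EXTRA_DIR...]
# Assumption: selection is purely by extension (*.txt, *.md, *.html).
list_backfill_candidates() {
  # Always include the two default folders, then any extras.
  set -- ./transcripts ./library/web "$@"
  # Errors for missing folders are silenced so a partial setup still lists.
  find "$@" -type f \( -name '*.txt' -o -name '*.md' -o -name '*.html' \) 2>/dev/null \
    | LC_ALL=C sort
}
```

Run it from the repo root, for example `list_backfill_candidates /some/other/folder | wc -l`, to see how many documents are in scope.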
Difference between library and transcripts folders
The library folder contains the downloaded source media such as videos, podcasts, web snapshots, and other original files. This folder is the one you can mount to Plex or other media managers to access and play your media content.
The transcripts folder, on the other hand, contains processed text data including transcripts, subtitles, and JSON metadata. This folder is mainly used for search and ingestion into OpenWebUI and usually does not need to be mounted in Plex or other media players.
Generating required secrets
1. Meilisearch master key
Meilisearch needs a strong master key (like a root password). Generate one locally:
# On Linux or Mac with OpenSSL installed
openssl rand -hex 32
# Example output (keep it secret, do not reuse this exact value):
92e4d0d2e4c6f489a91dfc30b6fd6c985f6780ad827f1e7ce1bb3c6dc81d562b
Then put it in your .env:
MEILI_MASTER_KEY=92e4d0d2e4c6f489a91dfc30b6fd6c985f6780ad827f1e7ce1bb3c6dc81d562b
MEILI_KEY=${MEILI_MASTER_KEY}
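To avoid pasting the key by hand, the two steps can be scripted. This is a minimal sketch with hypothetical helpers (gen_hex_key and set_env_var are not part of PodX); it assumes plain KEY=value lines in .env and falls back to /dev/urandom if openssl is not installed.

```shell
# Print a 64-character hex key (32 random bytes).
gen_hex_key() {
  if command -v openssl >/dev/null 2>&1; then
    openssl rand -hex 32
  else
    # Fallback when openssl is unavailable: read 32 bytes from the kernel RNG.
    od -An -tx1 -N32 /dev/urandom | tr -d ' \n'
    echo
  fi
}

# Set KEY=VALUE in an env file, replacing any existing line for KEY.
# Usage: set_env_var FILE KEY VALUE
set_env_var() (
  file=$1 key=$2 value=$3
  [ -f "$file" ] || : > "$file"
  tmp=$(mktemp) || exit 1
  # Copy every line except an existing assignment for this key.
  grep -v "^${key}=" "$file" > "$tmp"
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  mv "$tmp" "$file"
)
```

With these defined, `set_env_var .env MEILI_MASTER_KEY "$(gen_hex_key)"` writes a fresh key without printing it to the terminal. Note that the updated line moves to the end of the file.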
2. OpenWebUI API key
To allow PodX to push documents into your OpenWebUI Knowledge Base, create an API key:
- Go to your running OpenWebUI (e.g. http://localhost:3003).
- Log in with your admin account.
- Navigate to Settings → API Keys.
- Click Generate new API key and give it a name like podx-worker.
- Copy the generated key and add it to .env:
OPENWEBUI_API_KEY=sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
If the key is ever leaked, revoke it in OpenWebUI and generate a new one.
3. Knowledge Base ID
To avoid creating duplicate Knowledge Bases on every PodX restart, use a fixed Knowledge Base ID:
- Run the command below to list your OpenWebUI Knowledge Bases and their IDs:
./scripts/podx-tools.sh owui-kbs
- Copy the UUID of the desired Knowledge Base and set it in your .env:
OPENWEBUI_KB_ID=your-knowledge-base-uuid-here
This ensures PodX attaches documents consistently to the same Knowledge Base.
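A common slip is leaving the placeholder in place or pasting the KB name instead of its ID. The check below is a small sketch, assuming OpenWebUI Knowledge Base IDs use the canonical 8-4-4-4-12 hexadecimal UUID layout; is_uuid is a hypothetical helper name.

```shell
# Succeed only if the argument looks like a canonical UUID.
# Assumption: KB IDs follow the 8-4-4-4-12 hex layout.
is_uuid() {
  printf '%s\n' "$1" | grep -Eq \
    '^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$'
}
```

For example, `is_uuid "$OPENWEBUI_KB_ID" || echo "OPENWEBUI_KB_ID does not look like a UUID"` flags an unedited placeholder before the worker starts.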
Using podx-tools
PodX includes a helper script ./scripts/podx-tools.sh for interacting with OpenWebUI.
Common commands
- Check OpenWebUI connectivity
  ./scripts/podx-tools.sh owui-health
  Verifies that OpenWebUI is reachable at $OPENWEBUI_URL.
- List Knowledge Bases
  ./scripts/podx-tools.sh owui-kbs
  Lists available Knowledge Bases with their UUIDs.
- Resolve Knowledge Base ID by name
  ./scripts/podx-tools.sh owui-kb-resolve "Homelab Library"
  Resolves the fixed UUID for a KB by its human-readable name.
- Debug KB info
  ./scripts/podx-tools.sh owui-kb-debug "Homelab Library"
- Attach a file
  ./scripts/podx-tools.sh owui-attach "Homelab Library" /path/to/file.txt
  Uploads a transcript or document to a KB. Supports .txt, .md, .json, and .html.
- List files in a KB
  ./scripts/podx-tools.sh owui-kb-files "Homelab Library"
Notes
- OPENWEBUI_URL, OPENWEBUI_API_KEY, and OPENWEBUI_KB_ID must be set in your .env.
- JSON files are optional. Only attach them if you want their contents searchable.
- Duplicate or empty content may be rejected by OpenWebUI with a 400 error.
Troubleshooting
Common errors and fixes
- 400: The content provided is empty
  This usually means the transcript file was empty, binary, or mis-encoded. Verify that the .txt files really contain text and are not corrupted.
- Duplicate Knowledge Base creation
  Fix this by setting OPENWEBUI_KB_ID in your .env after running ./scripts/podx-tools.sh owui-kbs to get the fixed KB ID.
- Worker cannot connect to OpenWebUI (curl: Failed to connect to localhost:3003)
  Ensure OPENWEBUI_URL is correctly set to http://host.docker.internal:3003 on macOS/Windows or http://openwebui:3003 on Linux Docker networks.
- Attaching files silently fails or shows pending forever
  Check the podx-worker logs for errors, make sure podx-worker-transcribe is running for audio transcription tasks, and verify that OPENWEBUI_API_KEY is valid.
- Multiple Knowledge Bases with the same name
  Resolve this explicitly using the owui-kb-resolve command to get the fixed Knowledge Base ID.
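For the empty-content error, a local pre-check can filter out files that are likely to be rejected. The sketch below uses a hypothetical helper, is_uploadable_text, and a simple heuristic: a file passes if it is non-empty, contains no NUL bytes, and has at least one non-whitespace character. It does not detect every encoding problem.

```shell
# Succeed if FILE looks like non-empty plain text.
# Heuristic only: NUL bytes are treated as a sign of binary content.
is_uploadable_text() {
  # Reject missing or zero-length files.
  [ -s "$1" ] || return 1
  # Reject files containing NUL bytes (stripping them changes the content).
  tr -d '\000' < "$1" | cmp -s - "$1" || return 1
  # Require at least one non-whitespace character.
  grep -q '[^[:space:]]' "$1"
}
```

A loop such as `for f in ./transcripts/*.txt; do is_uploadable_text "$f" || echo "skip: $f"; done` lists the files worth inspecting before they are attached.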
Worker separation
PodX runs two types of workers for better resource management:
- podx-worker: Handles general tasks such as scanning, metadata fetching, and indexing.
- podx-worker-transcribe: Dedicated to Whisper transcription jobs, isolating resource-intensive audio processing to optimize performance.