`audia.agents` — Pipeline agents

The agents module contains the LangGraph pipeline nodes and supporting functions.

`audia.agents.graph` — LangGraph pipeline

LangGraph pipeline: PDF → extracted text → curated text → audio file.

Graph structure (linear – no optional steps):

extract_text ─► preprocess ─► curate ─► synthesize_audio ─► END

extract_text : PyMuPDF → raw text + metadata
preprocess : heuristic regex pre-pass (fast, no LLM)
curate : LLM pass (math → English, table summaries, acknowledgement condensing)
synthesize_audio : TTS → audio file

Each node receives the full PipelineState and returns a partial update.
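The partial-update contract can be illustrated without LangGraph itself. The following is a minimal sketch in plain Python, assuming stand-in node bodies and a hypothetical `run_linear` helper; the real nodes call PyMuPDF, an LLM, and a TTS backend, and the real graph is compiled by `build_pipeline()`.

```python
from typing import Any, Callable

State = dict[str, Any]
Node = Callable[[State], dict[str, Any]]


# Stand-in nodes: each reads the full state and returns only the keys it changes.
def extract_text(state: State) -> dict[str, Any]:
    return {"raw_text": f"text of {state['pdf_path']}", "num_pages": 1}


def preprocess(state: State) -> dict[str, Any]:
    return {"preprocessed_text": state["raw_text"].strip()}


def curate(state: State) -> dict[str, Any]:
    return {"cleaned_text": state["preprocessed_text"].capitalize()}


def synthesize_audio(state: State) -> dict[str, Any]:
    return {"audio_path": state["output_dir"] + "/paper.mp3"}


def run_linear(nodes: list[Node], state: State) -> State:
    """Run nodes in order, merging each partial update into the state."""
    for node in nodes:
        state = {**state, **node(state)}
    return state


final = run_linear(
    [extract_text, preprocess, curate, synthesize_audio],
    {"pdf_path": "paper.pdf", "output_dir": "out"},
)
```

Keys a node does not return (here `pdf_path` and `output_dir`) carry through unchanged, which is what lets every later node see the full state.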

audia.agents.graph.node_extract_text(state)

Extract text and basic metadata from the PDF.

Parameters: state (PipelineState)
Return type: dict[str, Any]

audia.agents.graph.node_preprocess(state)

Heuristic regex pre-pass – fast, no LLM.

Parameters: state (PipelineState)
Return type: dict[str, Any]

audia.agents.graph.node_curate(state)

LLM curation: math → English, table summaries, acknowledgement condensing.

Parameters: state (PipelineState)
Return type: dict[str, Any]

audia.agents.graph.node_synthesize_audio(state)

Convert the final curated text to an audio file via TTS.

Parameters: state (PipelineState)
Return type: dict[str, Any]

audia.agents.graph.build_pipeline()

Compile and return the LangGraph CompiledGraph.

Return type: Any

audia.agents.graph.run_pipeline(pdf_path, output_dir=None)

Convenience function: build and run the pipeline for a single PDF.

Returns the final PipelineState dict. After each run, the three text stages are saved to ~/.audia/debug/<stem>_<YYYYMMDD_HHMMSS>/ for inspection.

Parameters:

pdf_path (str | Path)
output_dir (str | Path | None)

Return type:

PipelineState
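The debug directory layout described for `run_pipeline` can be sketched as follows. The helper name `debug_dir_for` and its `now` and `root` parameters are hypothetical, introduced only so the timestamp and base directory can be pinned for illustration; only the `~/.audia/debug/<stem>_<YYYYMMDD_HHMMSS>/` pattern comes from the documentation.

```python
from datetime import datetime
from pathlib import Path


def debug_dir_for(pdf_path, now=None, root=None):
    """Build <root>/<stem>_<YYYYMMDD_HHMMSS> for a given PDF path."""
    now = now or datetime.now()
    # Assumption: the default root mirrors the documented ~/.audia/debug location.
    root = Path(root) if root is not None else Path.home() / ".audia" / "debug"
    return root / f"{Path(pdf_path).stem}_{now:%Y%m%d_%H%M%S}"


example = debug_dir_for(
    "papers/attention.pdf",
    now=datetime(2024, 1, 2, 3, 4, 5),
    root="/tmp/debug",
)
```

Because the timestamp is part of the directory name, repeated runs on the same PDF should land in separate directories as long as they start at least a second apart.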

`audia.agents.state` — Pipeline state

LangGraph state definition for the PDF → audio pipeline.

class audia.agents.state.PipelineState

Bases: TypedDict

State that flows through every node of the LangGraph pipeline.

pdf_path: str

output_dir: str

raw_text: str

preprocessed_text: str

cleaned_text: str

audio_path: str | None

audio_filename: str

title: str

num_pages: int

tts_backend: str

tts_voice: str

run_id: str

error: str | None
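To show how such a state might be built up over a run, here is a minimal sketch that mirrors the fields above in a separate `PipelineStateSketch` class. The `total=False` setting, the grouping comments, and the `succeeded` helper are assumptions for illustration; the documentation does not state which keys are required or how `error` is populated.

```python
from typing import Optional, TypedDict


class PipelineStateSketch(TypedDict, total=False):
    # Inputs
    pdf_path: str
    output_dir: str
    # Text stages
    raw_text: str
    preprocessed_text: str
    cleaned_text: str
    # Audio output
    audio_path: Optional[str]
    audio_filename: str
    # Metadata and configuration
    title: str
    num_pages: int
    tts_backend: str
    tts_voice: str
    run_id: str
    error: Optional[str]


def succeeded(state: PipelineStateSketch) -> bool:
    """Hypothetical check: no error recorded and an audio path present."""
    return state.get("error") is None and state.get("audio_path") is not None


initial: PipelineStateSketch = {"pdf_path": "paper.pdf", "output_dir": "out"}
finished: PipelineStateSketch = {**initial, "audio_path": "out/paper.mp3", "error": None}
```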

`audia.agents.pdf_processor` — PDF extraction

PDF text extraction using PyMuPDF (fitz).

Handles:

- Multi-page PDFs
- Basic heuristic removal of headers, footers, page numbers, the references section, and the acknowledgements section
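The module's actual heuristics are not documented here. As one illustration of what header, footer, and page-number removal can look like, the sketch below drops bare page numbers and any line that repeats on most pages. The function name, the regex, and the `min_repeat` threshold are all assumptions, and the input is a plain list of per-page strings rather than a PyMuPDF document.

```python
import re
from collections import Counter

# Matches lines such as "7" or "Page 7".
PAGE_NUMBER = re.compile(r"^\s*(?:page\s+)?\d+\s*$", re.IGNORECASE)


def strip_page_furniture(pages: list[str], min_repeat: float = 0.6) -> list[str]:
    """Drop bare page numbers and lines that repeat on most pages."""
    split = [[ln.strip() for ln in page.splitlines()] for page in pages]
    # Count each distinct line once per page.
    counts = Counter(ln for lines in split for ln in set(lines) if ln)
    threshold = max(2, int(len(pages) * min_repeat))
    cleaned = []
    for lines in split:
        kept = [
            ln for ln in lines
            if ln and not PAGE_NUMBER.match(ln) and counts[ln] < threshold
        ]
        cleaned.append("\n".join(kept))
    return cleaned


pages = [
    "Journal of Examples\nIntroduction text.\n1",
    "Journal of Examples\nMethod text.\n2",
    "Journal of Examples\nResults text.\nPage 3",
]
cleaned = strip_page_furniture(pages)
```

A frequency-based rule like this can also remove legitimate body lines that happen to repeat, which is one reason such a pass is usually described as heuristic.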

class audia.agents.pdf_processor.ExtractionResult(text, num_pages, title)

Bases: NamedTuple

Parameters:

text (str)
num_pages (int)
title (str)

text: str: Alias for field number 0

num_pages: int: Alias for field number 1

title: str: Alias for field number 2

audia.agents.pdf_processor.extract_text(pdf_path)

Extract and pre-clean text from a PDF.

Returns an ExtractionResult with cleaned text, page count, and guessed title. Raises FileNotFoundError if the PDF does not exist.

Parameters: pdf_path (str | Path)
Return type: ExtractionResult
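Since the result is a NamedTuple, callers can use either field names or positional unpacking. The sketch below uses a stand-in `ExtractionResultSketch` with the same three fields and a hypothetical `extract_text_stub` that reproduces only the documented FileNotFoundError behaviour; it does not read any PDF content.

```python
from pathlib import Path
from typing import NamedTuple


class ExtractionResultSketch(NamedTuple):
    text: str
    num_pages: int
    title: str


def extract_text_stub(pdf_path):
    """Stand-in that only mirrors the documented missing-file behaviour."""
    path = Path(pdf_path)
    if not path.exists():
        raise FileNotFoundError(f"PDF not found: {path}")
    return ExtractionResultSketch(text="", num_pages=0, title=path.stem)


result = ExtractionResultSketch("Body text", 12, "A Title")
text, num_pages, title = result  # positional unpacking follows field order

try:
    extract_text_stub("does/not/exist.pdf")
    missing_raised = False
except FileNotFoundError:
    missing_raised = True
```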

`audia.agents.text_cleaner` — Heuristic + LLM curation

Text curation pipeline – the core intelligence of audia.

Two-stage process:

1. heuristic_clean() – fast regex pre-pass: removes citations and LaTeX artefacts, collapses whitespace. Reduces LLM token cost.
2. llm_curate() – LLM pass (ALWAYS required): rewrites math in plain English, summarises tables, condenses acknowledgements, ensures smooth spoken-word flow.

audia.agents.text_cleaner.heuristic_clean(text)

Fast regex pre-pass – always runs before the LLM call to reduce token cost.

Parameters: text (str)
Return type: str
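The exact patterns used by `heuristic_clean` are not listed in this reference. As a rough sketch of the three operations named for the pre-pass (citation removal, LaTeX artefact removal, whitespace collapsing), something like the following would work; the regexes and the function name `heuristic_clean_sketch` are assumptions.

```python
import re

# Numeric citations such as "[1]", "[1, 2]" or "[4-6]", with leading spaces.
CITATION = re.compile(r"[ \t]*\[\d+(?:\s*[,–-]\s*\d+)*\]")
# Crude: a LaTeX command plus one optional braced argument, e.g. "\cite{key}".
# This also discards the argument text, which may be too aggressive for "\textbf{...}".
LATEX_CMD = re.compile(r"[ \t]*\\[a-zA-Z]+\*?(?:\{[^{}]*\})?")
SPACES = re.compile(r"[ \t]+")
BLANK_LINES = re.compile(r"\n{3,}")


def heuristic_clean_sketch(text: str) -> str:
    text = CITATION.sub("", text)
    text = LATEX_CMD.sub("", text)
    text = SPACES.sub(" ", text)          # collapse runs of spaces and tabs
    text = BLANK_LINES.sub("\n\n", text)  # keep at most one blank line
    return text.strip()


sample = "Transformers [1, 2] scale well \\cite{vaswani2017}.\n\n\n\nSee   Section 3 [4-6]."
cleaned = heuristic_clean_sketch(sample)
```

Every character removed here is one the LLM pass does not have to read, which is the token-cost saving the docstring refers to.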

audia.agents.text_cleaner.llm_curate(text, settings=None, progress_cb=None)

LLM curation pass – ALWAYS required, always runs. Raises RuntimeError on misconfiguration / missing API key.

Parameters:

progress_cb (callable(str) | None) – Optional callback invoked with a plain-text progress line for each chunk so callers (e.g. the web job runner) can surface per-chunk progress without parsing Rich markup.
text (str)
settings (Settings | None)

Return type:

str
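The `progress_cb` contract (a callable that receives one plain-text line per chunk) can be illustrated with a stand-in. The function `curate_chunks_stub` and the message wording below are hypothetical; the real progress lines emitted by `llm_curate` are not specified in this reference.

```python
from typing import Callable, Optional


def curate_chunks_stub(
    chunks: list[str],
    progress_cb: Optional[Callable[[str], None]] = None,
) -> str:
    """Stand-in for a chunked LLM pass that reports progress per chunk."""
    out = []
    total = len(chunks)
    for i, chunk in enumerate(chunks, start=1):
        out.append(chunk.strip())  # placeholder for the actual LLM call
        if progress_cb is not None:
            progress_cb(f"Curated chunk {i}/{total}")
    return " ".join(out)


# Any callable taking a str works; list.append collects the lines for later display.
messages: list[str] = []
text = curate_chunks_stub([" a ", " b "], progress_cb=messages.append)
```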

audia.agents.text_cleaner.llm_clean(text, settings=None, progress_cb=None)

LLM curation pass – ALWAYS required, always runs. Raises RuntimeError on misconfiguration / missing API key.

Parameters:

progress_cb (callable(str) | None) – Optional callback invoked with a plain-text progress line for each chunk so callers (e.g. the web job runner) can surface per-chunk progress without parsing Rich markup.
text (str)
settings (Settings | None)

Return type:

str

audia.agents.text_cleaner.curate_text(text, settings=None)

Full curation pipeline: heuristic pre-pass → LLM curation.

Transition guarantee

Each chunk (from chunk 2 onward) receives the tail of the previous curated chunk as read-only context so the LLM can write a smooth spoken transition without re-processing or re-outputting already-curated text. The full paper content is preserved as-is after the LLM pass — no content is dropped or deduplicated.

Parameters:

text (str)
settings (Settings | None)

Return type:

str
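The transition guarantee described for `curate_text` can be sketched as follows. This is an illustration under assumptions: the paragraph-based splitter, the `max_chars` and `tail_chars` sizes, and the `curate_chunk(chunk, context)` callable are all hypothetical, and a fake "LLM" that upper-cases its input stands in for the real call.

```python
def split_chunks(text: str, max_chars: int) -> list[str]:
    """Split on paragraph boundaries into chunks of roughly max_chars."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def curate_with_context(text, curate_chunk, max_chars=40, tail_chars=20):
    curated = []
    for chunk in split_chunks(text, max_chars):
        # From chunk 2 onward, pass the tail of the previous curated chunk
        # as read-only context; it is never appended to the output again.
        context = curated[-1][-tail_chars:] if curated else ""
        curated.append(curate_chunk(chunk, context))
    return "\n\n".join(curated)


seen_contexts = []


def fake_llm(chunk, context):
    seen_contexts.append(context)
    return chunk.upper()


paper = "First paragraph here.\n\nSecond paragraph here.\n\nThird one."
result = curate_with_context(paper, fake_llm)
```

Because the context is only read and never re-emitted, joining the curated chunks yields each chunk's content exactly once.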

audia.agents.text_cleaner.clean_text(text, settings=None)

Parameters:

text (str)
settings (Settings | None)

Return type:

str

`audia.agents.tts` — Text-to-speech synthesis

Text-to-Speech wrapper supporting multiple backends:

edge-tts (default, free, requires internet)

kokoro (local, requires: pip install audia[kokoro])

openai (requires API key)

All backends return the absolute path to the generated audio file.
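One common way to support several backends behind a single entry point is a name-to-callable registry. The sketch below shows that pattern with fake backends that write placeholder bytes; the registry, the `synthesize_sketch` name, the `backend` parameter, and the `.mp3` extension are assumptions and do not describe how `audia.agents.tts` is actually organised.

```python
import tempfile
from pathlib import Path


def _fake_backend(tag: str):
    """Build a stand-in backend that writes placeholder bytes instead of audio."""
    def synth(text: str, out_path: Path) -> Path:
        out_path.write_bytes(f"{tag}:{text}".encode("utf-8"))
        return out_path
    return synth


# Hypothetical registry keyed by the backend names listed above.
BACKENDS = {name: _fake_backend(name) for name in ("edge-tts", "kokoro", "openai")}


def synthesize_sketch(text, output_dir, filename, backend="edge-tts") -> Path:
    try:
        synth = BACKENDS[backend]
    except KeyError:
        raise ValueError(f"Unknown TTS backend: {backend!r}") from None
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Resolve so that callers always receive an absolute path.
    return synth(text, out_dir / f"{filename}.mp3").resolve()


with tempfile.TemporaryDirectory() as tmp:
    audio = synthesize_sketch("Hello", tmp, "paper")
    written = audio.read_bytes()
```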

audia.agents.tts.synthesize(text, output_dir=None, filename=None, settings=None, progress_cb=None)

Convert text to an audio file and return its path.

Parameters:

text – The cleaned text to synthesise.
output_dir – Directory for the output file. Defaults to settings.audio_dir.
filename – Desired filename (without extension). Auto-generated when None.
settings – Audia settings; uses global settings when None.

Return type:

Path

`audia.agents.stt` — Speech-to-text + query distillation

Speech-to-Text input – record from microphone and transcribe.

audia.agents.stt.record_and_transcribe(seconds=30, samplerate=16000, model_size='base', device='cpu')

Record audio from the default microphone and return the transcription.

Parameters:

seconds – Maximum recording duration.
samplerate – Audio sample rate (16 kHz is recommended for Whisper).
model_size – faster-whisper model: tiny | base | small | medium | large-v3
device – 'cpu' or 'cuda'

Return type:

str
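For a sense of scale, the default arguments imply the following recording size. This is simple arithmetic, assuming mono 16-bit PCM samples; the channel count and sample width are assumptions, since the reference only gives the duration and sample rate.

```python
def recording_size(seconds=30, samplerate=16000, channels=1, bytes_per_sample=2):
    """Return (frames, bytes) for an uncompressed PCM recording."""
    frames = seconds * samplerate
    return frames, frames * channels * bytes_per_sample


frames, nbytes = recording_size()
# 30 s at 16 kHz is 480,000 frames; as mono int16 that is 960,000 bytes.
```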

audia.agents.stt.transcribe_file(audio_path, model_size='base', device='cpu')

Transcribe an existing audio file (wav, mp3, …).

Parameters:

audio_path (str | Path)
model_size (str)
device (str)

Return type:

str

audia.agents.stt.distill_search_query(speech)

Use the configured LLM to extract a concise ArXiv search query from raw speech.

Example

>>> distill_search_query("I would like to research about agentic AI.")
'agentic AI research'

Parameters: speech (str)
Return type: str

`audia.agents.research` — ArXiv research

ArXiv paper search and download.

Primary: arxiv Python SDK. Fallback: HTML scrape of arxiv.org/search (used when the API returns 429).

class audia.agents.research.ArxivPaper(arxiv_id, title, authors, abstract, pdf_url, published, local_pdf_path=None)

Bases: object

Lightweight representation of an ArXiv result.

Parameters:

arxiv_id (str)
title (str)
authors (list[str])
abstract (str)
pdf_url (str)
published (str)
local_pdf_path (str | None)

arxiv_id: str

title: str

authors: list[str]

abstract: str

pdf_url: str

published: str

local_pdf_path: str | None = None

class audia.agents.research.ArxivSearcher(max_results=None)

Bases: object

Search ArXiv and download PDFs.

Parameters: max_results (int | None)

search(query)

Search ArXiv for query and return up to max_results papers.

Falls back to HTML scraping if the API returns an error (e.g. HTTP 429).

Parameters: query (str)
Return type: list[ArxivPaper]
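The primary-then-fallback behaviour can be sketched generically. In the example below, `primary` and `fallback` are injected callables standing in for the SDK query and the HTML scrape; the helper name, the `RateLimited` exception, and the broad `except Exception` are assumptions rather than a description of the actual implementation.

```python
class RateLimited(Exception):
    """Hypothetical error standing in for an HTTP 429 response."""


def search_with_fallback(query, primary, fallback, max_results=5):
    """Try the primary source; on any error, use the fallback instead."""
    try:
        results = primary(query)
    except Exception:
        results = fallback(query)
    return results[:max_results]


def sdk_search(query):
    raise RateLimited("429 Too Many Requests")


def html_search(query):
    return [f"{query} paper {i}" for i in range(10)]


papers = search_with_fallback("agentic AI", sdk_search, html_search, max_results=3)
```

Truncating after the fallback keeps the `max_results` cap consistent regardless of which source answered.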

download_pdf(paper, dest_dir=None)

Download the PDF for paper directly from arxiv.org/pdf/<id>.

Bypasses the arxiv SDK export API entirely to avoid HTTP 429 rate-limits. Skips the download if the file already exists.

Parameters:

paper (ArxivPaper)
dest_dir (str | Path | None)

Return type:

Path
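The skip-if-exists behaviour amounts to checking the destination before fetching. The sketch below injects a `fetch` callable so it runs without network access; the function name, the `<id>.pdf` filename scheme, and the placeholder ID are hypothetical, and only the `arxiv.org/pdf/<id>` URL pattern comes from the description above.

```python
import tempfile
from pathlib import Path


def download_pdf_sketch(arxiv_id: str, dest_dir, fetch) -> Path:
    """Fetch arxiv.org/pdf/<id> unless the destination file already exists."""
    # Assumption: files are named after the ID, with "/" made filesystem-safe.
    dest = Path(dest_dir) / f"{arxiv_id.replace('/', '_')}.pdf"
    if dest.exists():
        return dest  # already downloaded, no request made
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_bytes(fetch(f"https://arxiv.org/pdf/{arxiv_id}"))
    return dest


calls = []


def fake_fetch(url: str) -> bytes:
    calls.append(url)
    return b"%PDF-1.4 stub"


with tempfile.TemporaryDirectory() as tmp:
    # "2401.00001" is a placeholder ID used only for illustration.
    first = download_pdf_sketch("2401.00001", tmp, fake_fetch)
    second = download_pdf_sketch("2401.00001", tmp, fake_fetch)
```

The second call returns the same path without invoking `fetch`, which is what keeps repeated runs from adding to the request count.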
