audia.agents — Pipeline agents

The agents module contains the LangGraph pipeline nodes and supporting functions.


audia.agents.graph — LangGraph pipeline

LangGraph pipeline: PDF → extracted text → curated text → audio file.

Graph structure (linear – no optional steps):

extract_text ─► preprocess ─► curate ─► synthesize_audio ─► END

  • extract_text: PyMuPDF → raw text + metadata

  • preprocess: heuristic regex pre-pass (fast)

  • curate: LLM – math → English, table summaries, acknowledgement condensing

  • synthesize_audio: TTS → audio file

Each node receives the full PipelineState and returns a partial update.
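
The partial-update contract can be illustrated without LangGraph. The following is a minimal sketch in plain Python, not audia's implementation: the State type, the stand-in node bodies, and the run_linear helper are all hypothetical, and the merge step only approximates how LangGraph applies an update to state keys that have no custom reducer.

```python
from typing import Any, Callable, TypedDict


class State(TypedDict, total=False):
    # Hypothetical, trimmed-down stand-in for PipelineState.
    pdf_path: str
    raw_text: str
    preprocessed_text: str
    cleaned_text: str
    audio_path: str


Node = Callable[[State], dict[str, Any]]


def extract_text(state: State) -> dict[str, Any]:
    # Stand-in: a real node would read the PDF at state["pdf_path"].
    return {"raw_text": "E = mc^2  [1]"}


def preprocess(state: State) -> dict[str, Any]:
    return {"preprocessed_text": state["raw_text"].replace("[1]", "").strip()}


def curate(state: State) -> dict[str, Any]:
    spoken = state["preprocessed_text"].replace("E = mc^2", "E equals m c squared")
    return {"cleaned_text": spoken}


def synthesize_audio(state: State) -> dict[str, Any]:
    # Stand-in: no audio is written; the path is a placeholder.
    return {"audio_path": "/tmp/example.mp3"}


def run_linear(nodes: list[Node], state: State) -> State:
    # Each node sees the full state and returns only the keys it changed;
    # the runner merges that partial update into a fresh copy of the state.
    for node in nodes:
        state = {**state, **node(state)}
    return state


final = run_linear(
    [extract_text, preprocess, curate, synthesize_audio],
    {"pdf_path": "paper.pdf"},
)
print(sorted(final))
```

Because every node returns only its own keys, earlier fields such as raw_text remain available to later nodes and to the caller after the run.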

audia.agents.graph.node_extract_text(state)

Extract text and basic metadata from the PDF.

Parameters:

state (PipelineState)

Return type:

dict[str, Any]

audia.agents.graph.node_preprocess(state)

Heuristic regex pre-pass – fast, no LLM.

Parameters:

state (PipelineState)

Return type:

dict[str, Any]

audia.agents.graph.node_curate(state)

LLM curation: math → English, table summaries, acknowledgement condensing.

Parameters:

state (PipelineState)

Return type:

dict[str, Any]

audia.agents.graph.node_synthesize_audio(state)

Convert the final curated text to an audio file via TTS.

Parameters:

state (PipelineState)

Return type:

dict[str, Any]

audia.agents.graph.build_pipeline()

Compile and return the LangGraph CompiledGraph.

Return type:

Any

audia.agents.graph.run_pipeline(pdf_path, output_dir=None)

Convenience function: build and run the pipeline for a single PDF.

Returns the final PipelineState dict. After each run, the three text stages are saved to ~/.audia/debug/<stem>_<YYYYMMDD_HHMMSS>/ for inspection.

Parameters:
  • pdf_path

  • output_dir (defaults to None)

Return type:

PipelineState
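
To locate the debug output of a run, the directory name can be rebuilt from the PDF stem and a timestamp. This is a minimal sketch that follows the <stem>_<YYYYMMDD_HHMMSS> layout described above; the helper name debug_dir_for is hypothetical, and whether audia uses local time or UTC for the timestamp is not stated here.

```python
from datetime import datetime
from pathlib import Path


def debug_dir_for(pdf_path: str, now: datetime, root: Path) -> Path:
    # <stem>_<YYYYMMDD_HHMMSS>, following the layout described above.
    stem = Path(pdf_path).stem
    return root / f"{stem}_{now:%Y%m%d_%H%M%S}"


root = Path.home() / ".audia" / "debug"
example = debug_dir_for("papers/attention.pdf", datetime(2024, 1, 2, 3, 4, 5), root)
print(example.name)  # attention_20240102_030405
```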


audia.agents.state — Pipeline state

LangGraph state definition for the PDF → audio pipeline.

class audia.agents.state.PipelineState

Bases: TypedDict

State that flows through every node of the LangGraph pipeline.

pdf_path: str
output_dir: str
raw_text: str
preprocessed_text: str
cleaned_text: str
audio_path: str | None
audio_filename: str
title: str
num_pages: int
tts_backend: str
tts_voice: str
run_id: str
error: str | None
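
A caller will typically inspect error and audio_path once a run finishes. The sketch below mirrors a subset of the fields above; the class name PipelineStateSketch, the use of total=False, and the describe helper are assumptions for illustration, and how audia populates error is not documented in this section.

```python
from typing import Optional, TypedDict


class PipelineStateSketch(TypedDict, total=False):
    # Subset of the documented fields; total=False is an assumption that
    # allows a state to exist before later nodes have filled their keys.
    pdf_path: str
    title: str
    num_pages: int
    audio_path: Optional[str]
    error: Optional[str]


def describe(state: PipelineStateSketch) -> str:
    # Prefer the error message when one is set; otherwise report the output.
    if state.get("error"):
        return f"failed: {state['error']}"
    if state.get("audio_path"):
        return f"ok: {state['audio_path']}"
    return "incomplete: no audio produced yet"


print(describe({"pdf_path": "paper.pdf", "audio_path": "/tmp/paper.mp3", "error": None}))
```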

audia.agents.pdf_processor — PDF extraction

PDF text extraction using PyMuPDF (fitz).

Handles:

  • Multi-page PDFs

  • Basic heuristic removal of headers, footers, page numbers, the references section, and the acknowledgements section

class audia.agents.pdf_processor.ExtractionResult(text, num_pages, title)

Bases: NamedTuple

Parameters:
  • text (str)

  • num_pages (int)

  • title (str)

text: str

Alias for field number 0

num_pages: int

Alias for field number 1

title: str

Alias for field number 2

audia.agents.pdf_processor.extract_text(pdf_path)

Extract and pre-clean text from a PDF.

Returns an ExtractionResult with cleaned text, page count, and guessed title. Raises FileNotFoundError if the PDF does not exist.

Parameters:

pdf_path (str | Path)

Return type:

ExtractionResult
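
One possible shape for the header, footer, and page-number heuristics is sketched below on plain strings, so it runs without PyMuPDF. It is not audia's actual rule set: the Extraction tuple, the strip_page_furniture helper, the "appears on most pages" threshold, and the choice of the first surviving line as the guessed title are all assumptions.

```python
import re
from collections import Counter
from typing import NamedTuple


class Extraction(NamedTuple):
    # Mirrors the documented field order: text, num_pages, title.
    text: str
    num_pages: int
    title: str


def strip_page_furniture(pages: list[str]) -> Extraction:
    page_lines = [[ln.strip() for ln in p.splitlines() if ln.strip()] for p in pages]
    # Count on how many pages each distinct line appears.
    seen = Counter(ln for lines in page_lines for ln in set(lines))
    threshold = max(2, len(pages) // 2 + 1)  # "appears on most pages"
    kept: list[str] = []
    for lines in page_lines:
        for ln in lines:
            if re.fullmatch(r"(page\s+)?\d+", ln, flags=re.IGNORECASE):
                continue  # bare page number
            if seen[ln] >= threshold:
                continue  # running header or footer
            kept.append(ln)
    title = kept[0] if kept else ""
    return Extraction("\n".join(kept), len(pages), title)


pages = [
    "Journal of Examples\nA Study of Things\nIntroduction text.\n1",
    "Journal of Examples\nMore body text.\n2",
    "Journal of Examples\nConclusion text.\n3",
]
text, num_pages, title = strip_page_furniture(pages)
print(title, num_pages)
```

Since the result is a NamedTuple, it can be unpacked positionally as shown or accessed by field name.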


audia.agents.text_cleaner — Heuristic + LLM curation

Text curation pipeline – the core intelligence of audia.

Two-stage process:
  1. heuristic_clean() – fast regex pre-pass: removes citations and LaTeX artefacts, and collapses whitespace. Reduces LLM token cost.

  2. llm_curate() – LLM pass (ALWAYS required): rewrites math in plain English, summarises tables, condenses acknowledgements, and ensures smooth spoken-word flow.

audia.agents.text_cleaner.heuristic_clean(text)

Fast regex pre-pass – always runs before the LLM call to reduce token cost.

Parameters:

text (str)

Return type:

str
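
A regex pre-pass of this kind might look like the sketch below. The patterns are illustrative only, since audia's actual rules are not listed in this reference; heuristic_clean_sketch and the pattern names are hypothetical.

```python
import re

# Illustrative patterns only; not the rules audia ships with.
_NUMERIC_CITATION = re.compile(r"\s*\[\d+(?:\s*[,\-]\s*\d+)*\]")
_LATEX_COMMAND = re.compile(r"\\[a-zA-Z]+\*?(?:\{([^{}]*)\})?")
_SPACES = re.compile(r"[ \t]+")
_BLANK_LINES = re.compile(r"\n{3,}")


def heuristic_clean_sketch(text: str) -> str:
    # Drop numeric citations such as "[3]", "[1, 2]" or "[4-6]".
    text = _NUMERIC_CITATION.sub("", text)
    # Replace a LaTeX command with its braced argument, if it has one.
    text = _LATEX_COMMAND.sub(lambda m: m.group(1) or "", text)
    # Collapse runs of spaces/tabs and excess blank lines.
    text = _SPACES.sub(" ", text)
    text = _BLANK_LINES.sub("\n\n", text)
    return text.strip()


sample = "Transformers [1, 2] are \\emph{widely}   used [3].\n\n\n\nNext paragraph."
print(heuristic_clean_sketch(sample))
```

Removing this material before the LLM call shortens the prompt, which is where the token saving comes from.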

audia.agents.text_cleaner.llm_curate(text, settings=None, progress_cb=None)

LLM curation pass – ALWAYS required, always runs. Raises RuntimeError on misconfiguration / missing API key.

Parameters:
  • progress_cb (callable(str) | None) – Optional callback invoked with a plain-text progress line for each chunk so callers (e.g. the web job runner) can surface per-chunk progress without parsing Rich markup.

  • text (str)

  • settings (Settings | None)

Return type:

str
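
The progress_cb contract (one plain-text line per chunk) can be sketched as follows. The curate_chunks helper, the curate_one callable standing in for the LLM call, and the exact message wording are hypothetical.

```python
from typing import Callable, Optional


def curate_chunks(
    chunks: list[str],
    curate_one: Callable[[str], str],
    progress_cb: Optional[Callable[[str], None]] = None,
) -> str:
    curated = []
    total = len(chunks)
    for i, chunk in enumerate(chunks, start=1):
        curated.append(curate_one(chunk))
        if progress_cb is not None:
            # Plain text only, so the receiver never has to strip markup.
            progress_cb(f"Curated chunk {i}/{total}")
    return "\n\n".join(curated)


lines: list[str] = []
result = curate_chunks(["alpha", "beta"], str.upper, progress_cb=lines.append)
print(lines)
```

Passing lines.append as the callback is enough for a job runner that only needs to store or forward the messages.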

audia.agents.text_cleaner.llm_clean(text, settings=None, progress_cb=None)

Documented identically to llm_curate above: same parameters (text, settings, progress_cb), same behavior, and the same RuntimeError on misconfiguration or a missing API key.

Return type:

str

audia.agents.text_cleaner.curate_text(text, settings=None)

Full curation pipeline: heuristic pre-pass → LLM curation.

Transition guarantee

Each chunk (from chunk 2 onward) receives the tail of the previous curated chunk as read-only context so the LLM can write a smooth spoken transition without re-processing or re-outputting already-curated text. The full paper content is preserved as-is after the LLM pass — no content is dropped or deduplicated.

Parameters:
  • text

  • settings (defaults to None)

Return type:

str
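
The transition guarantee amounts to passing the tail of the previous curated chunk alongside each new chunk. The sketch below shows that control flow with a stand-in for the LLM; split_into_chunks, curate_with_context, the paragraph-based splitting, and the character limits are assumptions, not audia's actual chunking.

```python
from typing import Callable


def split_into_chunks(text: str, max_chars: int) -> list[str]:
    # Greedy paragraph packing; a paragraph longer than max_chars stays whole.
    chunks, current = [], ""
    for para in text.split("\n\n"):
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def curate_with_context(
    text: str,
    llm: Callable[[str, str], str],  # (context_tail, chunk) -> curated chunk
    max_chars: int = 2000,
    tail_chars: int = 200,
) -> str:
    curated: list[str] = []
    for chunk in split_into_chunks(text, max_chars):
        # The tail is passed for reference only; the LLM stand-in is expected
        # to return just the curated form of `chunk`.
        tail = curated[-1][-tail_chars:] if curated else ""
        curated.append(llm(tail, chunk))
    return "\n\n".join(curated)


calls: list[str] = []


def fake_llm(tail: str, chunk: str) -> str:
    calls.append(tail)  # record the context each call received
    return chunk.upper()


out = curate_with_context("one\n\ntwo\n\nthree", fake_llm, max_chars=4, tail_chars=3)
print(calls)
```

The first chunk receives an empty context, and every later chunk receives only the end of its predecessor, so already-curated text is never sent back for rewriting.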

audia.agents.text_cleaner.clean_text(text, settings=None)

Parameters:
  • text

  • settings (defaults to None)

Return type:

str


audia.agents.tts — Text-to-speech synthesis

Text-to-Speech wrapper supporting multiple backends:

  • edge-tts (default, free, requires internet)

  • kokoro (local, requires: pip install audia[kokoro])

  • openai (requires API key)

All backends return the absolute path to the generated audio file.

audia.agents.tts.synthesize(text, output_dir=None, filename=None, settings=None, progress_cb=None)

Convert text to an audio file and return its path.

Parameters:
  • text – The cleaned text to synthesise.

  • output_dir – Directory for the output file. Defaults to settings.audio_dir.

  • filename – Desired filename (without extension). Auto-generated when None.

  • settings – Audia settings; uses global settings when None.

Return type:

Path
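
The defaulting rules for output_dir and filename can be sketched as a small path-resolution helper. Everything named here is an assumption for illustration: the helper resolve_output_path, the ".mp3" extension, and the "audia_<timestamp>" scheme for auto-generated names are not taken from audia.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def resolve_output_path(
    output_dir: Optional[Path],
    filename: Optional[str],
    default_dir: Path,
    suffix: str = ".mp3",  # assumed extension
) -> Path:
    # Fall back to the configured directory when none is given.
    directory = Path(output_dir) if output_dir is not None else default_dir
    if filename is None:
        # Assumed naming scheme for the auto-generated case.
        filename = f"audia_{datetime.now():%Y%m%d_%H%M%S}"
    # resolve() makes the path absolute without requiring it to exist.
    return (directory / f"{filename}{suffix}").resolve()


path = resolve_output_path(None, "my_paper", default_dir=Path("audio"))
print(path.name)
```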


audia.agents.stt — Speech-to-text + query distillation

Speech-to-Text input – record from microphone and transcribe.

audia.agents.stt.record_and_transcribe(seconds=30, samplerate=16000, model_size='base', device='cpu')

Record audio from the default microphone and return the transcription.

Parameters:
  • seconds – Maximum recording duration, in seconds.

  • samplerate – Audio sample rate (16 kHz is recommended for Whisper).

  • model_size – faster-whisper model: tiny | base | small | medium | large-v3

  • device – 'cpu' or 'cuda'

Return type:

str

audia.agents.stt.transcribe_file(audio_path, model_size='base', device='cpu')

Transcribe an existing audio file (wav, mp3, …).

Parameters:
  • audio_path

  • model_size (defaults to 'base')

  • device (defaults to 'cpu')

Return type:

str
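
Since model_size refers to faster-whisper models, file transcription presumably wraps that library. The sketch below shows one way to do so; transcribe_file_sketch and join_segments are hypothetical names, compute_type="int8" is an assumption, and the import is deferred so the snippet loads even where faster-whisper is not installed.

```python
from types import SimpleNamespace
from typing import Any, Iterable


def join_segments(segments: Iterable[Any]) -> str:
    # faster-whisper yields segment objects with a `.text` attribute.
    return " ".join(seg.text.strip() for seg in segments if seg.text.strip())


def transcribe_file_sketch(audio_path: str, model_size: str = "base", device: str = "cpu") -> str:
    # Imported lazily so this module loads without faster-whisper installed.
    from faster_whisper import WhisperModel

    # compute_type="int8" is an assumption, chosen here for CPU use.
    model = WhisperModel(model_size, device=device, compute_type="int8")
    segments, _info = model.transcribe(audio_path)
    return join_segments(segments)


# Offline check of the joining step with fake segment objects.
fake = [SimpleNamespace(text=" hello "), SimpleNamespace(text=""), SimpleNamespace(text="world")]
print(join_segments(fake))  # hello world
```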

audia.agents.stt.distill_search_query(speech)

Use the configured LLM to extract a concise ArXiv search query from raw speech.

Example

>>> distill_search_query("I would like to research about agentic AI.")
'agentic AI research'

Parameters:

speech (str)

Return type:

str


audia.agents.research — ArXiv research

ArXiv paper search and download.

Primary: arxiv Python SDK. Fallback: HTML scrape of arxiv.org/search (used when the API returns 429).

class audia.agents.research.ArxivPaper(arxiv_id, title, authors, abstract, pdf_url, published, local_pdf_path=None)

Bases: object

Lightweight representation of an ArXiv result.

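Parameters:
  • arxiv_id (str)

  • title (str)

  • authors (list[str])

  • abstract (str)

  • pdf_url (str)

  • published (str)

  • local_pdf_path (str | None)
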
arxiv_id: str
title: str
authors: list[str]
abstract: str
pdf_url: str
published: str
local_pdf_path: str | None = None

class audia.agents.research.ArxivSearcher(max_results=None)

Bases: object

Search ArXiv and download PDFs.

Parameters:

max_results (int | None)

search(query)

Search ArXiv for query and return up to max_results papers.

Falls back to HTML scraping if the API returns an error (e.g. HTTP 429).

Parameters:

query (str)

Return type:

list[ArxivPaper]
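
The primary/fallback behaviour can be sketched with two injected callables, which keeps the example free of network access. The names search_with_fallback, RateLimited, flaky_api, and scraper are hypothetical; results are plain strings here rather than ArxivPaper objects.

```python
from typing import Callable


class RateLimited(Exception):
    """Hypothetical stand-in for an HTTP 429 raised by the primary client."""


def search_with_fallback(
    query: str,
    primary: Callable[[str], list[str]],
    fallback: Callable[[str], list[str]],
    max_results: int = 5,
) -> list[str]:
    try:
        results = primary(query)
    except Exception:
        # Any API failure (rate limit, timeout, ...) routes to the scraper.
        results = fallback(query)
    return results[:max_results]


def flaky_api(query: str) -> list[str]:
    raise RateLimited("429")


def scraper(query: str) -> list[str]:
    return [f"{query} paper {i}" for i in range(10)]


hits = search_with_fallback("agentic AI", flaky_api, scraper, max_results=3)
print(hits)
```

Truncating after the fallback keeps the max_results limit consistent regardless of which path produced the results.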

download_pdf(paper, dest_dir=None)

Download the PDF for paper directly from arxiv.org/pdf/<id>.

Bypasses the arxiv SDK export API entirely to avoid HTTP 429 rate-limits. Skips the download if the file already exists.

Parameters:
  • paper

  • dest_dir (defaults to None)

Return type:

Path
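
A direct download with a skip-if-present check might look like the sketch below. It uses only the standard library; download_pdf_sketch, fetch_bytes, the "<id>.pdf" filename, the User-Agent string, and the placeholder identifier in the usage lines are assumptions. The usage lines inject a fake fetcher so no request is made.

```python
import tempfile
import urllib.request
from pathlib import Path
from typing import Callable


def fetch_bytes(url: str) -> bytes:
    # Plain HTTP GET with an explicit User-Agent (assumed value).
    req = urllib.request.Request(url, headers={"User-Agent": "audia-docs-example"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read()


def download_pdf_sketch(
    arxiv_id: str,
    dest_dir: Path,
    fetch: Callable[[str], bytes] = fetch_bytes,
) -> Path:
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Old-style ids contain "/", which cannot appear in a filename.
    target = dest_dir / f"{arxiv_id.replace('/', '_')}.pdf"
    if target.exists():
        return target  # already downloaded, skip the request
    target.write_bytes(fetch(f"https://arxiv.org/pdf/{arxiv_id}"))
    return target


calls: list[str] = []


def fake_fetch(url: str) -> bytes:
    calls.append(url)
    return b"%PDF-1.4"


with tempfile.TemporaryDirectory() as tmp:
    first = download_pdf_sketch("2401.00001", Path(tmp), fetch=fake_fetch)
    second = download_pdf_sketch("2401.00001", Path(tmp), fetch=fake_fetch)
print(len(calls))
```

The second call returns the existing file without invoking the fetcher, which is the behaviour that avoids repeated requests to arxiv.org.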