audia.agents — Pipeline agents
The agents module contains the LangGraph pipeline nodes and supporting functions.
audia.agents.graph — LangGraph pipeline
LangGraph pipeline: PDF → extracted text → curated text → audio file.
- Graph structure (linear – no optional steps):
extract_text ─► preprocess ─► curate ─► synthesize_audio ─► END
extract_text : PyMuPDF → raw text + metadata
preprocess : heuristic regex pre-pass (fast)
curate : LLM – math → English, table summaries, acknowledgement condensing
synthesize_audio: TTS → audio file
Each node receives the full PipelineState and returns a partial update.
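The node contract described above (full state in, partial update out) can be illustrated without LangGraph. The following is a minimal plain-Python sketch, not the audia implementation: the node bodies are stubs, the state keys (`raw_text`, `preprocessed_text`, `curated_text`, `audio_path`) and the `.mp3` extension are hypothetical, and a simple loop (`run_linear`) stands in for the compiled graph.

```python
from pathlib import Path
from typing import Callable, List, TypedDict


class PipelineState(TypedDict, total=False):
    # Hypothetical keys; the real PipelineState fields are not listed on this page.
    pdf_path: str
    raw_text: str
    preprocessed_text: str
    curated_text: str
    audio_path: str


def node_extract_text(state: PipelineState) -> dict:
    # Stub: the real node extracts text with PyMuPDF.
    return {"raw_text": f"text of {state['pdf_path']}"}


def node_preprocess(state: PipelineState) -> dict:
    # Stub: the real node runs the heuristic regex pre-pass.
    return {"preprocessed_text": " ".join(state["raw_text"].split())}


def node_curate(state: PipelineState) -> dict:
    # Stub: the real node calls an LLM.
    return {"curated_text": state["preprocessed_text"].capitalize()}


def node_synthesize_audio(state: PipelineState) -> dict:
    # Stub: the real node calls a TTS backend; the extension is an assumption.
    return {"audio_path": str(Path(state["pdf_path"]).with_suffix(".mp3"))}


# Linear order, mirroring the graph structure shown above.
NODES: List[Callable[[PipelineState], dict]] = [
    node_extract_text,
    node_preprocess,
    node_curate,
    node_synthesize_audio,
]


def run_linear(initial: PipelineState) -> PipelineState:
    """Pass the full state to each node and merge its partial update."""
    state: PipelineState = dict(initial)
    for node in NODES:
        state = {**state, **node(state)}
    return state
```

Each node only names the keys it changes; merging the update into a copy of the state keeps earlier stages available to later nodes.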
- audia.agents.graph.node_extract_text(state)
Extract text and basic metadata from the PDF.
- Parameters:
state (PipelineState)
- Return type:
- audia.agents.graph.node_preprocess(state)
Heuristic regex pre-pass – fast, no LLM.
- Parameters:
state (PipelineState)
- Return type:
- audia.agents.graph.node_curate(state)
LLM curation: math → English, table summaries, acknowledgement condensing.
- Parameters:
state (PipelineState)
- Return type:
- audia.agents.graph.node_synthesize_audio(state)
Convert the final curated text to an audio file via TTS.
- Parameters:
state (PipelineState)
- Return type:
- audia.agents.graph.build_pipeline()
Compile and return the LangGraph CompiledGraph.
- Return type:
- audia.agents.graph.run_pipeline(pdf_path, output_dir=None)
Convenience function: build and run the pipeline for a single PDF.
Returns the final PipelineState dict. After each run, the three text stages are saved to ~/.audia/debug/<stem>_<YYYYMMDD_HHMMSS>/ for inspection.
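The debug directory pattern `<stem>_<YYYYMMDD_HHMMSS>` can be reproduced as follows. This is a sketch under assumptions: the helper name `debug_dir_for` and its `now` and `root` parameters are hypothetical and exist only to make the example deterministic. Only the path pattern itself comes from the description above.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional, Union


def debug_dir_for(
    pdf_path: Union[str, Path],
    now: Optional[datetime] = None,
    root: Optional[Union[str, Path]] = None,
) -> Path:
    """Build <root>/<stem>_<YYYYMMDD_HHMMSS> for a given PDF path."""
    now = now or datetime.now()
    # Default root follows the documented location ~/.audia/debug/.
    base = Path(root) if root is not None else Path.home() / ".audia" / "debug"
    stem = Path(pdf_path).stem  # file name without directory or extension
    return base / f"{stem}_{now:%Y%m%d_%H%M%S}"
```

Calling `debug_dir_for("papers/attention.pdf")` right after a run gives a directory name close to the one the run wrote, which is handy for locating the three saved text stages.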
- Parameters:
- Return type:
audia.agents.state — Pipeline state
LangGraph state definition for the PDF → audio pipeline.
audia.agents.pdf_processor — PDF extraction
PDF text extraction using PyMuPDF (fitz).
Handles:
- Multi-page PDFs
- Basic heuristic removal of headers, footers, page numbers, references section, and acknowledgements section.
- class audia.agents.pdf_processor.ExtractionResult(text, num_pages, title)
Bases: NamedTuple
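For orientation, a NamedTuple with these three fields could be declared as below. The field names come from the signature above. The type annotations are assumptions, since this page does not list them.

```python
from typing import NamedTuple, Optional


class ExtractionResult(NamedTuple):
    text: str              # extracted body text (type assumed)
    num_pages: int         # number of pages in the PDF (type assumed)
    title: Optional[str]   # document title, possibly missing (type assumed)


# NamedTuples support both attribute access and positional unpacking.
result = ExtractionResult(text="Body text", num_pages=3, title="A Paper")
text, num_pages, title = result
```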
audia.agents.text_cleaner — Heuristic + LLM curation
Text curation pipeline – the core intelligence of audia.
Two-stage process:
- heuristic_clean() – fast regex pre-pass: removes citations and LaTeX artefacts, collapses whitespace. Reduces LLM token cost.
- llm_curate() – LLM pass (ALWAYS required): rewrites math in plain English, summarises tables, condenses acknowledgements, ensures smooth spoken-word flow.
- audia.agents.text_cleaner.heuristic_clean(text)
Fast regex pre-pass – always runs before the LLM call to reduce token cost.
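A regex pre-pass of this kind might look like the sketch below. The patterns are illustrative assumptions, not the rules audia actually applies. They cover `\cite{...}` commands, bracketed numeric citations such as `[1, 2]`, leftover LaTeX command names and markup characters, and whitespace runs.

```python
import re

# Illustrative patterns only; the real heuristic_clean() rules are not shown here.
_CITE_COMMAND = re.compile(r"[ \t]*\\cite[a-zA-Z]*\{[^}]*\}")
_NUMERIC_CITATION = re.compile(r"[ \t]*\[\d+(?:\s*[,\-\u2013]\s*\d+)*\]")
_LATEX_COMMAND = re.compile(r"\\[a-zA-Z]+\*?")
_MARKUP_CHARS = re.compile(r"[{}$]")
_SPACE_RUNS = re.compile(r"[ \t]+")
_BLANK_LINE_RUNS = re.compile(r"\n{3,}")


def heuristic_clean_sketch(text: str) -> str:
    """Strip citations and LaTeX residue, then collapse whitespace."""
    text = _CITE_COMMAND.sub("", text)         # \cite{key} with its argument
    text = _NUMERIC_CITATION.sub("", text)     # [1], [1, 2], [3-5]
    text = _LATEX_COMMAND.sub("", text)        # \textbf, \emph, ... (keeps arguments)
    text = _MARKUP_CHARS.sub("", text)         # stray braces and dollar signs
    text = _SPACE_RUNS.sub(" ", text)          # runs of spaces or tabs
    text = _BLANK_LINE_RUNS.sub("\n\n", text)  # three or more newlines
    return text.strip()
```

Running cheap substitutions like these first means the LLM receives fewer tokens that carry no spoken content.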
- audia.agents.text_cleaner.llm_curate(text, settings=None, progress_cb=None)
LLM curation pass. Always required and always runs. Raises RuntimeError if the LLM is misconfigured or the API key is missing.
- audia.agents.text_cleaner.llm_clean(text, settings=None, progress_cb=None)
LLM curation pass. Always required and always runs. Raises RuntimeError if the LLM is misconfigured or the API key is missing.
- audia.agents.text_cleaner.curate_text(text, settings=None)
Full curation pipeline: heuristic pre-pass → LLM curation.
Transition guarantee
Each chunk from the second onward receives the tail of the previous curated chunk as read-only context, so the LLM can write a smooth spoken transition without re-processing or re-outputting already-curated text. The full paper content is preserved after the LLM pass: no content is dropped or deduplicated.
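The transition mechanism can be sketched as a loop that hands each chunk the tail of the previous curated output. Everything named here is an assumption made for illustration: the paragraph-based splitter, the `max_chars` and `tail_chars` limits, and the `curate_chunk(chunk, context)` callable that stands in for the LLM call.

```python
from typing import Callable, List


def split_into_chunks(text: str, max_chars: int) -> List[str]:
    """Greedily pack paragraphs into chunks of at most max_chars (assumed strategy)."""
    chunks: List[str] = []
    current = ""
    for para in text.split("\n\n"):
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def curate_with_context(
    text: str,
    curate_chunk: Callable[[str, str], str],
    max_chars: int = 4000,
    tail_chars: int = 300,
) -> str:
    """Curate chunk by chunk, passing the previous curated tail as read-only context."""
    curated: List[str] = []
    for chunk in split_into_chunks(text, max_chars):
        # The first chunk has no predecessor, so its context is empty.
        context = curated[-1][-tail_chars:] if curated else ""
        curated.append(curate_chunk(chunk, context))
    # Every curated chunk is kept; nothing is dropped or deduplicated.
    return "\n\n".join(curated)
```

Because the context is only read, never re-emitted, the joined output contains each chunk exactly once.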
audia.agents.tts — Text-to-speech synthesis
Text-to-Speech wrapper supporting multiple backends:
edge-tts (default, free, requires internet)
kokoro (local, requires: pip install audia[kokoro])
openai (requires API key)
All backends return the absolute path to the generated audio file.
- audia.agents.tts.synthesize(text, output_dir=None, filename=None, settings=None, progress_cb=None)
Convert text to an audio file and return its path.
- Parameters:
text – The cleaned text to synthesise.
output_dir – Directory for the output file. Defaults to settings.audio_dir.
filename – Desired filename (without extension). Auto-generated when None.
settings – Audia settings; uses global settings when None.
- Return type:
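The shared contract of the backends (write a file, return its absolute path) can be sketched with a small dispatch table. This is not audia's code: the `fake` backend writes placeholder bytes instead of audio, and the `.mp3` extension and the hash-based auto-generated filename are assumptions.

```python
import hashlib
from pathlib import Path
from typing import Callable, Dict, Optional, Union


def _fake_backend(text: str, out_path: Path) -> None:
    # Stand-in for a real engine (edge-tts, kokoro, openai): writes placeholder bytes.
    out_path.write_bytes(text.encode("utf-8"))


# Hypothetical registry mapping backend names to writer functions.
BACKENDS: Dict[str, Callable[[str, Path], None]] = {"fake": _fake_backend}


def synthesize_sketch(
    text: str,
    output_dir: Union[str, Path],
    filename: Optional[str] = None,
    backend: str = "fake",
) -> Path:
    """Write the output file via the chosen backend and return its absolute path."""
    if backend not in BACKENDS:
        raise ValueError(f"unknown TTS backend: {backend!r}")
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Assumed naming scheme when no filename is given: short hash of the text.
    stem = filename or "audia_" + hashlib.sha1(text.encode("utf-8")).hexdigest()[:10]
    out_path = (out_dir / f"{stem}.mp3").resolve()
    BACKENDS[backend](text, out_path)
    return out_path
```

Keeping path handling outside the backends means a new engine only has to implement the write step.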
audia.agents.stt — Speech-to-text + query distillation
Speech-to-Text input – record from microphone and transcribe.
- audia.agents.stt.record_and_transcribe(seconds=30, samplerate=16000, model_size='base', device='cpu')
Record audio from the default microphone and return the transcription.
- Parameters:
seconds – Maximum recording duration.
samplerate – Audio sample rate (16 kHz is recommended for Whisper).
model_size – faster-whisper model: tiny | base | small | medium | large-v3
device – 'cpu' or 'cuda'
- Return type:
audia.agents.research — ArXiv research
ArXiv paper search and download.
Primary: arxiv Python SDK. Fallback: HTML scrape of arxiv.org/search (used when the API returns 429).
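The primary/fallback arrangement can be expressed as a small wrapper. This is a sketch under assumptions: `primary` and `fallback` are hypothetical callables standing in for the SDK search and the HTML scrape, and the broad `except Exception` is a simplification of whatever error handling the module actually uses.

```python
import logging
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
log = logging.getLogger(__name__)


def search_with_fallback(
    query: str,
    primary: Callable[[str], Iterable[T]],
    fallback: Callable[[str], Iterable[T]],
) -> List[T]:
    """Try the primary search; on any error (e.g. HTTP 429) use the fallback."""
    try:
        return list(primary(query))
    except Exception as exc:  # simplification: a real client may catch narrower errors
        log.warning("primary search failed (%s); using fallback", exc)
        return list(fallback(query))
```

The fallback runs only when the primary raises, so a healthy API never triggers the scrape.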
- class audia.agents.research.ArxivPaper(arxiv_id, title, authors, abstract, pdf_url, published, local_pdf_path=None)
Bases: object
Lightweight representation of an ArXiv result.
- Parameters:
- class audia.agents.research.ArxivSearcher(max_results=None)
Bases: object
Search ArXiv and download PDFs.
- Parameters:
max_results (int | None)
- search(query)
Search ArXiv for query and return up to max_results papers.
Falls back to HTML scraping if the API returns an error (e.g. HTTP 429).
- Parameters:
query (str)
- Return type:
- download_pdf(paper, dest_dir=None)
Download the PDF for paper directly from arxiv.org/pdf/<id>.
Bypasses the arxiv SDK export API entirely to avoid HTTP 429 rate-limits. Skips the download if the file already exists.
- Parameters:
paper (ArxivPaper)
- Return type:
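The direct-download behaviour with its skip-if-present check can be sketched as below. The function name `download_pdf_sketch`, the injectable `fetch` parameter, and the local file naming are assumptions made for illustration. Only the `arxiv.org/pdf/<id>` URL form and the skip rule come from the description above.

```python
from pathlib import Path
from typing import Callable, Union
from urllib.request import urlopen


def _http_get(url: str) -> bytes:
    """Fetch a URL with the standard library and return the response body."""
    with urlopen(url, timeout=60) as resp:
        return resp.read()


def download_pdf_sketch(
    arxiv_id: str,
    dest_dir: Union[str, Path],
    fetch: Callable[[str], bytes] = _http_get,
) -> Path:
    """Download arxiv.org/pdf/<id> into dest_dir unless the file already exists."""
    out_dir = Path(dest_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Assumed local name: the id with slashes replaced, plus .pdf.
    target = out_dir / f"{arxiv_id.replace('/', '_')}.pdf"
    if target.exists():
        return target  # already downloaded, no network call
    target.write_bytes(fetch(f"https://arxiv.org/pdf/{arxiv_id}"))
    return target
```

Passing a fake `fetch` makes the skip logic testable offline; the default uses `urllib` from the standard library.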