audia

Getting Started

  • Installation
    • Requirements
    • Standard install
    • Recommended: pipx
    • Optional extras
    • Install from source
    • Verify
  • Quick Start
    • 1. Configure
    • 2. Convert a local PDF
    • 3. Search ArXiv and convert
    • 4. Voice input
    • 5. Web UI
    • 6. Show active config
  • Configuration
    • LLM provider (required)
      • Option A — OpenAI
      • Option B — Anthropic
      • Shared LLM settings
    • TTS backend
    • STT (voice input)
    • Storage
    • Web server
    • Research
    • Full .env.example

Guides

  • Pipeline
    • Entry points
    • Step 1 — PDF extraction (pdf_processor.py)
    • Step 2 — Heuristic pre-pass (text_cleaner.py)
    • Step 3 — LLM curation (text_cleaner.py — llm_curate)
    • Step 4 — TTS synthesis (tts.py)
    • State (state.py)
    • Debug output
  • CLI Reference
    • audia convert
    • audia research
    • audia listen
    • audia serve
    • audia info
    • audia --version
  • Web UI
    • Starting the server
    • Tabs
      • Convert
      • Research
      • Configuration
      • Library (Database)
    • API endpoints (summary)
  • TTS Backends
    • edge-tts (default)
    • Kokoro (local)
    • OpenAI TTS
    • Chunk size tuning
  • LLM Backends
    • OpenAI
      • Custom endpoint
    • Anthropic
      • Custom endpoint
    • Shared settings
    • How the LLM is used
  • Storage
    • Location
    • Schema
      • papers
      • audio_files
      • research_sessions
      • user_settings
    • Exploring the database

API Reference

  • audia — top-level package
    • Settings
      • Settings.model_config
      • Settings.server_host
      • Settings.server_port
      • Settings.reload
      • Settings.data_dir
      • Settings.get_project_dirs()
      • Settings.db_path
      • Settings.audio_dir
      • Settings.upload_dir
      • Settings.debug_dir
      • Settings.ensure_dirs()
      • Settings.llm_provider
      • Settings.openai_api_key
      • Settings.openai_api_base
      • Settings.anthropic_api_key
      • Settings.anthropic_api_base
      • Settings.google_api_key
      • Settings.google_api_base
      • Settings.llm_model
      • Settings.llm_temperature
      • Settings.llm_max_chunk_chars
      • Settings.tts_backend
      • Settings.tts_voice
      • Settings.tts_rate
      • Settings.tts_chunk_chars
      • Settings.stt_model
      • Settings.stt_device
      • Settings.stt_record_seconds
      • Settings.arxiv_max_results
    • get_settings()
  • audia.config — Settings
    • validate_project_name()
    • ProjectDirs
      • ProjectDirs.root
      • ProjectDirs.db_path
      • ProjectDirs.audio_dir
      • ProjectDirs.upload_dir
      • ProjectDirs.debug_dir
      • ProjectDirs.ensure_dirs()
    • Settings
      • Settings.model_config
      • Settings.server_host
      • Settings.server_port
      • Settings.reload
      • Settings.data_dir
      • Settings.get_project_dirs()
      • Settings.db_path
      • Settings.audio_dir
      • Settings.upload_dir
      • Settings.debug_dir
      • Settings.ensure_dirs()
      • Settings.llm_provider
      • Settings.openai_api_key
      • Settings.openai_api_base
      • Settings.anthropic_api_key
      • Settings.anthropic_api_base
      • Settings.google_api_key
      • Settings.google_api_base
      • Settings.llm_model
      • Settings.llm_temperature
      • Settings.llm_max_chunk_chars
      • Settings.tts_backend
      • Settings.tts_voice
      • Settings.tts_rate
      • Settings.tts_chunk_chars
      • Settings.stt_model
      • Settings.stt_device
      • Settings.stt_record_seconds
      • Settings.arxiv_max_results
    • get_settings()
  • audia.agents — Pipeline agents
    • audia.agents.graph — LangGraph pipeline
      • node_extract_text()
      • node_preprocess()
      • node_curate()
      • node_synthesize_audio()
      • build_pipeline()
      • run_pipeline()
    • audia.agents.state — Pipeline state
      • PipelineState
        • PipelineState.pdf_path
        • PipelineState.output_dir
        • PipelineState.raw_text
        • PipelineState.preprocessed_text
        • PipelineState.cleaned_text
        • PipelineState.audio_path
        • PipelineState.audio_filename
        • PipelineState.title
        • PipelineState.num_pages
        • PipelineState.tts_backend
        • PipelineState.tts_voice
        • PipelineState.run_id
        • PipelineState.error
    • audia.agents.pdf_processor — PDF extraction
      • ExtractionResult
        • ExtractionResult.text
        • ExtractionResult.num_pages
        • ExtractionResult.title
      • extract_text()
    • audia.agents.text_cleaner — Heuristic + LLM curation
      • heuristic_clean()
      • llm_curate()
      • llm_clean()
      • curate_text()
      • clean_text()
    • audia.agents.tts — Text-to-speech synthesis
      • synthesize()
    • audia.agents.stt — Speech-to-text + query distillation
      • record_and_transcribe()
      • transcribe_file()
      • distill_search_query()
    • audia.agents.research — ArXiv research
      • ArxivPaper
        • ArxivPaper.arxiv_id
        • ArxivPaper.title
        • ArxivPaper.authors
        • ArxivPaper.abstract
        • ArxivPaper.pdf_url
        • ArxivPaper.published
        • ArxivPaper.local_pdf_path
      • ArxivSearcher
        • ArxivSearcher.search()
        • ArxivSearcher.download_pdf()
  • audia.storage — Database models & access
    • audia.storage.models — SQLAlchemy ORM models
      • Base
      • Paper
        • Paper.audio_files
        • Paper.authors_list
      • AudioFile
        • AudioFile.paper
      • ResearchSession
        • ResearchSession.paper_ids_list
      • UserSetting
    • audia.storage.database — Session factory & helpers
      • engine()
      • init_db()
      • get_session()
  • audia.cli — Command-line interface
    • Commands
    • convert()
    • research()
    • listen()
    • serve()
    • info()
  • audia.ui — Web UI (FastAPI)
    • audia.ui.app — FastAPI application
      • create_app()
    • audia.ui.jobs — Background job store
    • Routes
      • audia.ui.routes.convert
        • upload_and_convert()
        • enqueue_conversion()
        • get_job_status()
        • cancel_job()
        • serve_job_pdf()
        • download_audio()
      • audia.ui.routes.research
        • SearchRequest
        • NormalizeRequest
        • ConvertResearchRequest
        • EnqueueRequest
        • normalize()
        • search()
        • convert_papers()
        • enqueue_research()
        • transcribe_audio()
        • get_job_status()
        • cancel_job()
        • serve_job_pdf()
      • audia.ui.routes.library
        • list_papers()
        • list_audio()
        • list_research_sessions()
        • list_user_settings()
        • PaperPatch
        • AudioPatch
        • ResearchSessionPatch
        • UserSettingPatch
        • patch_paper()
        • patch_audio()
        • patch_research_session()
        • patch_user_setting()
        • delete_audio()
        • get_paper()
        • delete_paper()
        • serve_pdf()
        • MovePaperBody
        • move_paper()
      • audia.ui.routes.settings
        • SettingsBody
        • get_ui_settings()
        • save_ui_settings()

Project

  • Changelog
    • Version 0.7.3 (2026-05-27)
      • UI rebuild — 3-way theme toggle
    • Version 0.7.2 (2026-05-27)
      • UI rebuild only
    • Version 0.7.1 (2026-05-10)
      • Code quality fixes
    • Version 0.7.0 (2026-04-29)
      • Move studies between projects
        • Backend
        • Path repair utility (scripts/repair_paths.py)
      • Sidebar UI overhaul
        • Project selector moved to sidebar
        • Project selector styling: purple → cyan
        • Move button in paper rows
        • Custom tooltips (Tooltip.tsx)
    • Version 0.6.1 (2026-04-22)
      • UI enhancements
        • Animated music visualizer in footer
    • Version 0.5.1 (2026-04-19)
      • Bug fixes
        • PDF preview broken for saved papers after project-layer introduction
        • Renaming an audio file in the database editor only updated the display name
    • Version 0.5.0 (2026-04-18)
      • Project-based storage
        • Backend
        • CLI
        • Frontend
        • Tests
    • Version 0.4.4 (2026-04-02)
      • Style: docs & frontend
      • Tests: coverage 68% → 96%
    • Version 0.4.3 (2026-04-01)
      • Research tab: convert button improvements
    • Version 0.4.2 (2026-04-01)
      • Bug fixes & UI improvements
        • LLM provider ignored during query normalisation
        • Research sessions not saved to the database
        • Database tab colour scheme
    • Version 0.4.1 (2026-04-01)
    • Version 0.4.0 (2026-04-01)
      • Bug fixes
        • Abstract column not editable in Database tab
        • TTS voice not persisted to user settings
        • Debug text files not written in UI mode
    • Version 0.3.9 (2026-03-31)
      • Bug fixes & UX improvements
        • Browser caching of stale UI after pip upgrade (#1)
        • Convert tab: button stays visible during conversion (#2)
        • Audio playback broken after renaming a paper (#3)
        • Tab switch clears convert / research progress (#4)
        • Long paper titles overflow the convert drop zone (#6)
        • Debug text files not written in UI mode (#7)
    • Version 0.3.8 (2026-03-31)
    • Version 0.3.7 (2026-03-31)
      • Fix browser opening before server is ready
    • Version 0.3.6 (2026-03-31)
      • Google Gemini LLM support
    • Version 0.3.5 (2026-03-31)
      • Rebuild frontend
    • Version 0.3.4 (2026-03-31)
      • Custom API base URLs for OpenAI and Anthropic
    • Version 0.3.3 (2026-03-31)
      • Icon system, TTS voice selector & database editor
        • Icon system (constants/index.ts)
        • TTS voice selector (Configuration tab)
        • Database explorer — editable cells
        • Database explorer — table UX
    • Version 0.3.2 (2026-03-30)
      • Convert tab improvements
        • Instant PDF preview on upload
        • Collapsible full-progress log
    • Version 0.3.1 (2026-03-30)
      • Frontend refactor & database explorer
        • Main.tsx split into tab components
        • Database tab (MainDatabase.tsx)
        • API additions (library router)
        • scripts/explore_db.py
    • Version 0.3.0 (2026-03-29)
      • Configuration, voice search & LLM query normalisation
        • Configuration tab
        • Voice search (Research tab)
        • LLM query normalisation (Research tab)
        • API additions
        • Code hygiene
    • Version 0.2.0 (2026-03-29)
      • Web UI overhaul
    • Version 0.1.2 (2026-03-29)
      • ArXiv search robustness & CLI improvements
        • ArXiv search
        • PDF download
        • CLI output
    • Version 0.1.1 (2026-03-29)
      • listen command pipeline overhaul
    • Version 0.1.0 (2026-03-29)
      • First fully working release: PDF to audio via mandatory LLM curation and edge-tts synthesis.
    • Version 0.0.1 (2026-03-28)

© Copyright 2026, Yauheniya Varabyova.