Changelog
Version 0.7.3 (2026-05-27)
UI rebuild — 3-way theme toggle
Replaced the single icon theme button in the header with a sliding 3-way toggle: System / Light / Dark
Active selection indicated by a sliding pill with per-mode icon colors:
Dark mode: black pill, white (system), amber-400 (sun & moon)
Light mode: blue pill, black (system), amber-600 (sun), amber-400 (moon)
ThemeMode type ('system' | 'light' | 'dark') added to App.tsx; default is dark
System mode derives isDark from window.matchMedia('(prefers-color-scheme: dark)')
Rebuilt frontend bundle and copied static assets to src/audia/ui/static
Version 0.7.2 (2026-05-27)
UI rebuild only
Rebuilt frontend bundle and copied static assets to audia/src/audia/ui/static
No backend or logic changes in this release; this version solely captures the updated compiled UI
Version 0.7.1 (2026-05-10)
Code quality fixes
Fixed E402 lint errors in ui/routes/convert.py by moving _logger assignment to after all imports
Removed unused local variables (result, src_dirs, mock_browser) in ui/routes/library.py, tests/test_cli.py, and tests/test_text_cleaner.py (F841)
Wrapped long lines exceeding 100 chars in agents/research.py, agents/text_cleaner.py, config.py, ui/routes/convert.py, and tests/test_cli.py (E501)
Version 0.7.0 (2026-04-29)
Move studies between projects
Backend
New POST /api/library/papers/{paper_id}/move?project=<src> endpoint in library.py
Accepts { "target_project": "<name>" } body
Validates target project name via validate_project_name(); rejects same-source-and-target moves
Copies the PDF to ~/.audia/<target>/uploads/ and each linked audio file to ~/.audia/<target>/audio/, avoiding filename collisions with a {paper_id}_ prefix
Inserts new Paper + AudioFile rows in the target project’s SQLite database with the updated absolute file paths
Deletes source database records (cascade-deletes audio rows) and removes the original files from disk
Returns { status, source_project, target_project, new_paper_id }
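The copy step can be sketched as follows. This is a minimal illustration and not the code in library.py: the helper name copy_into_project is hypothetical, and it assumes the {paper_id}_ prefix is applied only when the plain filename is already taken in the target directory.

```python
import shutil
from pathlib import Path


def copy_into_project(src: Path, target_dir: Path, paper_id: int) -> Path:
    """Copy src into target_dir, prefixing the name with '<paper_id>_' on collision.

    Hypothetical helper: the real endpoint may apply the prefix differently.
    """
    target_dir.mkdir(parents=True, exist_ok=True)
    dest = target_dir / src.name
    if dest.exists():
        # A file with the same name already lives in the target project,
        # so fall back to a name prefixed with the source paper ID.
        dest = target_dir / f"{paper_id}_{src.name}"
    shutil.copy2(src, dest)
    return dest
```

The returned path is what would be stored as the new absolute pdf_path or file_path in the target project's database.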
Path repair utility (scripts/repair_paths.py)
New repair mode: scans a project’s DB for pdf_path / file_path values that no longer exist on disk and re-matches them by filename (with and without hash prefix) against files currently in the project’s uploads/ and audio/ directories
New recover-from mode (--recover-from <source>): migrates DB records from a source project whose file paths are broken into a target project by matching filenames against the target’s on-disk files; useful for recovering from manual file moves
Usage: .venv/bin/python scripts/repair_paths.py --project <name> [--recover-from <source>] [--dry-run]
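The filename re-matching idea can be illustrated with a short sketch. It is not taken from scripts/repair_paths.py: the function names are hypothetical, and the shape of the hash prefix (six or more lowercase hex digits followed by an underscore) is an assumption, since the changelog does not specify it.

```python
import re
from pathlib import Path
from typing import Optional

# Assumption: a "hash prefix" looks like "<hex digits>_" in front of the original name.
_HASH_PREFIX = re.compile(r"^[0-9a-f]{6,}_")


def strip_hash_prefix(name: str) -> str:
    """Remove one leading hash prefix from a filename, if present."""
    return _HASH_PREFIX.sub("", name, count=1)


def rematch(stored_path: str, directory: Path) -> Optional[Path]:
    """Return a file in directory that matches the stored path's filename, or None."""
    if Path(stored_path).exists():
        return Path(stored_path)  # path is still valid, nothing to repair
    wanted = Path(stored_path).name
    candidates = sorted(p for p in directory.iterdir() if p.is_file())
    # First pass: exact filename match.
    for candidate in candidates:
        if candidate.name == wanted:
            return candidate
    # Second pass: compare names with any hash prefix removed on both sides.
    bare = strip_hash_prefix(wanted)
    for candidate in candidates:
        if strip_hash_prefix(candidate.name) == bare:
            return candidate
    return None
```

A dry run would report the pairs returned by rematch without writing them back to the database.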
Version 0.6.1 (2026-04-22)
UI enhancements
Version 0.5.1 (2026-04-19)
Bug fixes
PDF preview broken for saved papers after project-layer introduction
Sidebar-selected papers built their preview URL as /api/library/pdf/{id} without the ?project= query param, while the endpoint requires it to locate the file in the correct project subfolder
App.tsx now appends ?project=<name> to the sidebar preview URL (matching the existing behaviour in Main.tsx)
Renaming an audio file in the database editor only updated the display name
PATCH /api/library/audio/{id} updated filename in the DB but left the MP3 on disk untouched and file_path stale
The endpoint now renames the file on disk, preserving the original extension if omitted, and updates file_path in the same transaction
After a successful cell save, the database table re-fetches its rows so file_path reflects the rename immediately, and the sidebar refreshes via the existing onConverted / refreshKey mechanism
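The on-disk rename with extension preservation might look like the sketch below. The helper name rename_audio is hypothetical, and refusing to overwrite an existing file is an assumption added for safety; the changelog does not say how the endpoint handles that case.

```python
from pathlib import Path


def rename_audio(old_path: Path, new_name: str) -> Path:
    """Rename old_path within its directory, keeping the old extension if new_name has none."""
    if not Path(new_name).suffix:
        # The user typed a bare name such as "talk": reuse the original ".mp3".
        new_name += old_path.suffix
    new_path = old_path.with_name(new_name)
    if new_path.exists():
        # Assumption: never silently overwrite another audio file.
        raise FileExistsError(new_path)
    old_path.rename(new_path)
    return new_path
```

The caller would store str(new_path) as file_path in the same database transaction that updates filename, and roll both back if the rename raises.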
Version 0.5.0 (2026-04-18)
Breaking change: existing data stored under ~/.audia/ must be migrated manually. Move ~/.audia/audia.db, ~/.audia/audio/, ~/.audia/uploads/, and ~/.audia/debug/ into ~/.audia/default/ to preserve your current library under the default project.
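The manual move can also be scripted. The following is a sketch only and not a tool shipped with the package; the function name migrate_flat_layout is hypothetical. It moves the four items listed above and skips anything that is missing or already present in default/.

```python
import shutil
from pathlib import Path


def migrate_flat_layout(data_dir: Path) -> list[str]:
    """Move the pre-0.5.0 flat files into data_dir/default/ and return the names moved."""
    default_dir = data_dir / "default"
    default_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for name in ("audia.db", "audio", "uploads", "debug"):
        src = data_dir / name
        dest = default_dir / name
        # Skip items that do not exist or were already migrated.
        if src.exists() and not dest.exists():
            shutil.move(str(src), str(dest))
            moved.append(name)
    return moved
```

Calling migrate_flat_layout(Path.home() / ".audia") would apply it to the standard data directory; backing that directory up first is advisable.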
Project-based storage
All files are now namespaced by project under ~/.audia/<project>/
Previously everything was written flat to ~/.audia/. The new layout keeps projects fully isolated — each has its own SQLite database and audio/upload/debug directories.
Backend
config.py: new DEFAULT_PROJECT = "default" constant; ProjectDirs dataclass bundles db_path, audio_dir, upload_dir, debug_dir for a given project root; validate_project_name() enforces lowercase-alphanumeric names; Settings.get_project_dirs(project) returns the correct ProjectDirs for any project; the legacy flat properties (db_path, audio_dir, etc.) now delegate to get_project_dirs("default")
storage/database.py: per-project engine/session-factory registry (_engines, _factories dicts keyed by project name); init_db(project) and get_session(project) both accept an optional project argument; engines are lazily created and cached
New GET /api/projects: list all projects with metadata (document count, audio count, disk size, creation date)
New POST /api/projects: create a named project
New DELETE /api/projects/{name}: delete a project and all its files (protected: default project cannot be deleted)
All existing routes (/api/library/*, /api/convert/*, /api/research/*, /api/settings) accept an optional ?project=<name> query parameter (or project form/body field for POST endpoints); omitting it selects the default project
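A rough sketch of the per-project path resolution described for config.py is given below. It is an illustration under stated assumptions and not the actual source: get_project_dirs is written as a free function taking data_dir (in the package it is a Settings method), and "lowercase-alphanumeric" is assumed to mean one or more characters from a-z and 0-9.

```python
import re
from dataclasses import dataclass
from pathlib import Path

DEFAULT_PROJECT = "default"
# Assumption: "lowercase-alphanumeric" means exactly this character set.
_NAME_RE = re.compile(r"[a-z0-9]+")


@dataclass(frozen=True)
class ProjectDirs:
    db_path: Path
    audio_dir: Path
    upload_dir: Path
    debug_dir: Path


def validate_project_name(name: str) -> str:
    """Return name unchanged if it is valid, otherwise raise ValueError."""
    if not _NAME_RE.fullmatch(name):
        raise ValueError(f"invalid project name: {name!r}")
    return name


def get_project_dirs(data_dir: Path, project: str = DEFAULT_PROJECT) -> ProjectDirs:
    """Resolve the database and directories for one project under data_dir."""
    root = data_dir / validate_project_name(project)
    return ProjectDirs(
        db_path=root / "audia.db",
        audio_dir=root / "audio",
        upload_dir=root / "uploads",
        debug_dir=root / "debug",
    )
```

Validating the name before joining it to data_dir is what keeps a value such as "../other" from escaping the data directory.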
CLI
audia convert gains --project / -p: output audio and database entries go to ~/.audia/<project>/; defaults to ~/.audia/default/
--output still works as an explicit override on top of the active project directory
Frontend
New DatabaseSelector component in the header: dropdown to list, create, and delete projects; active project is highlighted; destructive delete requires a confirmation step
activeProject state threaded from App through Header, Sidebar, Main, MainConvert, MainResearch, and MainDatabase; all API calls append ?project=<name> when a non-default project is active
Switching projects refreshes the sidebar library immediately
Tests
conftest.py: clear_settings_cache fixture (autouse) now sets AUDIA_DATA_DIR to pytest’s tmp_path and clears the engine/factory registry around every test, so no test writes to ~/.audia/ any more
isolated_db fixture rewritten to inject into the new _engines / _factories dicts instead of the removed module-level _engine / _SessionLocal attributes
test_config.py: path assertions updated to reflect the new data_dir/default/ layout
Version 0.4.4 (2026-04-02)
Style: docs & frontend
Docs tables: darker cell borders so text is readable in both light and dark themes
Docs sidebar: expanded section background changed from RTD default light-gray to the theme dark background
Docs buttons (prev/next): purple background with white text; no outline on focus/click
Docs code blocks: border removed; left accent changed from purple to lime
Docs links: underline removed on hover and active states
Configuration diagram: fixed node ordering in the pipeline flow
Configuration diagram dropdown: added backdrop blur so the list is legible over the SVG
Tests: coverage 68% → 96%
New test_stt.py: covers _ensure_stt_deps, transcribe_file, _transcribe_array, record_and_transcribe (including KeyboardInterrupt path), and distill_search_query
New test_async_jobs.py: directly awaits _run_research_job (success, not-found, cancellation, exception, no-query) and exercises the enqueue_conversion background task via httpx.AsyncClient
Extended test_text_cleaner.py: Google LLM provider (import error, missing key, happy path, custom api_base); OpenAI/Anthropic happy paths with api_base; progress_cb path in llm_curate; clean_text alias
Extended test_research.py: HTML fallback search (parsing, max_results, empty page, API-error trigger, 429 trigger); paper.published = None edge case; HTTP error on download
Extended test_cli.py: “all” paper selection, out-of-range selection, download failure with manual path fallback, --open flag, no-subcommand invocation
Version 0.4.3 (2026-04-01)
Version 0.4.2 (2026-04-01)
Bug fixes & UI improvements
LLM provider ignored during query normalisation
Root cause: NormalizeRequest only had a query field; the llm_provider and llm_model values sent by the frontend were silently dropped, so distill_search_query always fell back to the global .env defaults (Anthropic)
NormalizeRequest now accepts llm_provider and llm_model; the /api/research/normalize handler applies them to the settings object before building the LLM, consistent with how _run_research_job already handled overrides
The handler now inlines the LLM call directly (no longer delegates to distill_search_query) so provider/model overrides are guaranteed to take effect
Research sessions not saved to the database
Root cause: EnqueueRequest had no query field so there was nothing to store; _run_research_job saved Paper and AudioFile rows but never wrote a ResearchSession
EnqueueRequest and _run_research_job now accept an optional query parameter
After each job saves the paper, a ResearchSession row is written with the search query and the new paper_id
Frontend updated to include query: normalizedQuery ?? query in the enqueue payload
Database tab colour scheme
audio_files card/heading: violet → lime
user_settings card/heading: amber → purple
Amber removed from the colour palette entirely; lime and purple added
Version 0.4.1 (2026-04-01)
Set the display div for multiline cells to max-h-24 overflow-y-auto, so abstract (and query) cells will be capped at 6rem tall and scroll vertically when the content overflows
Version 0.4.0 (2026-04-01)
Bug fixes
Abstract column not editable in Database tab
Root cause: PDF-converted papers have abstract="" (empty string), which rendered as a zero-height invisible <span> with only 2 px of padding, effectively unclickable
Empty editable cells now display a dimmed italic (empty) placeholder so they are always visible and clickable
EditableCell wrapper gained min-h-[1.25rem] so the click target is never zero-height
CellValue now receives isEditable and isDark props to render the placeholder with correct theme colouring
TTS voice not persisted to user settings
Root cause: tts_voice was missing from both _DEFAULTS and SettingsBody in settings.py, so the voice selection was silently dropped on every save and never loaded on startup
tts_voice added to _DEFAULTS (default: en-US-AriaNeural) and SettingsBody in the settings router
tts_voice is now forwarded end-to-end: MainConvert and MainResearch each gained a ttsVoice prop; Main.tsx passes the loaded voice to both; the convert form sends it as voice, and the research enqueue JSON sends it as tts_voice; EnqueueRequest and _run_research_job in research.py likewise accept and apply the new field
Debug text files not written in UI mode
The debug save block in convert.py now resolves debug_dir via a fresh get_settings() call (not via the mutated cfg2 alias) and explicitly creates the parent directory before the run subdirectory
Text values are guarded with or "" to prevent write_text(None) errors
Errors are now logged to the server console via logging.warning(..., exc_info=True) in addition to the job log, so failures are visible even without expanding the progress log in the UI
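The three fixes above (directory creation order, the or "" guard, and console logging) fit together roughly as in this sketch. The function name and signature are hypothetical and do not mirror the save block in convert.py; the file names follow the 1_raw.txt, 2_preprocessed.txt, 3_curated.txt convention noted under 0.3.9.

```python
import logging
from pathlib import Path
from typing import Optional

logger = logging.getLogger(__name__)


def save_debug_texts(
    debug_dir: Path,
    run_id: str,
    raw: Optional[str],
    preprocessed: Optional[str],
    curated: Optional[str],
) -> Optional[Path]:
    """Write the three text snapshots to debug_dir/run_id/ and return that folder.

    Returns None (after logging a warning) if the files cannot be written.
    """
    run_dir = debug_dir / run_id
    snapshots = {
        "1_raw.txt": raw,
        "2_preprocessed.txt": preprocessed,
        "3_curated.txt": curated,
    }
    try:
        debug_dir.mkdir(parents=True, exist_ok=True)  # parent directory first
        run_dir.mkdir(exist_ok=True)  # then the per-run subdirectory
        for name, text in snapshots.items():
            # The `or ""` guard keeps write_text from raising TypeError on None.
            (run_dir / name).write_text(text or "", encoding="utf-8")
    except OSError:
        logger.warning("Could not write debug texts to %s", run_dir, exc_info=True)
        return None
    return run_dir
```

Returning None instead of re-raising keeps a debug-only failure from aborting an otherwise successful conversion.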
Version 0.3.9 (2026-03-31)
Bug fixes & UX improvements
Browser caching of stale UI after pip upgrade (#1)
index.html is now served with Cache-Control: no-cache, no-store, must-revalidate so browsers always revalidate after a package update; hashed JS/CSS bundles generated by Vite remain cacheable as before
Audio playback broken after renaming a paper (#3)
Root cause: GET /api/library/audio was not returning download_url, so sidebar audio entries had undefined as their playback URL
download_url (/api/convert/download/{id}) is now included in every audio record returned by the list endpoint; playback works regardless of any title or filename edits
Tab switch clears convert / research progress (#4)
All four tab panels (configuration, convert, research, database) are now permanently mounted and toggled with display: none instead of being conditionally rendered; React state (uploaded file, job ID, live progress) is preserved when switching away and back
Long paper titles overflow the convert drop zone (#6)
Added break-words whitespace-normal to the filename display inside the drop zone so long titles wrap instead of overflowing in all browsers
Same break-words treatment applied to the post-conversion success message
Debug text files not written in UI mode (#7)
_save_debug_texts was only called from the synchronous run_pipeline path, which the web UI never uses
The enqueue_conversion background task now saves 1_raw.txt, 2_preprocessed.txt, and 3_curated.txt to ~/.audia/debug/<stem>_<timestamp>/ after every successful conversion, matching CLI behaviour
Version 0.3.8 (2026-03-31)
Rebuild UI with the current state of implementation
Version 0.3.7 (2026-03-31)
Fix browser opening before server is ready
Replaced the hardcoded time.sleep(1.2) delay in audia serve with a TCP poll loop that opens the browser only once the server is actually accepting connections (checks every 200 ms, 30 s timeout)
Eliminates the “unable to reach” error seen on first load when uvicorn had not finished starting within the fixed delay
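A TCP poll loop of this kind can be written with the standard library alone. The sketch below is illustrative and is not the code in audia serve; the function name wait_for_port is hypothetical, while the 200 ms interval and 30 s timeout are the values stated above.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.2) -> bool:
    """Return True once host:port accepts a TCP connection, False if timeout elapses first."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means the server socket is listening.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: wait and try again.
            time.sleep(interval)
    return False
```

The caller would open the browser (for example with webbrowser.open) only when wait_for_port returns True, and could print the URL instead if it returns False.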
Version 0.3.6 (2026-03-31)
Google Gemini LLM support
AUDIA_LLM_PROVIDER=google: new provider option backed by langchain-google-genai / ChatGoogleGenerativeAI
AUDIA_GOOGLE_API_KEY: required when using the Google provider
AUDIA_GOOGLE_API_BASE: optional custom endpoint (Vertex AI or corporate proxy); mapped to client_options.api_endpoint
llm_provider Literal extended to "openai" | "anthropic" | "google" in config.py; validator error message updated accordingly
_build_llm in text_cleaner.py gains a google branch with clear import-error and missing-key messages
langchain-google-genai>=2.0 and google-generativeai>=0.8 added to core dependencies in pyproject.toml
.env.example updated with an “Option C – Google Gemini” block documenting gemini-2.0-flash, gemini-2.0-flash-lite, and gemini-1.5-pro as example models
Version 0.3.5 (2026-03-31)
Rebuild frontend
Rebuild UI with the current state of implementation
Add new db tab
Update the HTML title
Version 0.3.4 (2026-03-31)
Custom API base URLs for OpenAI and Anthropic
AUDIA_OPENAI_API_BASE: optional setting to redirect all OpenAI calls (LLM + TTS) to a custom endpoint (Azure OpenAI, corporate proxy, or any OpenAI-compatible URL)
AUDIA_ANTHROPIC_API_BASE: same for Anthropic LLM calls
Both settings wired through config.py, text_cleaner.py (_build_llm), and tts.py; stt.py inherits the base URL automatically via _build_llm
.env.example updated with commented examples for both options
Version 0.3.3 (2026-03-31)
Icon system, TTS voice selector & database editor
Icon system (constants/index.ts)
Introduced IconDef discriminated union type ({ kind: 'icon'; name; adaptive? } for Iconify icons or { kind: 'img'; src; alt } for image assets), providing a single typed representation for all logos across the app
All asset imports (arxiv.svg, systran.svg, hexgrad.webp) moved from individual component files into constants/index.ts
PROVIDER_ICONS updated to Record<LLMProvider, IconDef>; OpenAI uses simple-icons:openai with adaptive: true (renders white in dark mode, black in light mode)
New exports: STT_ICON, ARXIV_ICON, TTS_BACKEND_ICONS, centralising all service/backend logos in one place
renderIconDef(icon, isDark, className?) helper in MainConfiguration.tsx handles both img and icon variants, applying theme-aware colouring for adaptive icons
TTS voice selector (Configuration tab)
TTS card now shows a 2-column grid: Engine selector + Voice selector side-by-side
Voice options are populated from TTS_VOICES[backend]; switching the engine automatically resets the voice to the first option for that backend
ttsVoice state added to Main.tsx, persisted to and loaded from /api/settings via a new tts_voice key
Database explorer — editable cells
All cells in editable columns (title, authors, abstract, arxiv_id, pdf_url for papers; filename, duration_seconds, tts_backend, tts_voice, paper_id for audio files; query for research sessions; value for user settings) are now inline-editable
Click a cell → <input> or <textarea> (for abstract / query) appears inline; Enter commits, Escape cancels, ⌘/Ctrl+Enter commits multiline fields; blur also commits
tts_backend and tts_voice cells use a portal-based custom dropdown that escapes the overflow-x-auto container; tts_voice options reflect the current row’s backend
Brief lime flash on successful save; spinner while saving
authors is edited as comma-separated text and sent as a JSON array; paper_id / duration_seconds are coerced to numbers; clearing nullable fields sends null
New backend PATCH endpoints for all four tables: PATCH /api/library/papers/{id}, PATCH /api/library/audio/{id}, PATCH /api/library/research_sessions/{id}, PATCH /api/library/user_settings/{key}
Database explorer — table UX
Table is now horizontally scrollable: uses table-auto min-w-max so columns never line-wrap; overflow-x-auto wrapper scrolls instead
Full column sets returned: list_papers now includes abstract, pdf_path, pdf_url; list_audio now includes file_path, duration_seconds (previously omitted)
Column hide/show: each column header has an eye-off icon; clicking hides the column; hidden columns appear as chips above the table that restore on click; hidden set resets when switching tables
Table picker replaced native <select> with a fully custom styled dropdown
Clicking a papers.id cell (or audio_files.paper_id) opens the PDF in the PreviewPanel; links rendered as rose-coloured buttons instead of editable cells
Version 0.3.2 (2026-03-30)
Convert tab improvements
Instant PDF preview on upload
Dropping a PDF or selecting one via the file browser immediately opens the PreviewPanel on the right using a local blob: URL; no conversion needs to start first
Clearing the file (“Convert another file”) also closes the preview panel
During conversion the panel URL is transparently upgraded to the server-served PDF once the backend has processed it
Collapsible full-progress log
The verbose log output (step-by-step stage details, chunk counts, etc.) is now hidden by default behind a “Show full progress” toggle
Clicking the toggle expands/collapses the log with a chevron indicator (“Show full progress” ▶ / “Hide full progress” ▼)
The log container background contrast raised (bg-white/8 / bg-black/8) so the scrolling area is visually distinct
Version 0.3.1 (2026-03-30)
Frontend refactor & database explorer
Main.tsx split into tab components
MainConfiguration.tsx: pipeline diagram and model/backend selectors extracted into a standalone component
MainConvert.tsx: PDF upload, ConversionProgress, AudioPlayer, and CONVERT_STAGES extracted; all three are exported for reuse
MainResearch.tsx: ArXiv search, voice recording, LLM query normalisation, paper selection, and job progress extracted; exports RESEARCH_STAGES
Main.tsx reduced to a 159-line orchestrator: holds shared pipeline config state, loads/saves settings via /api/settings, renders a tab bar, and delegates to each sub-component
Database tab (MainDatabase.tsx)
Schema overview: four cards (one per table: papers, audio_files, research_sessions, user_settings) showing all columns with types and PK/FK badges; relationship annotations (FK arrow and JSON logical link) displayed below
Mermaid ERD: collapsible erDiagram code block with a one-click copy button
Data explorer: dropdown to select any table; fetches all rows from the API and renders them in a full-width table with no truncation; JSON array columns rendered inline
API additions (library router)
GET /api/library/research_sessions: returns all research sessions ordered by date
GET /api/library/user_settings: returns all key-value settings rows
scripts/explore_db.py
Standalone stdlib-only script (no dependencies) that prints a full terminal dump of ~/.audia/audia.db
Shows table list, per-table schema (column names, types, nullability, PK), foreign keys, row count, and all cell values with JSON pretty-printing
Accepts --db /path/to/other.db to point at a custom database file
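The schema part of such a dump needs nothing beyond the sqlite3 module. The sketch below is an illustration and not the contents of scripts/explore_db.py; the function name dump_schema and the shape of the returned dictionary are assumptions, and it omits the cell values and JSON pretty-printing.

```python
import sqlite3


def dump_schema(conn: sqlite3.Connection) -> dict:
    """Return columns, foreign keys and row count for every user table."""
    summary = {}
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    for table in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = [
            {"name": c[1], "type": c[2], "nullable": not c[3], "pk": bool(c[5])}
            for c in conn.execute(f'PRAGMA table_info("{table}")')
        ]
        # PRAGMA foreign_key_list rows: (id, seq, table, from, to, ...)
        foreign_keys = [
            {"from": fk[3], "table": fk[2], "to": fk[4]}
            for fk in conn.execute(f'PRAGMA foreign_key_list("{table}")')
        ]
        row_count = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
        summary[table] = {
            "columns": columns,
            "foreign_keys": foreign_keys,
            "row_count": row_count,
        }
    return summary
```

A command-line wrapper would open the file given by --db with sqlite3.connect and print the result, for instance with json.dumps(summary, indent=2).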
Version 0.3.0 (2026-03-29)
Configuration, voice search & LLM query normalisation
Configuration tab
Pipeline diagram: animated SVG diagram in the Configuration tab visualises the full pipeline (STT → LLM → ArXiv → PDF → TTS) with correct arrow routing (Text bypasses LLM and enters ArXiv directly)
Persistent settings: UserSetting SQLite model stores per-key config; GET /api/settings and PUT /api/settings endpoints load and upsert settings on demand
Save button: “Save configuration” button with “Saved ✓” confirmation; config state lifted to Main and loaded from the database on mount
Settings applied to pipelines: selected LLM provider/model and TTS backend are forwarded as optional overrides to /api/convert/enqueue and /api/research/enqueue
Voice search (Research tab)
Microphone button: browser MediaRecorder captures audio; blob is POSTed to /api/research/transcribe which runs faster-whisper (transcribe_file) and returns the transcript into the query field
LLM query normalisation (Research tab)
Normalize query button: calls POST /api/research/normalize which runs distill_search_query from stt.py (same prompt used by the CLI listen command, so no duplicate prompt logic)
Two-step search flow: raw query → optional LLM normalisation → editable confirmation pane → “Search arXiv” button; normalisation errors surfaced as a dismissable red banner instead of silent fallback
Single Search arXiv button: removed duplicate search button; one full-width button below the input row handles both direct and post-normalisation searches; “No results found” only shown after an actual search attempt
API additions
POST /api/research/normalize: wraps distill_search_query
POST /api/research/transcribe: wraps transcribe_file
POST /api/convert/upload: synchronous PDF upload + convert (used by tests)
POST /api/research/convert: synchronous ArXiv download + convert (used by tests)
Code hygiene
Removed _QUERY_SYSTEM prompt and normalize_query() from text_cleaner.py; normalisation now reuses distill_search_query directly: two LLMs, two prompts, no duplication
Version 0.2.0 (2026-03-29)
Web UI overhaul
Async conversion jobs: PDF upload and ArXiv research conversions now run in the background; a job_id is returned immediately and the frontend polls for live progress
Streaming terminal log: each pipeline stage (PDF extraction, heuristic cleaning, LLM curation chunk-by-chunk, TTS chunk-by-chunk) streams log lines into a scrollable terminal pane
Cancel button: any running conversion can be cancelled mid-pipeline via a cancel button that calls DELETE /api/{convert,research}/jobs/{id}
Inline PDF preview: clicking a paper in the sidebar now opens the PDF in a side panel instead of downloading it; Content-Disposition: inline is set on all PDF-serving endpoints
Live PDF preview during conversion: the preview panel opens automatically as soon as the PDF is available (right after upload or ArXiv download), before the pipeline finishes
Research async pipeline: POST /api/research/enqueue replaces the old blocking /convert; each ArXiv ID gets its own job with 6 stages (searching → downloading → extracting → pre-cleaning → LLM curation → TTS synthesis)
Progress callbacks: llm_curate and synthesize / _edge_tts accept a progress_cb parameter used by the web job runner to emit per-chunk log lines
Shared job store: audia.ui.jobs.JOBS dict is imported by both convert.py and research.py routers so cancel and PDF-serve endpoints work across both flows
UI fixes: “Convert another file” reset button only appears after conversion completes; PreviewPanel refactored to accept title / pdfUrl directly instead of a Paper object
Version 0.1.2 (2026-03-29)
ArXiv search robustness & CLI improvements
ArXiv search
HTTP fallback: when the ArXiv export API returns HTTP 429, the search automatically retries via HTML scraping of arxiv.org/search; no user action is required, and a short one-line warning is shown instead of a full traceback
Date from arxiv ID: publication date is now derived from the paper ID (YYMM prefix, e.g. 2603 → Mar 2026) instead of unreliable HTML scraping
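Deriving the date from the ID can be sketched as below. This is an illustration and not the project's implementation: the function name is hypothetical, only new-style identifiers (YYMM.NNNN or YYMM.NNNNN, optionally followed by a version suffix) are handled, and the day is fixed to the first of the month because the ID carries no day.

```python
import re
from datetime import date
from typing import Optional

# New-style arXiv identifier: YYMM, a dot, 4 or 5 digits, optional version ("v2").
_NEW_STYLE_ID = re.compile(r"^(\d{2})(\d{2})\.\d{4,5}(v\d+)?$")


def published_from_arxiv_id(arxiv_id: str) -> Optional[date]:
    """Return the first day of the month encoded in a new-style arXiv ID, or None."""
    match = _NEW_STYLE_ID.match(arxiv_id)
    if not match:
        return None  # old-style IDs such as "hep-th/9901001" are not handled here
    year = 2000 + int(match.group(1))
    month = int(match.group(2))
    if not 1 <= month <= 12:
        return None
    return date(year, month, 1)
```

Formatting the result with strftime("%b %Y") gives the short month-and-year form used in the example above.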
PDF download
SDK-free download: PDFs are now fetched directly from https://arxiv.org/pdf/<id> via urllib, bypassing the export API entirely and eliminating 429 rate-limit failures on download
Manual fallback prompt: if a download still fails, the user is shown the arxiv.org/abs/ link and prompted to provide a local PDF path to continue conversion
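A direct download along these lines needs only urllib. The sketch is illustrative and not the project's downloader: the function names, the destination filename scheme, the User-Agent string and the timeout are all assumptions. Errors are left to propagate so that a caller can fall back to the manual path prompt.

```python
import urllib.request
from pathlib import Path


def arxiv_pdf_url(arxiv_id: str) -> str:
    """Build the direct PDF URL for an arXiv ID."""
    return f"https://arxiv.org/pdf/{arxiv_id}"


def download_pdf(arxiv_id: str, dest_dir: Path, timeout: float = 30.0) -> Path:
    """Fetch the PDF into dest_dir and return the saved path (raises on HTTP or network errors)."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Old-style IDs contain a slash, which is not valid inside a filename.
    dest = dest_dir / f"{arxiv_id.replace('/', '_')}.pdf"
    request = urllib.request.Request(
        arxiv_pdf_url(arxiv_id),
        headers={"User-Agent": "audia-example"},  # hypothetical client identifier
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        dest.write_bytes(response.read())
    return dest
```

Catching urllib.error.URLError around download_pdf is one place where the arxiv.org/abs/ link and the local-path prompt could be shown.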
CLI output
Results table: added a Link column (https://arxiv.org/abs/<id>) with no_wrap=True, so URLs are never truncated and remain clickable
ASCII banner: running audia with no arguments now displays a block-art banner before the help text
Paper selection always prompted: audia listen and audia research both show the results table and ask the user to pick papers; --convert flag still skips the prompt for power users
Version 0.1.1 (2026-03-29)
listen command pipeline overhaul
LLM query distillation: raw transcribed speech is now passed through the configured LLM to extract a concise ArXiv search query (e.g. “I would like to research agentic AI” → “agentic AI research”)
distill_search_query() moved to audia.agents.stt (proper agent layer, not CLI)
Confirmation loop before searching: after distillation the user sees the extracted query and can confirm (y), re-record (r), or quit (q), which prevents accidental searches from mis-transcriptions
typer.confirm() replaced with explicit typer.prompt() to support the three-way choice
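The three-way confirmation loop can be sketched independently of the CLI framework. This is an illustration and not the listen command itself: confirm_query is a hypothetical name, distill stands in for recording plus distill_search_query(), and ask defaults to the built-in input where the CLI uses typer.prompt().

```python
from typing import Callable, Optional


def confirm_query(
    distill: Callable[[], str],
    ask: Callable[[str], str] = input,
) -> Optional[str]:
    """Loop until the user accepts a distilled query (y) or quits (q); r records again."""
    query = distill()
    while True:
        choice = ask(f"Search for {query!r}? [y/r/q] ").strip().lower()
        if choice == "y":
            return query
        if choice == "q":
            return None
        if choice == "r":
            query = distill()  # record and distill a fresh query
        # Any other input falls through and the question is asked again.
```

A yes/no confirmation cannot express the re-record option, which is why a free-form prompt is needed for this flow.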
Version 0.1.0 (2026-03-29)
First fully working release: PDF to audio via mandatory LLM curation and edge-tts synthesis.
Linear LangGraph pipeline: PDF extraction → heuristic clean → LLM curation → TTS synthesis
LLM curation is mandatory; supports OpenAI and Anthropic backends via AUDIA_LLM_PROVIDER
Chunk-level LLM curation with tail-context stitching for smooth spoken transitions
edge-tts default TTS backend: async with 90 s per-chunk timeout and asyncio context fix for FastAPI
Rich per-step progress output; audia --version flag
Consistent run ID (<pdf_stem>_<YYYYMMDD_HHMMSS>) shared by audio filename and debug folder
Debug text snapshots saved to ~/.audia/debug/<run_id>/ (raw, preprocessed, curated)
SQLite storage for papers and audio files via SQLAlchemy
FastAPI web UI (SPA) and Typer CLI with convert, research, listen, serve, info commands
.env.example with full documentation of all AUDIA_* settings
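The run ID format listed above can be produced with a few lines of standard-library code. This is a sketch and not the package's own helper; the name make_run_id and the optional now parameter (added so the result can be reproduced) are assumptions.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def make_run_id(pdf_path: Path, now: Optional[datetime] = None) -> str:
    """Build '<pdf_stem>_<YYYYMMDD_HHMMSS>' from the PDF filename and a timestamp."""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return f"{pdf_path.stem}_{stamp}"
```

Computing the ID once per run and passing it to both the audio writer and the debug snapshot writer is what keeps the audio filename and the debug folder name in step.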
Version 0.0.1 (2026-03-28)
First release of the basic package structure