[frontend/multimodal] Add mic button to chat bar for direct audio input to multimodal models (Gemma 4 12B) #825
Context
Gemma 4 12B supports native audio input: audio is passed directly to the model (no separate STT transcription step). The model accepts audio via the OpenAI-compatible input_audio content type.
The frontend already has static/js/voiceRecorder.js with full MediaRecorder infrastructure, but it currently only supports STT transcription modes (browser/local/endpoint). When a multimodal model like Gemma 4 12B is selected, the mic button should send the raw audio directly to the model as an input_audio content block instead of transcribing it first.
Requirements
1. Mic button in chat bar
2. Direct audio passthrough mode
audio_input: true capability:
input_audio content block in the chat message alongside any text
3. Frontend message format
The chat_stream endpoint currently accepts FormData with message + attachments
Add audio_data (base64-encoded) and audio_format (e.g. "wav", "webm") to the FormData
4. Audio format considerations
Use MediaRecorder with the audio/wav MIME type where supported, or convert WebM → WAV via AudioContext before base64 encoding
Files to modify
static/js/voiceRecorder.js: add direct audio mode alongside existing STT modes
static/js/chat.js: handle audio attachment in message submission, add mic button to UI
static/index.html: mic button HTML element in chat bar
static/css/*.css: mic button styling
Acceptance criteria
input_audio content
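A rough sketch of how the audio path described under requirements 2 to 4 might fit together on the frontend: encode mono PCM samples as 16-bit WAV, base64-encode the bytes, then either attach them as the audio_data / audio_format form fields or wrap them in an input_audio content block. The function names (encodeWav, toBase64, appendAudioFields, buildAudioMessage) are hypothetical and not part of the existing voiceRecorder.js. The content block shape assumes the OpenAI Chat Completions convention, since the issue's own example is not reproduced here. Whether the block is assembled in the browser or on the server behind chat_stream is also an assumption left open.

```javascript
// Encode mono Float32 PCM samples (range -1..1) as a 16-bit PCM WAV file.
// Assumption: in the browser the samples would come from
// AudioContext.decodeAudioData(...) followed by getChannelData(0).
function encodeWav(samples, sampleRate) {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);

  const writeAscii = (offset, text) => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // file size minus the first 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // size of the fmt chunk
  view.setUint16(20, 1, true); // audio format 1 = linear PCM
  view.setUint16(22, 1, true); // channel count: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  for (let i = 0; i < samples.length; i++) {
    // Clamp to [-1, 1], then scale to the signed 16-bit range.
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return new Uint8Array(buffer);
}

// Base64-encode a byte array in chunks to avoid huge argument lists.
function toBase64(bytes) {
  let binary = "";
  const chunk = 0x8000;
  for (let i = 0; i < bytes.length; i += chunk) {
    binary += String.fromCharCode.apply(null, bytes.subarray(i, i + chunk));
  }
  return btoa(binary);
}

// Hypothetical helper: add the two fields named in the issue to a FormData.
function appendAudioFields(formData, wavBytes) {
  formData.append("audio_data", toBase64(wavBytes));
  formData.append("audio_format", "wav");
  return formData;
}

// Hypothetical helper: build a user message with optional text plus audio,
// assuming the OpenAI-style input_audio content part.
function buildAudioMessage(text, wavBytes) {
  const content = [];
  if (text) content.push({ type: "text", text });
  content.push({
    type: "input_audio",
    input_audio: { data: toBase64(wavBytes), format: "wav" },
  });
  return { role: "user", content };
}
```

In this sketch, chat.js would call appendAudioFields on the FormData it already builds for chat_stream only when the selected model reports the audio_input capability, and would fall back to the existing STT modes otherwise.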