Utilora

Free Video Captioner & Audiogram Maker Online

Add AI-generated captions to videos or create audiograms from audio files — all in your browser using Whisper. TikTok-style, classic, karaoke, and minimal presets. No upload, no server.

runs locally — nothing leaves your browser

What is the Video Captioner & Audiogram Maker?

Video Captioner & Audiogram Maker is a privacy-first browser tool that automatically generates captions and burns them into videos, or creates shareable audiogram clips from audio files. It runs the Whisper base.en speech recognition model in the browser via WebAssembly, so the full transcription pipeline executes locally on your device. The result is a downloadable WebM video with subtitles overlaid in sync with the speech.

When to use this

  • Adding subtitles to short-form social media clips for TikTok, Instagram Reels, or YouTube Shorts
  • Creating audiogram videos from podcast excerpts or voice recordings
  • Making videos accessible with accurate auto-generated captions before publishing
  • Quickly captioning internal training videos or demos without cloud tools

How it works

The tool decodes your media file with the Web Audio API (AudioContext), downmixes and resamples the audio to 16 kHz mono Float32 data, and sends it to a Web Worker running the Whisper base.en ONNX model. The model returns time-stamped text segments, which the tool draws onto an HTML Canvas in sync with media playback. On export, the Canvas stream and audio track are captured by the browser's MediaRecorder API and saved as a WebM file, with no server involved at any stage.
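
The audio preparation step described above can be sketched in TypeScript. This is a minimal illustration, not the tool's actual code: the function names `mixToMono` and `resampleLinear` are hypothetical, and linear interpolation is a stand-in for whatever resampler the tool really uses. In a browser, the per-channel samples would typically come from `AudioBuffer.getChannelData()` after `AudioContext.decodeAudioData()`.

```typescript
// Target rate taken from the description above (16 kHz mono Float32).
const TARGET_RATE = 16_000;

/** Average all channels into a single mono Float32Array (hypothetical helper). */
function mixToMono(channels: Float32Array[]): Float32Array {
  if (channels.length === 0) return new Float32Array(0);
  const length = channels[0].length;
  const mono = new Float32Array(length);
  for (const channel of channels) {
    for (let i = 0; i < length; i++) {
      mono[i] += channel[i] / channels.length;
    }
  }
  return mono;
}

/**
 * Resample by linear interpolation (hypothetical helper).
 * Assumption: no anti-aliasing filter is applied, so this is for illustration only.
 */
function resampleLinear(
  input: Float32Array,
  fromRate: number,
  toRate: number = TARGET_RATE,
): Float32Array {
  if (fromRate === toRate) return input.slice();
  const outLength = Math.floor((input.length * toRate) / fromRate);
  const out = new Float32Array(outLength);
  const step = fromRate / toRate; // input samples advanced per output sample
  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const left = Math.floor(pos);
    const right = Math.min(left + 1, input.length - 1);
    const frac = pos - left;
    out[i] = input[left] * (1 - frac) + input[right] * frac;
  }
  return out;
}
```

A production pipeline would more likely let the browser do the resampling, for example by rendering the decoded buffer through an `OfflineAudioContext` created at 16,000 Hz, since plain linear interpolation can introduce aliasing when downsampling.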

Example use cases

Social Media Content

Caption a 60-second marketing video with TikTok-style bold yellow-highlight captions in under two minutes.

Podcast Audiograms

Turn an MP3 clip into an animated audiogram with waveform bars and subtitle overlay for sharing on LinkedIn or Twitter.

Accessibility

Add burned-in captions to internal product demos or training recordings to meet accessibility requirements.

How to use

  1. Drop your file

    Drag and drop a video (MP4, MOV, WebM) or audio file (MP3, WAV, M4A). The tool auto-detects aspect ratio.

  2. AI transcription runs locally

    Whisper base.en (~145 MB, downloaded once) transcribes your media entirely in-browser with accurate timestamps per segment.

  3. Customize captions

    Choose a style preset (TikTok, Classic, Minimal, Karaoke), adjust font, colors, position, and background opacity. Edit any segment by clicking it.

  4. Export captioned video

    Click Export to record the canvas in real-time and download a captioned WebM file ready to share.
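
Because the canvas is recorded in real time, every drawn frame needs the caption segment that covers the current playback time. The sketch below shows one way such a lookup could work. The `CaptionSegment` shape and the `findActiveSegment` name are assumptions made for illustration, and the segments are assumed to be sorted by start time and non-overlapping.

```typescript
// Hypothetical segment shape: times in seconds, as a transcription step might return them.
interface CaptionSegment {
  start: number;
  end: number;
  text: string;
}

/**
 * Binary search for the segment whose [start, end) interval contains `time`.
 * Returns null during gaps between segments (nothing should be drawn).
 */
function findActiveSegment(
  segments: CaptionSegment[],
  time: number,
): CaptionSegment | null {
  let lo = 0;
  let hi = segments.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const seg = segments[mid];
    if (time < seg.start) {
      hi = mid - 1;
    } else if (time >= seg.end) {
      lo = mid + 1;
    } else {
      return seg;
    }
  }
  return null;
}
```

In a browser, a draw loop would call this with the media element's `currentTime` on each animation frame and paint the returned text onto the canvas. Reading the segment text at draw time also means an inline transcript edit shows up on the next frame.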

Why use this tool?

  • Runs 100% in your browser — video and audio never leave your device
  • Four caption style presets: TikTok, Classic, Minimal, and Karaoke word-highlight
  • Inline transcript editor lets you fix any AI transcription mistake before exporting
  • Audiogram mode adds an animated waveform visualizer (bars, line, or blocks) over a dark gradient background
  • No account, no upload limits, no watermark
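
The Karaoke word-highlight preset needs a time range for each word, while the transcription is described as returning timestamps per segment. The page does not say how word timing is derived, so the following is only one plausible approach: spread each segment's duration across its words in proportion to their character count. The `estimateWordTimings` function and `WordTiming` type are hypothetical.

```typescript
// Hypothetical per-word timing, in seconds.
interface WordTiming {
  word: string;
  start: number;
  end: number;
}

/**
 * Estimate word timings inside one segment by splitting its duration
 * proportionally to word length. Assumption: longer words take longer to say.
 */
function estimateWordTimings(
  text: string,
  start: number,
  end: number,
): WordTiming[] {
  const words = text.trim().split(/\s+/).filter((w) => w.length > 0);
  const totalChars = words.reduce((sum, w) => sum + w.length, 0);
  if (totalChars === 0) return [];
  const duration = end - start;
  const timings: WordTiming[] = [];
  let consumed = 0; // characters assigned so far
  for (const word of words) {
    const wordStart = start + (consumed / totalChars) * duration;
    consumed += word.length;
    const wordEnd = start + (consumed / totalChars) * duration;
    timings.push({ word, start: wordStart, end: wordEnd });
  }
  return timings;
}
```

The estimate is rough: pauses and speaking rate are ignored, so highlights can drift within long segments. Word-level timestamps from the model, where available, would be more accurate.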

Frequently asked questions

What file formats are supported?
Video: MP4, MOV, WebM, MKV, AVI. Audio: MP3, WAV, M4A. The tool auto-detects whether to show video frames or an audiogram background.
Why does the first run download a large file?
The Whisper base.en model weights are downloaded once and cached in your browser. Later sessions load the model from the cache, so there is no re-download.
How long does export take?
Export records in real time using the browser's MediaRecorder API, so a 30-second video takes roughly 30 seconds to capture. A progress bar tracks the recording.
Can I edit the transcription?
Yes. Click any segment in the transcript panel to edit the text inline before exporting. Changes are reflected immediately on the canvas.
What is the output format?
The exported file is a WebM (VP9 + Opus) video. Many platforms and media players support it; if yours does not, you can convert the file to MP4 offline.
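
Codec support differs between browsers, so an export step like the one described above would usually check which WebM variant the recorder accepts before starting. The sketch below is an assumption about how that check could be written, not the tool's code: `pickMimeType` and the candidate list are hypothetical, while `MediaRecorder.isTypeSupported()` is the standard browser method the helper is meant to wrap.

```typescript
// Hypothetical preference order: VP9 + Opus first (the format named above), then fallbacks.
const WEBM_CANDIDATES: string[] = [
  "video/webm;codecs=vp9,opus",
  "video/webm;codecs=vp8,opus",
  "video/webm",
];

/**
 * Return the first candidate the given predicate accepts, or null if none is supported.
 * The predicate is injected so the function can run outside a browser.
 */
function pickMimeType(
  candidates: string[],
  isSupported: (mimeType: string) => boolean,
): string | null {
  for (const candidate of candidates) {
    if (isSupported(candidate)) return candidate;
  }
  return null;
}

// In a browser, the predicate would be the real capability check:
//   const mimeType = pickMimeType(WEBM_CANDIDATES, (t) => MediaRecorder.isTypeSupported(t));
//   if (mimeType) { const recorder = new MediaRecorder(stream, { mimeType }); }
// where `stream` is a MediaStream combining the canvas video track and the audio track.
```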
