STT Studio - Speaker Diarization

Speech Workspace

Convert video to audio, get a transcription, and ask questions in one workspace

The interface is tuned for fast results: minimal extra actions, clear progress, and convenient export and copy for every block.

Browser FFmpeg fallback | Diarization + timestamps | Q&A on results

How it works

1. Upload an audio/video file
2. Check parameters and start processing
3. Review results, export, and ask questions

Upload

Audio / Video

Drag a file here

Or click to choose from your device

MP3, WAV, FLAC, MP4, MOV, AVI, WEBM

For video: audio is first extracted in the browser (16 kHz mono WAV). If extraction fails, the original file is uploaded instead.
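
The extract-then-fall-back behavior can be sketched roughly as follows. This is only an illustration in Python, not the in-browser code the page runs: `build_ffmpeg_args` and `choose_upload` are hypothetical names, and the extractor is passed in as a callable so that the fallback logic is visible on its own. The FFmpeg flags shown (`-vn`, `-ac 1`, `-ar 16000`, `-c:a pcm_s16le`) are the standard ones for a 16 kHz mono 16-bit WAV.

```python
from pathlib import Path
from typing import Callable


def build_ffmpeg_args(src: str, dst: str) -> list[str]:
    """FFmpeg arguments for extracting a 16 kHz mono 16-bit PCM WAV."""
    return [
        "ffmpeg", "-y",       # overwrite the output if it already exists
        "-i", src,            # input audio/video file
        "-vn",                # drop the video stream
        "-ac", "1",           # one channel (mono)
        "-ar", "16000",       # 16 kHz sample rate
        "-c:a", "pcm_s16le",  # uncompressed 16-bit PCM
        dst,
    ]


def choose_upload(src: str, extract: Callable[[str, str], None]) -> str:
    """Return the path to upload: the extracted WAV, or the original on failure.

    `extract` is a hypothetical callable that writes the WAV to `dst` and
    raises an exception if extraction fails.
    """
    dst = str(Path(src).with_suffix(".16k.wav"))
    try:
        extract(src, dst)
        return dst
    except Exception:
        # Fallback described above: upload the original file unchanged.
        return src
```

With a real FFmpeg binary available, `extract` could be something like `lambda s, d: subprocess.run(build_ffmpeg_args(s, d), check=True)`.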

No file selected

—

Parameters

API key

user_id (optional)

Speaker diarization

Disable if you only need words and SRT

Speaker diarization profile

WhisperX-like profile: speaker is assigned by maximum interval overlap.
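
As a rough illustration of assignment by maximum interval overlap, the sketch below gives a transcript span to the speaker whose diarization turns cover the most of it. It assumes that overlap is summed per speaker and that a span with no overlap gets no speaker; the function names, the `(speaker, start, end)` tuple layout, and the `SPEAKER_00` labels are hypothetical and not taken from the application.

```python
from typing import Optional


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speaker(
    start: float,
    end: float,
    turns: list[tuple[str, float, float]],
) -> Optional[str]:
    """Pick the speaker whose turns overlap [start, end] the most.

    `turns` holds (speaker, start, end) tuples from diarization.
    Returns None when no turn overlaps the span at all.
    """
    totals: dict[str, float] = {}
    for speaker, t_start, t_end in turns:
        shared = overlap(start, end, t_start, t_end)
        if shared > 0:
            totals[speaker] = totals.get(speaker, 0.0) + shared
    if not totals:
        return None
    return max(totals, key=totals.get)


turns = [("SPEAKER_00", 0.0, 2.0), ("SPEAKER_01", 2.0, 5.0)]
# The span 1.5-3.0 shares 0.5 s with SPEAKER_00 and 1.0 s with SPEAKER_01.
print(assign_speaker(1.5, 3.0, turns))  # SPEAKER_01
```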

Advanced speaker settings

Leave the fields empty to have the number of speakers estimated automatically. Default num_speakers = 4.

num_speakers

min_speakers

max_speakers
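
One way a client might turn these three fields into request parameters is sketched below. It assumes that an empty field is simply left out of the request so the service can estimate the value itself, and it does not apply the default of 4 mentioned above, since the page does not say where that default takes effect. The helper name `speaker_params` is hypothetical.

```python
def speaker_params(num_speakers="", min_speakers="", max_speakers="") -> dict:
    """Build a parameter dict from raw form values, omitting empty fields."""
    raw = {
        "num_speakers": num_speakers,
        "min_speakers": min_speakers,
        "max_speakers": max_speakers,
    }
    params = {}
    for name, value in raw.items():
        text = str(value).strip()
        if not text:
            continue  # empty field: leave it out (assumed to mean auto-estimation)
        number = int(text)
        if number < 1:
            raise ValueError(f"{name} must be a positive integer")
        params[name] = number
    if (
        "min_speakers" in params
        and "max_speakers" in params
        and params["min_speakers"] > params["max_speakers"]
    ):
        raise ValueError("min_speakers must not exceed max_speakers")
    return params
```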

Actions

Download JSON

Progress

Preparing... 0%

Prepare

Upload

Process

Done

Results

Done

Duration —

Processing time —

Language —

Speakers —

Raw transcription

Speaker intervals

Speaker	Start	End

No intervals to display.

Merged speaker speech

Speaker	Start	End	Text

No segments to display.
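
A merged view like this one can be produced by joining consecutive words that share a speaker into one row. The sketch below assumes each word is a dict with `word`, `start`, `end`, and `speaker` keys, which is an illustrative layout and not necessarily the schema of the downloaded JSON.

```python
def merge_by_speaker(words: list[dict]) -> list[dict]:
    """Merge consecutive words with the same speaker into one segment each."""
    segments: list[dict] = []
    for w in words:
        if segments and segments[-1]["speaker"] == w["speaker"]:
            # Same speaker keeps talking: extend the current segment.
            segments[-1]["end"] = w["end"]
            segments[-1]["text"] += " " + w["word"]
        else:
            # Speaker changed (or first word): start a new segment.
            segments.append({
                "speaker": w["speaker"],
                "start": w["start"],
                "end": w["end"],
                "text": w["word"],
            })
    return segments
```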

Speaker SRT
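
For reference, an SRT file is a sequence of numbered cues, each with a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time line followed by the text. The sketch below shows one plausible way to render speaker segments in that format. The `[speaker]` prefix and the segment keys (`speaker`, `start`, `end`, `text`) are assumptions, and the application's actual output may label speakers differently.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_speaker_srt(segments: list[dict]) -> str:
    """Render segments as SRT cues, prefixing each text with its speaker."""
    if not segments:
        return ""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"[{seg['speaker']}] {seg['text']}"
        )
    # Cues are separated by a blank line.
    return "\n\n".join(cues) + "\n"
```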

Word timestamps

Word	Start	End

No word timestamps.

Q&A on transcription