WhisperX provides fast automatic speech recognition (ASR) with word-level timestamps and speaker diarization:

- Batched inference for up to 70x realtime transcription using the Whisper large-v2 model.
- A faster-whisper backend requiring less than 8 GB of GPU memory.
- Accurate word-level timestamps via wav2vec2 forced alignment.
- Multispeaker ASR via speaker diarization from pyannote-audio.
- VAD preprocessing, which reduces hallucination and improves batching without degrading word error rate (WER).

Whisper is an ASR model developed by OpenAI, trained on a large and diverse audio dataset. While it produces highly accurate transcriptions, its native timestamps are at the utterance level and can be off by several seconds. WhisperX addresses these limitations by adding word-level alignment and batching support, built from several components: phoneme-based ASR models such as wav2vec 2.0, forced alignment for phone-level segmentation, voice activity detection (VAD) to find speech regions, and speaker diarization to segment audio by speaker identity.
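Within Galaxy this pipeline is driven by the tool form, but for orientation, here is a minimal sketch of the three-stage WhisperX Python API (batched transcription, wav2vec2 alignment, pyannote diarization). The file path and Hugging Face token are placeholders, and exact import locations have shifted between WhisperX releases, so treat this as illustrative rather than the wrapper's actual code.

```python
import whisperx
from whisperx.diarize import DiarizationPipeline, assign_word_speakers

device = "cuda"             # "cpu" also works, much more slowly
audio_file = "audio.mp3"    # placeholder input path
batch_size = 16             # reduce if GPU memory is tight
compute_type = "float16"    # "int8" lowers memory use at some accuracy cost

# 1. Batched transcription via the faster-whisper backend.
model = whisperx.load_model("large-v2", device, compute_type=compute_type)
audio = whisperx.load_audio(audio_file)
result = model.transcribe(audio, batch_size=batch_size)

# 2. Forced alignment with a wav2vec2 model to turn utterance-level
#    segments into word-level timestamps.
align_model, metadata = whisperx.load_align_model(
    language_code=result["language"], device=device
)
result = whisperx.align(result["segments"], align_model, metadata, audio, device)

# 3. Speaker diarization with pyannote-audio; attaches a speaker label
#    to each segment and word. Needs a Hugging Face token whose account
#    has accepted the gated pyannote model terms ("hf_..." is a placeholder).
diarize_model = DiarizationPipeline(use_auth_token="hf_...", device=device)
diarize_segments = diarize_model(audio)
result = assign_word_speakers(diarize_segments, result)

for seg in result["segments"]:
    print(seg["start"], seg["end"], seg.get("speaker"), seg["text"])
```

Running alignment as a separate pass after transcription is what lets WhisperX batch Whisper inference aggressively while still recovering per-word timing.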
The tool can be installed from the Galaxy ToolShed by cloning its Mercurial repository:

```
hg clone https://toolshed.g2.bx.psu.edu/repos/bgruening/whisperx
```
Name | Description | Version | Minimum Galaxy Version
---|---|---|---
whisperx | Transcribe audio or video files to text using OpenAI Whisper and speaker diarization (WhisperX) | 3.4.2+galaxy1 | 25.0