Small STT Eval Audio Dataset

Hugging Face Dataset

A small speech-to-text evaluation dataset containing 92 audio samples with ground-truth transcriptions. Designed for evaluating STT systems on technical vocabulary, code-switching (English/Hebrew), and various speaking styles.

Dataset Description

This dataset contains audio recordings with accompanying transcriptions across multiple categories:

Category      Count  Description
tech_github   5      GitHub-related technical vocabulary…

See the full description on the dataset page: https://huggingface.co/datasets/danielrosehill/Small-STT-Eval-Audio-Dataset
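Because every recording is paired with a ground-truth transcription, a common way to score an STT system against a dataset like this is word error rate (WER): the word-level edit distance between reference and hypothesis, divided by the number of reference words. The sketch below is illustrative only and is not shipped with the dataset. The record fields `category`, `reference`, and `hypothesis` are hypothetical placeholders (check the dataset page for the real column names), and the normalization (lowercasing, stripping punctuation) is one simple choice among many. Python's `\w` is Unicode-aware for `str` patterns, so Hebrew words survive normalization in code-switched samples.

```python
import re
from collections import defaultdict


def normalize(text: str) -> list[str]:
    """Lowercase, replace punctuation with spaces, split on whitespace.

    \\w matches Unicode letters for str patterns, so Hebrew is kept.
    """
    text = re.sub(r"[^\w\s']", " ", text.lower())
    return text.split()


def word_errors(reference: str, hypothesis: str) -> tuple[int, int]:
    """Return (word-level edit distance, number of reference words)."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or exact match)
            )
        prev = curr
    return prev[-1], len(ref)


def wer_by_category(samples):
    """Corpus-level WER per category: total errors / total reference words.

    Each sample is a dict with hypothetical keys
    "category", "reference", and "hypothesis".
    """
    errors = defaultdict(int)
    words = defaultdict(int)
    for s in samples:
        e, n = word_errors(s["reference"], s["hypothesis"])
        errors[s["category"]] += e
        words[s["category"]] += n
    return {c: errors[c] / words[c] for c in errors if words[c] > 0}


if __name__ == "__main__":
    samples = [
        {"category": "tech_github",
         "reference": "Open a pull request on GitHub",
         "hypothesis": "open a pull request on git hub"},
        {"category": "example_other",  # hypothetical category name
         "reference": "שלום world",
         "hypothesis": "world"},
    ]
    print(wer_by_category(samples))
```

Summing errors and reference words per category (rather than averaging per-sample WER) keeps very short utterances from dominating a category's score, which matters for small categories such as the five `tech_github` samples.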

Project Information

Tags

task_categories:automatic-speech-recognition, language:en, language:he, license:cc-by-4.0, size_categories:n<1k, format:audiofolder, modality:audio, modality:text, library:datasets, library:mlcroissant, region:us, speech-to-text, stt, evaluation, technical-vocabulary