Small STT Eval Audio Dataset
A small speech-to-text (STT) evaluation dataset containing 92 audio samples with ground-truth transcriptions. It is designed for evaluating STT systems on technical vocabulary, code-switching (English/Hebrew), and various speaking styles.

Dataset Description

This dataset contains audio recordings with accompanying transcriptions across multiple categories:

Category      Count   Description
tech_github   5       GitHub-related technical vocabulary…

See the full description on the dataset page: https://huggingface.co/datasets/danielrosehill/Small-STT-Eval-Audio-Dataset.
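
The dataset card does not say which metric to use when scoring an STT system against the ground-truth transcriptions. As an illustration only, the sketch below assumes word error rate (WER), a common choice for this kind of evaluation, computed as the word-level edit distance divided by the number of reference words. The function name `word_error_rate` and the sample strings are hypothetical and are not taken from the dataset.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: text is lowercased and split on whitespace only. Real
    evaluations often also strip punctuation and normalize numbers.
    Whitespace splitting works for Hebrew as well as English, so
    code-switched transcriptions can be scored the same way.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcription is empty")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missed)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical sample pair, not taken from the dataset.
score = word_error_rate("merge the pull request", "merge a pull request")
print(score)  # 0.25 (one substitution over four reference words)
```

Averaging this score per category, such as `tech_github`, would show whether a system struggles more with technical vocabulary than with other speaking styles.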