Transcription Cleanup Trainer

Text Cleanup Fine-Tuning Dataset

A curated dataset for training speech-to-text cleanup models to achieve optimal transcript refinement.

Dataset Description

This dataset contains paired examples of raw speech-to-text transcriptions and manually cleaned versions, designed for fine-tuning models to clean up transcripts to a specific quality level ("Goldilocks" cleanup: not too much, not too little).

Dataset Structure

dataset/
├── data/
│   ├── audio/…

See the full description on the dataset page: https://huggingface.co/datasets/danielrosehill/Transcription-Cleanup-Trainer.
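The raw/cleaned pairing described above might be turned into fine-tuning records along the following lines. This is only a minimal sketch: the field names `input` and `output`, the helper names `make_pair` and `to_jsonl`, and the sample sentences are all hypothetical and are not taken from the dataset, whose actual layout is documented on the dataset page.

```python
import json


def make_pair(raw: str, cleaned: str) -> dict:
    """Build one training record from a raw transcript and its cleaned version.

    The keys "input" and "output" are assumed names for illustration only.
    """
    return {"input": raw.strip(), "output": cleaned.strip()}


def to_jsonl(pairs: list[dict]) -> str:
    """Serialize records as JSON Lines, one record per line."""
    return "\n".join(json.dumps(p, ensure_ascii=False) for p in pairs)


# Invented sample text, not drawn from the dataset.
example = make_pair(
    "um so i think we should uh ship it on friday ",
    "I think we should ship it on Friday.",
)

print(to_jsonl([example]))
```

Keeping each pair as a single JSON Lines record is one common convention for supervised fine-tuning data; the dataset itself may use a different format.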


Project Details

Created: Dec 18, 2025

Platform: Hugging Face Dataset

Type: Dataset
