Transcription Cleanup Trainer
Text Cleanup Fine-Tuning Dataset

A curated dataset for fine-tuning models that clean up raw speech-to-text transcripts.

Dataset Description

This dataset contains paired examples of raw speech-to-text transcriptions and manually cleaned versions. It is designed for fine-tuning models to clean up transcripts to a specific quality level ("Goldilocks" cleanup: not too much, not too little).

Dataset Structure

dataset/
├── data/
│   ├── audio/…

See the full description on the dataset page: https://huggingface.co/datasets/danielrosehill/Transcription-Cleanup-Trainer.
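As a rough illustration of how raw/cleaned pairs like those described above might be prepared for fine-tuning, the sketch below turns (raw, cleaned) string pairs into prompt/completion records and serializes them as JSON Lines. The instruction text, the `prompt` and `completion` field names, the helper functions, and the sample pair are all assumptions made for illustration; none of them come from the dataset, whose actual file layout and column names are given on the dataset page.

```python
import json
from typing import Iterable

# Hypothetical instruction text; the dataset does not prescribe a prompt.
INSTRUCTION = "Clean up this transcript without changing its meaning."


def to_finetune_records(pairs: Iterable[tuple[str, str]]) -> list[dict[str, str]]:
    """Turn (raw, cleaned) transcript pairs into prompt/completion records."""
    records = []
    for raw, cleaned in pairs:
        raw, cleaned = raw.strip(), cleaned.strip()
        if not raw or not cleaned:
            continue  # skip pairs where either side is empty
        records.append(
            {"prompt": f"{INSTRUCTION}\n\n{raw}", "completion": cleaned}
        )
    return records


def to_jsonl(records: list[dict[str, str]]) -> str:
    """Serialize records as JSON Lines, one record per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


# Made-up example pairs, not taken from the dataset.
example_pairs = [
    ("um so i think we should uh ship it on friday",
     "I think we should ship it on Friday."),
    ("", "cleaned text with no raw counterpart"),
]

records = to_finetune_records(example_pairs)
print(to_jsonl(records))
```

The second sample pair has an empty raw side and is dropped, so only complete pairs reach the output. The prompt/completion shape is one common convention; a chat-style message format would work equally well if the training framework expects it.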