Transcription Cleanup Trainer

Text Cleanup Fine-Tuning Dataset

A curated dataset for training speech-to-text cleanup models to achieve optimal transcript refinement.

Dataset Description

This dataset contains paired examples of raw speech-to-text transcriptions and manually cleaned versions, designed for fine-tuning models to clean up transcripts to a specific quality level ("Goldilocks" cleanup: not too much, not too little).

Dataset Structure

dataset/
├── data/
│   ├── audio/…

See the full description on the dataset page: https://huggingface.co/datasets/danielrosehill/Transcription-Cleanup-Trainer.
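As a rough illustration of the paired structure described above, a raw transcript and its manually cleaned counterpart could be turned into fine-tuning records along the following lines. This is only a sketch: the field names `raw` and `cleaned`, the prompt wording, the prompt/completion record layout, and the sample sentence are all assumptions made for illustration and are not taken from the dataset itself.

```python
import json
from dataclasses import dataclass


@dataclass
class CleanupPair:
    """One training example: a raw transcript and its manually cleaned version."""
    raw: str      # hypothetical field: unedited speech-to-text output
    cleaned: str  # hypothetical field: hand-edited target text


def to_finetune_record(pair: CleanupPair) -> dict:
    """Convert a pair into a prompt/completion record (an assumed format)."""
    return {
        "prompt": "Clean up this transcript:\n" + pair.raw.strip(),
        "completion": pair.cleaned.strip(),
    }


def to_jsonl(pairs: list[CleanupPair]) -> str:
    """Serialize pairs as JSON Lines, one record per line."""
    return "\n".join(json.dumps(to_finetune_record(p)) for p in pairs)


# Made-up sample text, not drawn from the dataset.
example = CleanupPair(
    raw="um so i think we should uh ship it on friday ",
    cleaned="So I think we should ship it on Friday.",
)
print(to_jsonl([example]))
```

Keeping the raw and cleaned text side by side in one record makes it easy to check by eye whether a given cleanup removes too much or too little.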


Project Details

Tags

size_categories:n<1k, format:text, modality:audio, modality:text, library:datasets, library:mlcroissant, region:us
