Sample Voice Context Data

Hugging Face Dataset

A small synthetic dataset containing LLM-generated context information simulating a job seeker narrating their career trajectory.

Purpose

This dataset was created to test a voice-to-vector-database RAG pipeline. The workflow being evaluated involves:

- Voice data (MP3 recordings) transcribed to text
- Transcriptions reformatted as structured context data
- Text data upserted into a vector database (Pinecone or Ragie)
- Retrieval accuracy tested by…

See the full description on the dataset page: https://huggingface.co/datasets/danielrosehill/Sample-Voice-Context-Data.
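The workflow described above can be sketched end to end in a few lines. This is a minimal illustration, not the dataset author's pipeline: it assumes transcription has already happened, and it swaps in a toy in-memory store with a bag-of-words similarity in place of Pinecone or Ragie and a real embedding model. The `ToyVectorStore` class, the sample chunks, and the probe questions are all hypothetical, and the accuracy check is only one possible way to test retrieval.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (assumption: stands in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Hypothetical in-memory stand-in for a hosted vector database."""

    def __init__(self) -> None:
        self._records: dict[str, tuple[Counter, str]] = {}

    def upsert(self, record_id: str, text: str) -> None:
        # Insert a new record or overwrite an existing one with the same id.
        self._records[record_id] = (embed(text), text)

    def query(self, question: str, top_k: int = 1) -> list[str]:
        # Rank stored records by similarity to the question; return the best ids.
        q = embed(question)
        ranked = sorted(
            self._records.items(),
            key=lambda item: cosine(q, item[1][0]),
            reverse=True,
        )
        return [record_id for record_id, _ in ranked[:top_k]]


# Hypothetical structured context chunks, as if produced from transcriptions.
chunks = {
    "education": "I studied computer science at university and graduated in 2015.",
    "first_job": "My first job was as a junior data analyst at a logistics company.",
    "goals": "I am now looking for a remote role in machine learning engineering.",
}

store = ToyVectorStore()
for chunk_id, text in chunks.items():
    store.upsert(chunk_id, text)

# Hypothetical probe questions paired with the chunk id each should retrieve.
probes = [
    ("What did the speaker study at university?", "education"),
    ("Where was their first job?", "first_job"),
    ("What kind of role are they looking for now?", "goals"),
]

hits = sum(store.query(question)[0] == expected for question, expected in probes)
accuracy = hits / len(probes)
print(f"top-1 retrieval accuracy: {accuracy:.2f}")
```

Keeping the store behind `upsert` and `query` methods means the toy class could later be replaced by a real vector database client without changing the evaluation loop.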

Project Information

Tags

license:mit
size_categories:n<1k
format:text
modality:audio
modality:text
library:datasets
library:mlcroissant
region:us