Sample Voice Context Data
A small synthetic dataset containing LLM-generated context information simulating a job seeker narrating their career trajectory.

Purpose

This dataset was created to test a voice-to-vector-database RAG pipeline. The workflow being evaluated involves:

1. Voice data (MP3 recordings) transcribed to text
2. Transcriptions reformatted as structured context data
3. Text data upserted into a vector database (Pinecone or Ragie)
4. Retrieval accuracy tested by…

See the full description on the dataset page: https://huggingface.co/datasets/danielrosehill/Sample-Voice-Context-Data.
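
The upsert-and-retrieve part of the workflow described above can be illustrated with a minimal, self-contained sketch. Everything in it is an assumption made for illustration: `ToyVectorStore`, `embed`, and `cosine` are hypothetical names, the bag-of-words "embedding" is a stand-in for a real embedding model, the in-memory store is a stand-in for Pinecone or Ragie (neither client library is used here), and the three context snippets are invented examples, not records from this dataset. Transcription is assumed to have already happened, and the retrieval check at the end is just one plausible way to test accuracy, not necessarily the method used for this dataset.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': lowercase token counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """Hypothetical in-memory stand-in for a hosted vector database."""

    def __init__(self):
        self.records = {}

    def upsert(self, record_id, text):
        # Insert a new record or overwrite an existing one with the same id.
        self.records[record_id] = (embed(text), text)

    def query(self, question, top_k=1):
        # Rank stored records by similarity to the question, best first.
        q = embed(question)
        scored = sorted(
            ((cosine(q, vec), rid) for rid, (vec, _) in self.records.items()),
            reverse=True,
        )
        return [rid for _, rid in scored[:top_k]]


# Invented context snippets, standing in for reformatted transcriptions.
store = ToyVectorStore()
store.upsert("education", "I studied computer science at university and graduated in 2015.")
store.upsert("first_job", "My first job was as a junior data analyst at a logistics company.")
store.upsert("goals", "I am now looking for a remote role in machine learning engineering.")

# Assumed retrieval check: does a question surface the expected record?
top = store.query("What was the first job the candidate held?")
print(top)
```

In a real run, `embed` would be replaced by calls to an embedding model and `ToyVectorStore` by the chosen database's client, while the final check (ask a question, compare the returned record id against the expected one) could stay much the same.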