Whiteboards

A small, single-author whiteboard corpus for evaluating vision-language models (VLMs) on handwritten OCR accuracy and for studying pseudotext hallucination, the failure mode in which a VLM invents plausible-but-wrong words for ambiguous handwriting. Every image shows the same wall-mounted whiteboard, written with the same marker by the same author and photographed with a phone. This is deliberate: the dataset exists to measure whether a small number of human-authored ground-truth pairs can improve… See the full description on the dataset page: https://huggingface.co/datasets/danielrosehill/Whiteboards.
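One common way to score OCR accuracy against a ground-truth transcript is character error rate (CER): the edit distance between the model output and the reference, divided by the reference length. The sketch below is only an illustration. The description above does not say which metric the dataset uses, the function names `levenshtein` and `character_error_rate` are hypothetical, and the sample strings are invented rather than taken from the corpus.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insert, delete, substitute each cost 1)."""
    # Keep the shorter string in the inner loop to reduce memory use.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete ch_a
                    current[j - 1] + 1,      # insert ch_b
                    previous[j - 1] + cost,  # substitute or match
                )
            )
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance / reference length (assumed metric, not the dataset's stated one)."""
    if not reference:
        raise ValueError("reference transcript must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Invented example: the model misreads one character of the handwriting.
reference = "ship the beta friday"
hypothesis = "shop the beta friday"
print(character_error_rate(reference, hypothesis))
```

A pseudotext hallucination such as "shop" for "ship" costs only one character edit here, so a low CER can still hide a wrong word. A word-level comparison alongside CER would make that kind of error more visible.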


Project Details

Tags

task_categories:image-to-text, task_categories:image-to-image, language:en, license:cc-by-4.0, size_categories:n<1k, format:json, modality:image, modality:text, library:datasets, library:pandas, library:polars, library:mlcroissant, region:us, whiteboard, handwriting, ocr, few-shot, vlm, evaluation, pseudotext
