PubMedQA
PubMedQA is a biomedical question-answering dataset for evaluating large language models on medical knowledge. The task requires models to classify answers as "yes", "no", or "maybe" based on biomedical questions paired with abstracts from PubMed.
Raw data
Key stats
Modality | Task | Domain | Sample Size | Question Format | License |
---|---|---|---|---|---|
Text | Classification (3 classes) | Biomedical | 1,000 manually annotated test samples | Medical Q&A with abstracts | MIT License |
Data organization
PubMedQA is split into three subsets: PQA-A(rtificial), PQA-U(nlabeled) and PQA-L(abeled).
- PQA-L(abeled): 1,000 manually curated question-abstract-answer triplets with expert annotations (used by eva)
- PQA-A(rtificial): 55k artificially generated samples (not used in eva)
- PQA-U(nlabeled): 211k questions without gold standard answers (not used in eva)
Each sample includes: - Question: A biomedical research question - Context: Relevant PubMed abstract(s) - Answer: Expert-annotated classification ("yes", "no", "maybe")
Download and preprocessing
The dataset can be automatically downloaded by setting DOWNLOAD_DATA="true"
when running eva. The data will be downloaded to the location specified by DATA_ROOT
(default: ./data/pubmedqa
).
Relevant links
- Paper: PubMedQA: A Dataset for Biomedical Research Question Answering
- Official Repository
- Leaderboard
- HuggingFace
License information
Released under the MIT License