Skip to content

PubMedQA

PubMedQA is a biomedical question-answering dataset for evaluating large language models on medical knowledge. The task requires models to classify answers as "yes", "no", or "maybe" based on biomedical questions paired with abstracts from PubMed.

Raw data

Key stats

Modality Task Domain Sample Size Question Format License
Text Classification (3 classes) Biomedical 1,000 manually annotated test samples Medical Q&A with abstracts MIT License

Data organization

PubMedQA is split into three subsets: PQA-A(rtificial), PQA-U(nlabeled) and PQA-L(abeled).

  • PQA-L(abeled): 1,000 manually curated question-abstract-answer triplets with expert annotations (used by eva)
  • PQA-A(rtificial): 55k artificially generated samples (not used in eva)
  • PQA-U(nlabeled): 211k questions without gold standard answers (not used in eva)

Each sample includes: - Question: A biomedical research question - Context: Relevant PubMed abstract(s) - Answer: Expert-annotated classification ("yes", "no", "maybe")

Download and preprocessing

The dataset can be automatically downloaded by setting DOWNLOAD_DATA="true" when running eva. The data will be downloaded to the location specified by DATA_ROOT (default: ./data/pubmedqa).

DOWNLOAD_DATA="true" eva validate --config configs/language/pubmedqa.yaml

License information

Released under the MIT License