Datasets
Reference information for the language data Datasets API.
eva.language.data.datasets.PubMedQA
Bases: TextClassification
Dataset class for PubMedQA question answering task.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
root | str \| None | Directory to cache the dataset. If None, no local caching is used. | None |
split | Literal['train', 'val', 'test'] \| None | Valid splits among ["train", "val", "test"]. If None, it will use "train+test+validation". | None |
download | bool | Whether to download the dataset if not found locally. Default is False. | False |
max_samples | int \| None | Maximum number of samples to use. If None, use all samples. | None |
Source code in src/eva/language/data/datasets/classification/pubmedqa.py
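As a rough illustration of the split and max_samples semantics in the parameters listed above, the following sketch resolves a split argument and truncates a sample list. It is not the eva implementation: the helper names resolve_split and limit_samples are hypothetical, and the mapping of "val" to a "validation" split is an assumption inferred from the "train+test+validation" default.

```python
from typing import Literal, Optional, Sequence, TypeVar

T = TypeVar("T")
Split = Literal["train", "val", "test"]

# Assumption: "val" corresponds to an upstream split named "validation".
_SPLIT_NAMES = {"train": "train", "val": "validation", "test": "test"}


def resolve_split(split: Optional[Split]) -> str:
    """Return the split expression to load; None selects all splits combined."""
    if split is None:
        return "train+test+validation"
    if split not in _SPLIT_NAMES:
        raise ValueError(
            f"Invalid split {split!r}; expected one of {sorted(_SPLIT_NAMES)}."
        )
    return _SPLIT_NAMES[split]


def limit_samples(samples: Sequence[T], max_samples: Optional[int]) -> Sequence[T]:
    """Keep at most max_samples items; None keeps every sample."""
    if max_samples is None:
        return samples
    return samples[:max_samples]
```

With eva installed, the documented parameters suggest a construction along the lines of PubMedQA(root=None, split="val", download=False, max_samples=100), though the exact import path and behavior should be checked against the source file named above.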
prepare_data
Downloads and prepares the PubMedQA dataset.
If self._root is None, the dataset is used directly from HuggingFace.
Otherwise, it checks if the dataset is already cached in self._root.
If not cached, it downloads the dataset into self._root.
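The decision flow described for prepare_data can be sketched without any network access. This is a minimal sketch under stated assumptions, not the library's code: the function prepare and the injected fetch callable are hypothetical, "cached" is taken to mean a non-empty directory, and raising an error when download is False is an assumption about how the download flag interacts with a missing cache.

```python
from pathlib import Path
from typing import Callable, Optional


def prepare(root: Optional[str], download: bool, fetch: Callable[[Path], None]) -> str:
    """Decide how the dataset is obtained and return a label for the chosen path."""
    if root is None:
        # No local caching: the data would be read directly from the remote hub.
        return "remote"

    cache_dir = Path(root)
    # Assumption: a non-empty directory counts as an existing cache.
    if cache_dir.is_dir() and any(cache_dir.iterdir()):
        return "cached"

    if not download:
        # Assumption: without permission to download, a missing cache is an error.
        raise FileNotFoundError(
            f"Dataset not found in {cache_dir}; set download=True to fetch it."
        )

    cache_dir.mkdir(parents=True, exist_ok=True)
    fetch(cache_dir)  # hypothetical callable that writes the dataset into cache_dir
    return "downloaded"
```

Passing the fetch step in as a callable keeps the branching logic testable in isolation; a real implementation would replace it with the actual download call.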
eva.language.data.datasets.LanguageDataset
Bases: MapDataset, ABC, Generic[DataSample]
Base dataset class for text tasks.