pyhealth.datasets.CardiologyDataset
The Cardiology dataset comprises six subsets: “cpsc_2018”, “cpsc_2018_extra”, “georgia”, “ptb”, “ptb-xl”, and “st_petersburg_incart”. Refer to the dataset documentation for more information.
- class pyhealth.datasets.CardiologyDataset(root, chosen_dataset=[1, 1, 1, 1, 1, 1], dataset_name=None, dev=False, refresh_cache=False)
Bases: BaseSignalDataset
Base ECG dataset for Cardiology
Dataset is available at https://physionet.org/content/challenge-2020/1.0.2/
- Parameters:
  - root (str) – root directory of the raw data.
  - dev (bool) – whether to enable dev mode (only use a small subset of the data). Default is False.
  - refresh_cache (bool) – whether to refresh the cache; if true, the dataset will be processed from scratch and the cache will be updated. Default is False.
  - chosen_dataset (List[int]) – a list of six flags, each 0 or 1, indicating which datasets will be used. Default is [1, 1, 1, 1, 1, 1]. The flags correspond, in order, to “cpsc_2018”, “cpsc_2018_extra”, “georgia”, “ptb”, “ptb-xl”, and “st_petersburg_incart”. For example, [0, 1, 1, 1, 1, 1] indicates that “cpsc_2018_extra”, “georgia”, “ptb”, “ptb-xl”, and “st_petersburg_incart” will be used.
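The mapping from the chosen_dataset flags to subset names can be illustrated with a short standalone sketch. The helper `selected_datasets` and the constant `DATASET_NAMES` are hypothetical names introduced here for illustration; they are not part of pyhealth, and the library's own handling of the flags may differ.

```python
from typing import List

# Subset names, in the order given in the chosen_dataset description.
DATASET_NAMES = [
    "cpsc_2018",
    "cpsc_2018_extra",
    "georgia",
    "ptb",
    "ptb-xl",
    "st_petersburg_incart",
]


def selected_datasets(chosen_dataset: List[int]) -> List[str]:
    """Hypothetical helper: return the subset names whose flag is 1."""
    if len(chosen_dataset) != len(DATASET_NAMES):
        raise ValueError(f"expected {len(DATASET_NAMES)} flags, got {len(chosen_dataset)}")
    if any(flag not in (0, 1) for flag in chosen_dataset):
        raise ValueError("each flag must be 0 or 1")
    # Pair each name with its flag and keep the names that are switched on.
    return [name for name, flag in zip(DATASET_NAMES, chosen_dataset) if flag == 1]


# Same flags as the example in the parameter description: skip "cpsc_2018".
print(selected_datasets([0, 1, 1, 1, 1, 1]))
```

Validating the length and values up front is a design choice of this sketch: a mask of the wrong length would otherwise be silently truncated by `zip`.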
- task
Optional[str], name of the task (e.g., “sleep staging”). Default is None.
- samples
Optional[List[Dict]], a list of samples; each sample is a dict with patient_id, record_id, and other task-specific attributes as keys. Default is None.
- patient_to_index
Optional[Dict[str, List[int]]], a dict mapping patient_id to a list of sample indices. Default is None.
- visit_to_index
Optional[Dict[str, List[int]]], a dict mapping visit_id to a list of sample indices. Default is None.
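To make the shape of the index attributes concrete, here is a minimal sketch of how a mapping such as patient_to_index could be derived from a list of samples. The helper `build_index` and the toy samples are hypothetical and written only for illustration; pyhealth may construct these attributes differently.

```python
from collections import defaultdict
from typing import Dict, List


def build_index(samples: List[Dict], key: str) -> Dict[str, List[int]]:
    """Hypothetical helper: map each value of `key` to the indices of its samples."""
    index = defaultdict(list)
    for i, sample in enumerate(samples):
        index[sample[key]].append(i)
    return dict(index)


# Toy samples (made up): each dict carries patient_id and record_id.
samples = [
    {"patient_id": "p0", "record_id": "r0"},
    {"patient_id": "p0", "record_id": "r1"},
    {"patient_id": "p1", "record_id": "r0"},
]

patient_to_index = build_index(samples, "patient_id")
print(patient_to_index)
```

An index keyed by another sample field can be built the same way by passing a different `key`.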
Examples
>>> from pyhealth.datasets import CardiologyDataset
>>> dataset = CardiologyDataset(
...     root="/srv/local/data/physionet.org/files/challenge-2020/1.0.2/training",
... )
>>> dataset.stat()
>>> dataset.info()