pyhealth.datasets.CardiologyDataset

The Cardiology dataset comprises six subsets: “cpsc_2018”, “cpsc_2018_extra”, “georgia”, “ptb”, “ptb-xl”, and “st_petersburg_incart”. Refer to the documentation for more information.

class pyhealth.datasets.CardiologyDataset(root, chosen_dataset=[1, 1, 1, 1, 1, 1], dataset_name=None, dev=False, refresh_cache=False)

Bases: BaseSignalDataset

Base ECG dataset for Cardiology

Dataset is available at https://physionet.org/content/challenge-2020/1.0.2/

Parameters:

dataset_name (Optional[str]) – name of the dataset.
root (str) – root directory of the raw data.
dev (bool) – whether to enable dev mode (only use a small subset of the data). Default is False.
refresh_cache (bool) – whether to refresh the cache; if true, the dataset will be processed from scratch and the cache will be updated. Default is False.
chosen_dataset (List[int]) – a list of six 0/1 flags indicating which subsets will be used. Default is [1, 1, 1, 1, 1, 1]. The flags correspond, in order, to “cpsc_2018”, “cpsc_2018_extra”, “georgia”, “ptb”, “ptb-xl”, and “st_petersburg_incart”. For example, [0, 1, 1, 1, 1, 1] indicates that “cpsc_2018_extra”, “georgia”, “ptb”, “ptb-xl”, and “st_petersburg_incart” will be used.
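
The flag-to-subset mapping can be sketched as follows. This is only an illustration that assumes the positional order given in the parameter description; SUBSET_NAMES and selected_subsets are hypothetical names and are not part of PyHealth.

```python
from typing import List

# Subset order as listed in the chosen_dataset description (assumed positional).
SUBSET_NAMES = [
    "cpsc_2018",
    "cpsc_2018_extra",
    "georgia",
    "ptb",
    "ptb-xl",
    "st_petersburg_incart",
]


def selected_subsets(chosen_dataset: List[int]) -> List[str]:
    """Hypothetical helper: return the subset names whose flag is 1."""
    if len(chosen_dataset) != len(SUBSET_NAMES):
        raise ValueError("chosen_dataset must contain exactly six flags")
    if any(flag not in (0, 1) for flag in chosen_dataset):
        raise ValueError("each flag must be 0 or 1")
    return [name for name, flag in zip(SUBSET_NAMES, chosen_dataset) if flag]


# Skip the first subset and keep the remaining five.
print(selected_subsets([0, 1, 1, 1, 1, 1]))
```

Checking a mask this way before constructing the dataset may help catch a misplaced flag early, since processing the raw recordings can be slow.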

task: Optional[str], name of the task (e.g., “sleep staging”). Default is None.

samples: Optional[List[Dict]], a list of samples; each sample is a dict with patient_id, record_id, and other task-specific attributes as keys. Default is None.

patient_to_index: Optional[Dict[str, List[int]]], a dict mapping patient_id to a list of sample indices. Default is None.

visit_to_index: Optional[Dict[str, List[int]]], a dict mapping visit_id to a list of sample indices. Default is None.
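
As a rough illustration of how such index maps relate to samples, the sketch below groups sample positions by a key. The helper build_index and the toy samples are hypothetical and do not reproduce PyHealth's actual implementation; using record_id as the visit identifier is an assumption made only for this example.

```python
from collections import defaultdict
from typing import Dict, List


def build_index(samples: List[Dict], key: str) -> Dict[str, List[int]]:
    """Hypothetical helper: map each value of `key` to the positions of its samples."""
    index: Dict[str, List[int]] = defaultdict(list)
    for position, sample in enumerate(samples):
        index[sample[key]].append(position)
    return dict(index)


# Toy samples with made-up identifiers.
samples = [
    {"patient_id": "p0", "record_id": "r0"},
    {"patient_id": "p0", "record_id": "r1"},
    {"patient_id": "p1", "record_id": "r2"},
]

patient_to_index = build_index(samples, "patient_id")
# Assumption: record_id stands in for the visit identifier here.
visit_to_index = build_index(samples, "record_id")
print(patient_to_index)
print(visit_to_index)
```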

Examples

>>> from pyhealth.datasets import CardiologyDataset
>>> dataset = CardiologyDataset(
...         root="/srv/local/data/physionet.org/files/challenge-2020/1.0.2/training",
...     )
>>> dataset.stat()
>>> dataset.info()

process_EEG_data()