The Cardiology dataset comprises six portions: “cpsc_2018”, “cpsc_2018_extra”, “georgia”, “ptb”, “ptb-xl”, and “st_petersburg_incart”. Refer to the doc for more information.

class pyhealth.datasets.CardiologyDataset(root, chosen_dataset=[1, 1, 1, 1, 1, 1], dataset_name=None, dev=False, refresh_cache=False)[source]#

Bases: BaseSignalDataset

Base ECG dataset for Cardiology

Dataset is available at

  • dataset_name (Optional[str]) – name of the dataset.

  • root (str) – root directory of the raw data.

  • dev (bool) – whether to enable dev mode (only use a small subset of the data). Default is False.

  • refresh_cache (bool) – whether to refresh the cache; if true, the dataset will be processed from scratch and the cache will be updated. Default is False.

  • chosen_dataset (List[int]) – a list of six 0/1 flags indicating which datasets will be used. Default: [1, 1, 1, 1, 1, 1]. The flags correspond, in order, to “cpsc_2018”, “cpsc_2018_extra”, “georgia”, “ptb”, “ptb-xl”, and “st_petersburg_incart”. E.g., [0, 1, 1, 1, 1, 1] indicates that “cpsc_2018_extra”, “georgia”, “ptb”, “ptb-xl”, and “st_petersburg_incart” will be used.
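
Writing the chosen_dataset flags by hand is easy to get wrong, so the sketch below builds the list from portion names. It is only an illustration: make_chosen_dataset and PORTIONS are hypothetical names that are not part of pyhealth, and the portion order is assumed to match the order listed in the parameter description.

```python
from typing import Iterable, List

# Portion names, assumed to be in the same order as the chosen_dataset flags.
PORTIONS = [
    "cpsc_2018",
    "cpsc_2018_extra",
    "georgia",
    "ptb",
    "ptb-xl",
    "st_petersburg_incart",
]


def make_chosen_dataset(selected: Iterable[str]) -> List[int]:
    """Hypothetical helper (not a pyhealth function): names -> 0/1 flags."""
    wanted = set(selected)
    unknown = wanted - set(PORTIONS)
    if unknown:
        # Fail early on misspelled names instead of silently dropping them.
        raise ValueError(f"unknown portion(s): {sorted(unknown)}")
    return [1 if name in wanted else 0 for name in PORTIONS]


mask = make_chosen_dataset(["georgia", "ptb-xl"])
print(mask)  # [0, 0, 1, 0, 1, 0]
```

The resulting list can then be passed as chosen_dataset=mask when constructing CardiologyDataset.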


Optional[str], name of the task (e.g., “sleep staging”). Default is None.


Optional[List[Dict]], a list of samples, each sample is a dict with patient_id, record_id, and other task-specific attributes as key. Default is None.


Optional[Dict[str, List[int]]], a dict mapping patient_id to a list of sample indices. Default is None.


Optional[Dict[str, List[int]]], a dict mapping visit_id to a list of sample indices. Default is None.


>>> from pyhealth.datasets import CardiologyDataset
>>> dataset = CardiologyDataset(
...     root="/srv/local/data/",
... )
>>> dataset.stat()