pyhealth.datasets.SHHSDataset

The Sleep Heart Health Study (SHHS) dataset; refer to doc for more information.

class pyhealth.datasets.SHHSDataset(root, dataset_name=None, dev=False, refresh_cache=False, **kwargs)

Bases: BaseSignalDataset

Base EEG dataset for the Sleep Heart Health Study (SHHS).

The dataset is available at https://sleepdata.org/datasets/shhs

The Sleep Heart Health Study (SHHS) is a multi-center cohort study implemented by the National Heart, Lung, and Blood Institute to determine the cardiovascular and other consequences of sleep-disordered breathing. It tests whether sleep-disordered breathing is associated with an increased risk of coronary heart disease, stroke, all-cause mortality, and hypertension. In all, 6,441 men and women aged 40 years and older were enrolled between November 1, 1995 and January 31, 1998 to take part in SHHS Visit 1. During exam cycle 3 (January 2001 to June 2003), a second polysomnogram (SHHS Visit 2) was obtained in 3,295 of the participants. CVD outcomes data were monitored and adjudicated by parent cohorts between baseline and 2011. More than 130 manuscripts have been published investigating predictors and outcomes of sleep disorders.

Parameters:
  • dataset_name (Optional[str]) – name of the dataset.

  • root (str) – root directory of the raw data (should contain the SHHS recording files, e.g., shhs1-200001.edf).

  • dev (bool) – whether to enable dev mode (only use a small subset of the data). Default is False.

  • refresh_cache (bool) – whether to refresh the cache; if true, the dataset will be processed from scratch and the cache will be updated. Default is False.

task

Optional[str], name of the task (e.g., “sleep staging”). Default is None.

samples

Optional[List[Dict]], a list of samples; each sample is a dict with patient_id, record_id, and other task-specific attributes as keys. Default is None.

patient_to_index

Optional[Dict[str, List[int]]], a dict mapping patient_id to a list of sample indices. Default is None.

visit_to_index

Optional[Dict[str, List[int]]], a dict mapping visit_id to a list of sample indices. Default is None.
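
To illustrate how such index attributes relate to samples, the sketch below builds mappings from a key value to sample indices over a toy list. The helper name build_index, the record_to_index variable, and the toy values are hypothetical and not part of PyHealth; the library's own implementation may differ.

```python
from collections import defaultdict
from typing import Dict, List


def build_index(samples: List[Dict], key: str) -> Dict[str, List[int]]:
    """Map each distinct value of `key` to the indices of the samples holding it."""
    index: Dict[str, List[int]] = defaultdict(list)
    for i, sample in enumerate(samples):
        index[sample[key]].append(i)
    return dict(index)


# Toy samples with hypothetical values; real samples also carry
# task-specific attributes.
samples = [
    {"patient_id": "200001", "record_id": "shhs1-200001"},
    {"patient_id": "200001", "record_id": "shhs2-200001"},
    {"patient_id": "200002", "record_id": "shhs1-200002"},
]

patient_to_index = build_index(samples, "patient_id")
record_to_index = build_index(samples, "record_id")
```

With a mapping of this shape, all samples belonging to one patient can be looked up by ID without scanning the full list.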

Examples

>>> from pyhealth.datasets import SHHSDataset
>>> dataset = SHHSDataset(
...         root="/srv/local/data/SHHS/",
...     )
>>> dataset.stat()
>>> dataset.info()

parse_patient_id(file_name)

Parameters:

file_name – the file name of an SHHS recording, e.g., shhs1-200001.edf

Returns:

the patient ID parsed from the file name, e.g., 200001

Return type:

patient_id
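
The mapping from file name to patient ID described above can be sketched as follows. This is a minimal standalone illustration, assuming the ID is the part of the file name after the last hyphen and before the extension; the function name parse_patient_id_sketch and the directory in the second call are hypothetical, and the method's actual implementation may differ.

```python
import os


def parse_patient_id_sketch(file_name: str) -> str:
    """Return the ID that follows the last hyphen in an SHHS-style file name."""
    # Drop any directory part and the extension: "shhs1-200001.edf" -> "shhs1-200001"
    stem, _ext = os.path.splitext(os.path.basename(file_name))
    # Keep the text after the last hyphen: "shhs1-200001" -> "200001"
    return stem.split("-")[-1]


patient_id = parse_patient_id_sketch("shhs1-200001.edf")
# The directory below is a made-up path used only for illustration.
same_id = parse_patient_id_sketch("some/dir/shhs1-200001.edf")
```

Note that the sketch returns the ID as a string, so leading zeros, if any occur, would be preserved.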

process_EEG_data()