pyhealth.datasets.TUEVDataset

Dataset is available at https://isip.piconepress.com/projects/tuh_eeg/html/downloads.shtml

This corpus is a subset of TUEG that contains annotations of EEG segments as one of six classes: (1) spike and sharp wave (SPSW), (2) generalized periodic epileptiform discharges (GPED), (3) periodic lateralized epileptiform discharges (PLED), (4) eye movement (EYEM), (5) artifact (ARTF) and (6) background (BCKG).
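The six classes above can be written as a lookup table. This is a hypothetical sketch: the names `TUEV_CLASSES` and `TUEV_CODE_TO_CLASS` are illustrative and not part of PyHealth. The integer codes simply follow the (1) to (6) numbering in the text, so check them against the annotation files of your copy before using them as labels.

```python
from typing import Dict

# Hypothetical mapping that follows the numbering (1)..(6) in the text above.
# Verify the codes against your copy's annotation files before training on them.
TUEV_CLASSES: Dict[str, int] = {
    "SPSW": 1,  # spike and sharp wave
    "GPED": 2,  # generalized periodic epileptiform discharges
    "PLED": 3,  # periodic lateralized epileptiform discharges
    "EYEM": 4,  # eye movement
    "ARTF": 5,  # artifact
    "BCKG": 6,  # background
}

# Inverse lookup: integer code -> class abbreviation.
TUEV_CODE_TO_CLASS: Dict[int, str] = {code: name for name, code in TUEV_CLASSES.items()}
```

A classifier that expects zero-based targets could use `TUEV_CLASSES[name] - 1`.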

class pyhealth.datasets.TUEVDataset(root, dataset_name=None, dev=False, refresh_cache=False, **kwargs)

Bases: BaseSignalDataset

Base EEG dataset for the TUH EEG Events Corpus.

Files in the eval partition are named in the form bckg_032_a_.edf:
  • bckg: the file contains background annotations.

  • 032: a reference to the eval index.

  • a_.edf: EEG files are split into a series of files starting with a_.edf, a_1.edf, … These represent pruned EEGs: the original EEG is split into these segments, and uninteresting parts of the original recording were deleted.
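The eval naming scheme described above can be sketched as a small parser. `parse_eval_filename`, `EvalFileName` and the regular expression are hypothetical helpers, not part of PyHealth, and they assume the lowercase `<label>_<index>_a_<segment>.edf` form shown in the example name, with an empty segment field standing for the first piece (`a_.edf`).

```python
import re
from typing import NamedTuple


class EvalFileName(NamedTuple):
    label: str    # e.g. "bckg" (background annotations)
    index: str    # e.g. "032", a reference to the eval index (kept as a string)
    segment: int  # 0 for a_.edf, 1 for a_1.edf, and so on


# Assumed pattern: <lowercase label>_<digits>_a_<optional digits>.edf
_EVAL_PATTERN = re.compile(r"^(?P<label>[a-z]+)_(?P<index>\d+)_a_(?P<segment>\d*)\.edf$")


def parse_eval_filename(name: str) -> EvalFileName:
    """Split an eval-partition file name such as 'bckg_032_a_.edf' into its parts."""
    match = _EVAL_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not an eval-style TUEV file name: {name!r}")
    segment = match.group("segment")
    return EvalFileName(
        label=match.group("label"),
        index=match.group("index"),
        # An empty segment field means the first pruned piece (a_.edf).
        segment=int(segment) if segment else 0,
    )
```

For example, `parse_eval_filename("bckg_032_a_1.edf")` yields label `"bckg"`, index `"032"` and segment `1`.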

Files in the train partition are named in the form 00002275_00000001.edf:
  • 00002275: a reference to the train index.

  • 00000001: indicates that this is the first file associated with this patient.
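The train naming scheme can be handled the same way. `parse_train_filename` is a hypothetical helper, not a PyHealth function, and it assumes both fields are exactly eight digits, as in the example name.

```python
import re
from typing import Tuple

# Assumed pattern: <8-digit train index>_<8-digit file number>.edf
_TRAIN_PATTERN = re.compile(r"^(?P<index>\d{8})_(?P<file_no>\d{8})\.edf$")


def parse_train_filename(name: str) -> Tuple[str, int]:
    """Return (train index, file number) for a name like '00002275_00000001.edf'."""
    match = _TRAIN_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a train-style TUEV file name: {name!r}")
    # Keep the index as a string so its leading zeros survive;
    # the file number is converted to an int (00000001 -> 1).
    return match.group("index"), int(match.group("file_no"))
```

Keeping the index as a string matters when it is later used as a key, since `int("00002275")` would drop the leading zeros.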

Parameters:
  • dataset_name (Optional[str]) – name of the dataset.

  • root (str) – root directory of the raw data (for example, the edf/ directory of the TUEV release, as in the example below).

  • dev (bool) – whether to enable dev mode (only use a small subset of the data). Default is False.

  • refresh_cache (bool) – whether to refresh the cache; if True, the dataset is processed from scratch and the cache is updated. Default is False.

task

Optional[str], name of the task (e.g., “EEG_events”). Default is None.

samples

Optional[List[Dict]], a list of samples, where each sample is a dict whose keys are patient_id, record_id, and other task-specific attributes. Default is None.

patient_to_index

Optional[Dict[str, List[int]]], a dict mapping patient_id to a list of sample indices. Default is None.

visit_to_index

Optional[Dict[str, List[int]]], a dict mapping visit_id to a list of sample indices. Default is None.
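The relationship between `samples` and the index attributes can be illustrated with a minimal sketch. `build_index` and the sample dicts below are hypothetical, not PyHealth code or real TUEV data; the sketch only shows what "a dict mapping patient_id to a list of sample indices" means, not how PyHealth actually builds it. Calling it with `"visit_id"` instead would give a `visit_to_index`-style mapping, assuming the samples carry that key.

```python
from collections import defaultdict
from typing import Dict, List


def build_index(samples: List[Dict], key: str) -> Dict[str, List[int]]:
    """Map each value of samples[i][key] to the list of positions i where it occurs."""
    index: Dict[str, List[int]] = defaultdict(list)
    for i, sample in enumerate(samples):
        index[sample[key]].append(i)
    return dict(index)


# Hypothetical samples; real ones carry further task-specific keys.
samples = [
    {"patient_id": "patient_a", "record_id": "patient_a_rec1", "label": 6},
    {"patient_id": "patient_a", "record_id": "patient_a_rec1", "label": 4},
    {"patient_id": "patient_b", "record_id": "patient_b_rec1", "label": 1},
]
patient_to_index = build_index(samples, "patient_id")
```

Indices keep the order of `samples`, so a patient-level train/test split can select whole index lists per patient.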

Examples

>>> from pyhealth.datasets import TUEVDataset
>>> dataset = TUEVDataset(
...         root="/srv/local/data/TUH/tuh_eeg_events/v2.0.0/edf/",
...     )
>>> dataset.stat()
>>> dataset.info()
process_EEG_data()