pyhealth.datasets.TUEVDataset
The dataset is available at https://isip.piconepress.com/projects/tuh_eeg/html/downloads.shtml
This corpus is a subset of TUEG that contains annotations of EEG segments as one of six classes: (1) spike and sharp wave (SPSW), (2) generalized periodic epileptiform discharges (GPED), (3) periodic lateralized epileptiform discharges (PLED), (4) eye movement (EYEM), (5) artifact (ARTF) and (6) background (BCKG).
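The six classes can be kept in a small lookup table. The sketch below assumes that integer IDs 1 to 6 follow the order listed above; `TUEV_CLASSES`, `ABBREV_TO_ID`, and `describe` are hypothetical names, not part of PyHealth.

```python
# Hypothetical lookup table for the six TUEV event classes.
# Assumption: integer IDs follow the (1)-(6) order given in the text above;
# check the annotation files of your release before relying on these IDs.
TUEV_CLASSES = {
    1: ("SPSW", "spike and sharp wave"),
    2: ("GPED", "generalized periodic epileptiform discharges"),
    3: ("PLED", "periodic lateralized epileptiform discharges"),
    4: ("EYEM", "eye movement"),
    5: ("ARTF", "artifact"),
    6: ("BCKG", "background"),
}

# Reverse lookup: abbreviation -> integer ID.
ABBREV_TO_ID = {abbrev: class_id for class_id, (abbrev, _) in TUEV_CLASSES.items()}


def describe(class_id: int) -> str:
    """Return a readable label such as 'SPSW (spike and sharp wave)'."""
    abbrev, name = TUEV_CLASSES[class_id]
    return f"{abbrev} ({name})"


print(describe(ABBREV_TO_ID["PLED"]))  # PLED (periodic lateralized epileptiform discharges)
```

If a model expects 0-based targets, subtract 1 from these IDs; keeping the table in one place makes that conversion explicit.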
- class pyhealth.datasets.TUEVDataset(root, dataset_name=None, dev=False, refresh_cache=False, **kwargs)
Bases: BaseSignalDataset
Base EEG dataset for the TUH EEG Events Corpus (TUEV).
- Files are named in the form of bckg_032_a_.edf in the eval partition:
- bckg: this file contains background annotations.
- 032: a reference to the eval index.
- a_.edf: EEG files are split into a series of files starting with a_.edf, a_1.edf, … These represent pruned EEGs: the original EEG is split into these segments, and uninteresting parts of the original recording were deleted.
- or in the form of 00002275_00000001.edf in the train partition:
- 00002275: a reference to the train index.
- 00000001: indicates that this is the first file associated with this patient.
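The two naming schemes above can be parsed with regular expressions. This is a minimal sketch that assumes every file name follows one of the two example patterns exactly; `parse_tuev_filename` is a hypothetical helper, and numbering `a_.edf` as segment 0 is an arbitrary choice made here.

```python
import re
from typing import Dict, Optional

# Patterns for the two naming schemes described above. They are based only on
# the examples on this page (bckg_032_a_.edf, a_1.edf, 00002275_00000001.edf);
# other names in a real release may not match.
EVAL_PATTERN = re.compile(r"^(?P<label>[a-z]+)_(?P<index>\d+)_a_(?P<segment>\d*)\.edf$")
TRAIN_PATTERN = re.compile(r"^(?P<index>\d+)_(?P<file>\d+)\.edf$")


def parse_tuev_filename(name: str) -> Optional[Dict[str, object]]:
    """Split a TUEV file name into its parts, or return None if unrecognized."""
    match = EVAL_PATTERN.match(name)
    if match:
        # a_.edf is the first segment (0 here); a_1.edf, a_2.edf, ... follow it.
        segment = int(match["segment"]) if match["segment"] else 0
        return {
            "partition": "eval",
            "label": match["label"],  # e.g. "bckg" for background
            "index": match["index"],  # kept as a string to preserve zero padding
            "segment": segment,
        }
    match = TRAIN_PATTERN.match(name)
    if match:
        return {
            "partition": "train",
            "index": match["index"],
            "file_number": int(match["file"]),  # 1 = first file for this index
        }
    return None


print(parse_tuev_filename("bckg_032_a_1.edf"))
# {'partition': 'eval', 'label': 'bckg', 'index': '032', 'segment': 1}
```

Grouping eval files by `(label, index)` and sorting by `segment` recovers the order of the pruned pieces of one original recording.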
- Parameters
root (str) – root directory of the raw data, i.e., the edf/ directory of the TUEV release that holds the train and eval partitions (see the example below).
dataset_name (Optional[str]) – name of the dataset. Default is None.
dev (bool) – whether to enable dev mode (only use a small subset of the data). Default is False.
refresh_cache (bool) – whether to refresh the cache; if True, the dataset is processed from scratch and the cache is updated. Default is False.
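Before constructing the dataset, it can help to confirm that `root` points at the right directory. The hypothetical helper below assumes `root` contains `train/` and `eval/` subfolders holding the `.edf` files; `count_edf_files` is not part of the TUEVDataset API.

```python
from pathlib import Path
from typing import Dict, Iterable


def count_edf_files(root: str, partitions: Iterable[str] = ("train", "eval")) -> Dict[str, int]:
    """Count .edf files under each expected partition folder of `root`.

    Assumption: `root` is the edf/ directory and contains `train/` and `eval/`
    subfolders. This is a hypothetical pre-flight check, not a PyHealth API.
    """
    base = Path(root)
    counts: Dict[str, int] = {}
    for partition in partitions:
        folder = base / partition
        if not folder.is_dir():
            raise FileNotFoundError(f"expected a '{partition}' folder under {base}")
        # rglob also searches nested subfolders, in case files are grouped per patient.
        counts[partition] = sum(1 for _ in folder.rglob("*.edf"))
    return counts
```

Call it on the same path you intend to pass as `root`; a `FileNotFoundError` suggests the path does not point at the directory that holds the partitions.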
- task
Optional[str], name of the task (e.g., “EEG_events”). Default is None.
- samples
Optional[List[Dict]], a list of samples; each sample is a dict with patient_id, record_id, and other task-specific attributes as keys. Default is None.
- patient_to_index
Optional[Dict[str, List[int]]], a dict mapping patient_id to a list of sample indices. Default is None.
- visit_to_index
Optional[Dict[str, List[int]]], a dict mapping visit_id to a list of sample indices. Default is None.
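The `patient_to_index` and `visit_to_index` attributes are inverted indexes over `samples`. The sketch below builds one with a hypothetical `build_index` helper on made-up samples; it illustrates the data structure, not PyHealth's own code.

```python
from collections import defaultdict
from typing import Dict, List


def build_index(samples: List[Dict], key: str) -> Dict[str, List[int]]:
    """Map each value of samples[i][key] to the positions i where it occurs.

    Illustrative sketch of the patient_to_index / visit_to_index structure;
    not PyHealth's implementation.
    """
    index: Dict[str, List[int]] = defaultdict(list)
    for i, sample in enumerate(samples):
        index[sample[key]].append(i)
    return dict(index)


# Hypothetical samples: IDs and labels are made up for illustration.
samples = [
    {"patient_id": "p1", "record_id": "r1", "label": 6},
    {"patient_id": "p1", "record_id": "r2", "label": 1},
    {"patient_id": "p2", "record_id": "r3", "label": 4},
]

patient_to_index = build_index(samples, "patient_id")
print(patient_to_index)  # {'p1': [0, 1], 'p2': [2]}
```

Passing `"record_id"` as the key gives a per-recording index; whether PyHealth keys `visit_to_index` on `record_id` for this dataset is an assumption here.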
Examples
>>> from pyhealth.datasets import TUEVDataset
>>> dataset = TUEVDataset(
...     root="/srv/local/data/TUH/tuh_eeg_events/v2.0.0/edf/",
... )
>>> dataset.stat()
>>> dataset.info()