pyhealth.datasets.DREAMTDataset
The Dataset for Real-time sleep stage EstimAtion using Multisensor wearable Technology (DREAMT) includes wrist-based wearable and polysomnography (PSG) sleep data from 100 participants recruited from the Duke University Health System (DUHS) Sleep Disorder Lab.
This includes wearable signals, PSG signals, sleep labels, and clinical data related to sleep health and disorders.
The DREAMTDataset class provides an interface for loading and working with the DREAMT dataset. It can process DREAMT data across versions into a well-structured dataset object providing support for modeling and analysis.
Refer to the dataset documentation on PhysioNet (https://physionet.org/content/dreamt/) for more information about the dataset.
- class pyhealth.datasets.DREAMTDataset(root, dataset_name=None, config_path=None)
Bases: BaseDataset
Base Dataset for Real-time sleep stage EstimAtion using Multisensor wearable Technology (DREAMT)
The dataset class accepts the current versions of DREAMT (1.0.0, 1.0.1, 2.0.0, 2.1.0), available at: https://physionet.org/content/dreamt/
When using this dataset, please cite:
Wang, K., Yang, J., Shetty, A., & Dunn, J. (2025). DREAMT: Dataset for Real-time sleep stage EstimAtion using Multisensor wearable Technology (version 2.1.0). PhysioNet. RRID:SCR_007345. https://doi.org/10.13026/7r9r-7r24
Will Ke Wang, Jiamu Yang, Leeor Hershkovich, Hayoung Jeong, Bill Chen, Karnika Singh, Ali R Roghanizad, Md Mobashir Hasan Shandhi, Andrew R Spector, Jessilyn Dunn. (2024). Proceedings of the fifth Conference on Health, Inference, and Learning, PMLR 248:380-396.
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., … & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220. RRID:SCR_007345.
The dataset class follows the file and folder structure of the downloaded dataset version and looks for participant_info.csv and the data folders, so the root path should point to the version directory that was downloaded. Example: root = “…/dreamt/1.0.0/” or “…/dreamt/2.0.0/”
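Before constructing the dataset, it can help to confirm that root points at a version directory. The following is a minimal sketch, not part of PyHealth: check_dreamt_root is a hypothetical helper, and it only checks what this page states (participant_info.csv directly under root, plus at least one sub-folder). The data folder names are not checked by name because they are not listed here.

```python
from pathlib import Path
import tempfile


def check_dreamt_root(root):
    """Return a list of problems found under a DREAMT version directory.

    Hypothetical helper for illustration only; it is not a PyHealth API.
    An empty list means the basic layout looks plausible.
    """
    root = Path(root)
    problems = []
    if not root.is_dir():
        problems.append(f"{root} is not a directory")
        return problems
    # participant_info.csv is expected directly under the version directory.
    if not (root / "participant_info.csv").is_file():
        problems.append("participant_info.csv not found directly under root")
    # At least one sub-folder is expected to hold the signal files
    # (assumption: folder names differ by version, so none is required by name).
    if not any(p.is_dir() for p in root.iterdir()):
        problems.append("no data folders found under root")
    return problems


if __name__ == "__main__":
    # An empty temporary directory stands in for a wrong root path.
    with tempfile.TemporaryDirectory() as tmp:
        for problem in check_dreamt_root(tmp):
            print(problem)
```

If the returned list is non-empty, root most likely points one level too high (for example at the dreamt/ folder instead of the version folder) or at an incomplete download.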
- Parameters:
- root
root directory containing the dataset files
- dataset_name
name of dataset
- config_path
path to configuration file
Examples
>>> from pyhealth.datasets import DREAMTDataset
>>> dataset = DREAMTDataset(root="/path/to/dreamt/data/version")
>>> dataset.stats()
>>>
>>> # Get all patient ids
>>> unique_patients = dataset.unique_patient_ids
>>> print(f"There are {len(unique_patients)} patients")
>>>
>>> # Get single patient data
>>> patient = dataset.get_patient("S002")
>>> print(f"Patient has {len(patient.data_source)} event")
>>>
>>> # Get event
>>> event = patient.get_events(event_type="dreamt_sleep")
>>>
>>> # Get Apnea-Hypopnea Index (AHI)
>>> ahi = event[0].ahi
>>> print(f"AHI is {ahi}")
>>>
>>> # Get 64Hz sleep file path
>>> file_path = event[0].file_64hz
>>> print(f"64Hz sleep file path: {file_path}")
- get_patient_file(patient_id, root, file_path)
Returns the file path of the 64Hz and 100Hz data for a patient, or None if no file is found.
- prepare_metadata(root)
Prepares the metadata CSV file for the DREAMT dataset by performing the following:
1. Obtain clinical data from the participant_info.csv file
2. Process file paths based on patients found in clinical data
3. Organize all data into a single DataFrame
4. Save the processed DataFrame to a CSV file
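The four steps above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the PyHealth implementation: build_metadata and find_patient_file are hypothetical names, the ID column "SID", the folder names "data_64Hz" and "data_100Hz", the output file name, and the "file_100hz" column are all assumptions, and a list of dictionaries stands in for the DataFrame. Only participant_info.csv and the file_64hz attribute appear on this page.

```python
import csv
from pathlib import Path


def find_patient_file(patient_id, folder):
    """Return the first file in `folder` named '<patient_id>_*', or None."""
    folder = Path(folder)
    if not folder.is_dir():
        return None
    matches = sorted(folder.glob(f"{patient_id}_*"))
    return str(matches[0]) if matches else None


def build_metadata(
    root,
    out_name="dreamt-metadata.csv",  # hypothetical output file name
    id_column="SID",                 # assumed patient ID column
    folders=(("file_64hz", "data_64Hz"), ("file_100hz", "data_100Hz")),  # assumed
):
    root = Path(root)

    # Step 1: obtain clinical data from participant_info.csv.
    with open(root / "participant_info.csv", newline="") as f:
        rows = list(csv.DictReader(f))

    # Step 2: resolve file paths for each patient found in the clinical data.
    # A missing file is stored as an empty string in the CSV.
    for row in rows:
        for column, folder in folders:
            row[column] = find_patient_file(row[id_column], root / folder) or ""

    # Step 3: `rows` now holds clinical data and file paths in one table.
    fieldnames = list(rows[0].keys()) if rows else [id_column]

    # Step 4: save the combined table to a CSV file.
    out_path = root / out_name
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return out_path
```

Matching on the pattern `<patient_id>_*` rather than a bare prefix keeps one patient ID from matching files that belong to a longer ID sharing the same leading characters.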
- create_tmpdir()
Creates and returns a new temporary directory within the cache.
- Returns:
The path to the new temporary directory.
- Return type:
- property default_task: Optional[BaseTask]
Returns the default task for the dataset.
- Returns:
The default task, if any.
- Return type:
Optional[BaseTask]
- get_patient(patient_id)
Retrieves a Patient object for the given patient ID.
- Parameters:
patient_id (str) – The ID of the patient to retrieve.
- Returns:
The Patient object for the given ID.
- Return type:
Patient
- Raises:
AssertionError – If the patient ID is not found in the dataset.
- property global_event_df: LazyFrame
Returns the path to the cached event dataframe.
- Returns:
The path to the cached event dataframe.
- Return type:
- iter_patients(df=None)
Yields Patient objects for each unique patient in the dataset.
- load_data()
Loads data from the specified tables.
- Returns:
A concatenated lazy frame of all tables.
- Return type:
dd.DataFrame
- load_table(table_name)
Loads a table and processes joins if specified.
- Parameters:
table_name (str) – The name of the table to load.
- Returns:
The processed Dask dataframe for the table.
- Return type:
dd.DataFrame
- Raises:
ValueError – If the table is not found in the config.
FileNotFoundError – If the CSV file for the table or join is not found.
- set_task(task=None, num_workers=None, input_processors=None, output_processors=None)
Processes the base dataset to generate the task-specific sample dataset. The cache structure is as follows:
{task_name}_{task_uuid}/          # Cached data for specific task based on task name, schema, and args
    task_df.ld/                   # Intermediate task dataframe based on schema
    samples_{proc_uuid}.ld/       # Final processed samples after applying processors
        schema.pkl                # Saved SampleBuilder schema
        *.bin                     # Processed sample files
- Parameters:
task (Optional[BaseTask]) – The task to set. Uses default task if None.
num_workers (int) – Number of workers for multi-threading. Default is self.num_workers.
input_processors (Optional[Dict[str, FeatureProcessor]]) – Pre-fitted input processors. If provided, these will be used instead of creating new ones from task’s input_schema. Defaults to None.
output_processors (Optional[Dict[str, FeatureProcessor]]) – Pre-fitted output processors. If provided, these will be used instead of creating new ones from task’s output_schema. Defaults to None.
- Returns:
The generated sample dataset.
- Return type:
- Raises:
AssertionError – If no default task is found and task is None.