pyhealth.data package
Submodules
pyhealth.data.base module
pyhealth.data.base_cms module
Base class for CMS dataset
- class pyhealth.data.base_cms.CMS_Data(patient_id, procudure_cols, diagnosis_cols)
Bases: Standard_Template
The data template to store CMS data. Customized fields can be added in each parse_xxx method.
- Parameters
- patient_id : str
Unique identifier for a patient.
pyhealth.data.base_dataset module
pyhealth.data.base_mimic module
Base class for MIMIC dataset
- class pyhealth.data.base_mimic.MIMIC_Data(patient_id, time_duration, selection_method)
Bases: Standard_Template
The data template to store MIMIC data. Customized fields can be added in each parse_xxx method.
- Parameters
patient_id
time_duration
selection_method
- pyhealth.data.base_mimic.parallel_parse_tables(patient_id_list, patient_df, admission_df, icu_df, event_df, event_mapping_df, duration, selection_method, var_list, save_dir)
Parallel method to process patient information in batches.
- Parameters
patient_id_list
patient_df
admission_df
icu_df
event_df
var_list
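The batching behind parallel_parse_tables is not documented here. As a rough illustration only, the sketch below splits a list of patient IDs into batches and processes them concurrently. The names make_batches, process_batch and parse_in_batches are hypothetical and not part of pyhealth; threads are used for brevity, and the real function may use processes and a different batching scheme.

```python
from concurrent.futures import ThreadPoolExecutor


def make_batches(items, batch_size):
    """Split a list into consecutive chunks of at most batch_size items."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def process_batch(batch):
    # Hypothetical stand-in for the per-patient table parsing step.
    return [f"parsed:{patient_id}" for patient_id in batch]


def parse_in_batches(patient_id_list, batch_size=2, max_workers=2):
    """Process batches concurrently and return results in input order."""
    batches = make_batches(patient_id_list, batch_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order the batches were submitted.
        results = list(pool.map(process_batch, batches))
    # Flatten the per-batch result lists into one list.
    return [item for batch_result in results for item in batch_result]
```

Calling parse_in_batches(["p1", "p2", "p3"]) processes the batches ["p1", "p2"] and ["p3"] and returns one flat list in the original order.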
pyhealth.data.expdata_generator module
- class pyhealth.data.expdata_generator.ecgdata(expdata_id, root_dir='.')
Bases: object
- get_exp_data(sel_task='diagnose', shuffle=True, split_ratio=[0.64, 0.16, 0.2], data_root='', n_limit=-1)
Parameters
- sel_task : str, optional (default='diagnose')
Name of the current healthcare task.
- shuffle : bool, optional (default=True)
Whether to shuffle the data.
- split_ratio : list, optional (default=[0.64, 0.16, 0.2])
Ratios used to split the whole dataset into train/valid/test sets.
- data_root : str, optional (default='')
If data_root == '', use the data in ./datasets; otherwise use the data in data_root.
- n_limit : int, optional (default=-1)
Number of samples to use instead of the full dataset; if n_limit == -1, use all data.
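To make the interaction of shuffle, split_ratio and n_limit concrete, here is a minimal standalone sketch. It is not the pyhealth implementation: the function name split_by_ratio, the seed argument, the rounding rule, and the choice to apply n_limit after shuffling are all assumptions made for illustration.

```python
import random


def split_by_ratio(items, split_ratio=(0.64, 0.16, 0.2), shuffle=True,
                   n_limit=-1, seed=0):
    """Return (train, valid, test) lists cut from items by split_ratio."""
    items = list(items)
    if shuffle:
        # A seeded generator keeps the illustration reproducible (assumption).
        random.Random(seed).shuffle(items)
    if n_limit != -1:
        # Keep only the first n_limit samples; -1 means "use everything".
        items = items[:n_limit]
    n = len(items)
    n_train = round(n * split_ratio[0])
    n_valid = round(n * split_ratio[1])
    train = items[:n_train]
    valid = items[n_train:n_train + n_valid]
    # The test set takes whatever remains, so no sample is dropped.
    test = items[n_train + n_valid:]
    return train, valid, test
```

With 100 samples and the default ratios this yields 64 training, 16 validation and 20 test samples; with n_limit=50 the same ratios give 32, 8 and 10.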
- class pyhealth.data.expdata_generator.imagedata(expdata_id, root_dir='.')
Bases: object
- get_exp_data(sel_task='diagnose', shuffle=True, split_ratio=[0.64, 0.16, 0.2], data_root='', n_limit=-1)
Parameters
- sel_task : str, optional (default='diagnose')
Name of the current healthcare task.
- shuffle : bool, optional (default=True)
Whether to shuffle the data.
- split_ratio : list, optional (default=[0.64, 0.16, 0.2])
Ratios used to split the whole dataset into train/valid/test sets.
- data_root : str (default='')
Use the data in data_root.
- n_limit : int, optional (default=-1)
Number of samples to use instead of the full dataset; if n_limit == -1, use all data.
- class pyhealth.data.expdata_generator.sequencedata(expdata_id, root_dir='.')
Bases: object
- get_exp_data(sel_task='phenotyping', shuffle=True, split_ratio=[0.64, 0.16, 0.2], data_root='', n_limit=-1)
Parameters
- sel_task : str, optional (default='phenotyping')
Name of the current healthcare task.
- shuffle : bool, optional (default=True)
Whether to shuffle the data.
- split_ratio : list, optional (default=[0.64, 0.16, 0.2])
Ratios used to split the whole dataset into train/valid/test sets.
- data_root : str, optional (default='')
If data_root == '', use the data in ./datasets; otherwise use the data in data_root.
- n_limit : int, optional (default=-1)
Number of samples to use instead of the full dataset; if n_limit == -1, use all data.
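The data_root fallback described above can be pictured with the short sketch below. The helper resolve_data_dir is hypothetical, and resolving ./datasets relative to root_dir is an assumption; the actual lookup in pyhealth may differ.

```python
import os


def resolve_data_dir(data_root="", root_dir="."):
    """Pick the directory to load data from, mirroring the documented rule."""
    if data_root == "":
        # Empty string: fall back to the bundled ./datasets folder
        # (assumed here to live under root_dir).
        return os.path.join(root_dir, "datasets")
    # Any non-empty value is used as given.
    return data_root
```

Under these assumptions, leaving data_root empty selects the datasets folder under root_dir, while any explicit path is passed through unchanged.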
- class pyhealth.data.expdata_generator.textdata(expdata_id, root_dir='.')
Bases: object
- get_exp_data(sel_task='diagnose', shuffle=True, split_ratio=[0.64, 0.16, 0.2], data_root='', n_limit=-1)
Parameters
- sel_task : str, optional (default='diagnose')
Name of the current healthcare task.
- shuffle : bool, optional (default=True)
Whether to shuffle the data.
- split_ratio : list, optional (default=[0.64, 0.16, 0.2])
Ratios used to split the whole dataset into train/valid/test sets.
- data_root : str (default='')
Use the data in data_root.
- n_limit : int, optional (default=-1)
Number of samples to use instead of the full dataset; if n_limit == -1, use all data.
pyhealth.data.mimic_clean_methods module
MIMIC dataset handling. Adapted and modified from https://github.com/YerevaNN/mimic3-benchmarks/blob/master/mimic3benchmark/preprocessing.py
pyhealth.data.rnn_reader module
- class pyhealth.data.rnn_reader.DatasetReader(data)
Bases: BaseDataset