pyhealth.data package¶

Submodules¶

pyhealth.data.base module¶

class pyhealth.data.base.Standard_Template(patient_id)[source]¶

Bases: object

Abstract class that the various dataset templates can inherit from. Key information and memory-friendly information is saved in the data dictionary; for everything else, the event and sequence location is saved instead.

abstract parse_admission(pd_df, mapping_dict=None)[source]¶

parse_icu(pd_df, mapping_dict=None)[source]¶

abstract parse_patient(pd_series, mapping_dict=None)[source]¶
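The inheritance pattern described above can be sketched as follows. This is a minimal, self-contained illustration and not the library's implementation: `TemplateSketch`, `ToyData`, and the `data` attribute name are hypothetical, and plain dicts and lists stand in for the pandas objects the real methods receive. Only the method names and signatures are taken from this page.

```python
from abc import ABC, abstractmethod


class TemplateSketch(ABC):
    """Hypothetical stand-in that mirrors the documented method names only."""

    def __init__(self, patient_id):
        # Key, memory-friendly fields are kept in one dictionary
        # (the attribute name `data` is an assumption).
        self.data = {"patient_id": patient_id}

    @abstractmethod
    def parse_patient(self, pd_series, mapping_dict=None):
        ...

    @abstractmethod
    def parse_admission(self, pd_df, mapping_dict=None):
        ...

    def parse_icu(self, pd_df, mapping_dict=None):
        # Not abstract, so subclasses may leave it unimplemented.
        pass


class ToyData(TemplateSketch):
    def parse_patient(self, pd_series, mapping_dict=None):
        # A plain dict stands in for a pandas Series here.
        self.data["gender"] = pd_series["gender"]

    def parse_admission(self, pd_df, mapping_dict=None):
        # A list of dicts stands in for a DataFrame; only a count is stored.
        self.data["n_admissions"] = len(pd_df)


record = ToyData("p001")
record.parse_patient({"gender": "F"})
record.parse_admission([{"admission_id": 1}, {"admission_id": 2}])
```

Because `parse_patient` and `parse_admission` are abstract, instantiating the base class directly raises `TypeError`; a subclass has to implement both, while `parse_icu` stays optional.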

pyhealth.data.base_cms module¶

Base class for CMS dataset

class pyhealth.data.base_cms.CMS_Data(patient_id, procudure_cols, diagnosis_cols)[source]¶

Bases: Standard_Template

The data template to store CMS data. Customized fields can be added in each parse_xxx method.

Parameters

patient_id (str): Unique identifier for a patient.

generate_phenotyping(pd_df, diagnosis_mapping_df, diagnosis_codes, diagnosis_dict)[source]¶

parse_admission(pd_df, mapping_dict=None)[source]¶

parse_event(pd_df, event_mapping_df, procedure_codes, procedure_dict, save_dir='')[source]¶

parse_patient(pd_series)[source]¶

pyhealth.data.base_dataset module¶

class pyhealth.data.base_dataset.BaseDataset(opt)[source]¶

Bases: Dataset, ABC

static modify_commandline_options(parser, is_train)[source]¶

pyhealth.data.base_mimic module¶

Base class for MIMIC dataset

class pyhealth.data.base_mimic.MIMIC_Data(patient_id, time_duration, selection_method)[source]¶

Bases: Standard_Template

The data template to store MIMIC data. Customized fields can be added in each parse_xxx method.

Parameters

patient_id

time_duration

selection_method

generate_episode(pd_df, duration, event_mapping_df, var_list)[source]¶

generate_episode_headers(var_list)[source]¶

Generate the header for episode file

Parameters: var_list –

parse_admission(pd_df)[source]¶

parse_event(pd_df, save_dir='', event_mapping_df='', var_list=None)[source]¶

parse_icu(pd_df, mapping_dict=None)[source]¶

parse_patient(pd_series, mapping_dict=None)[source]¶

write_record(temp_list, temp_df, var)[source]¶

pyhealth.data.base_mimic.parallel_parse_tables(patient_id_list, patient_df, admission_df, icu_df, event_df, event_mapping_df, duration, selection_method, var_list, save_dir)[source]¶

Parallel method to process patient information in batches.

Parameters

patient_id_list –
patient_df –
admission_df –
icu_df –
event_df –
var_list –
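Batch-wise processing of a patient ID list can be sketched as below. This is an illustration under stated assumptions, not the code behind `parallel_parse_tables`: `make_batches` and `process_batch` are hypothetical helpers, a list of dicts stands in for the admission table, and a thread pool is used only to keep the sketch portable (the library may use a different mechanism).

```python
from concurrent.futures import ThreadPoolExecutor


def make_batches(patient_id_list, batch_size):
    """Split the ID list into consecutive batches of at most batch_size."""
    return [patient_id_list[i:i + batch_size]
            for i in range(0, len(patient_id_list), batch_size)]


def process_batch(batch, admissions):
    """Hypothetical worker: count admission rows per patient in one batch."""
    return {pid: sum(1 for row in admissions if row["patient_id"] == pid)
            for pid in batch}


patient_ids = ["p1", "p2", "p3", "p4", "p5"]
admissions = [{"patient_id": "p1"}, {"patient_id": "p1"}, {"patient_id": "p4"}]

batches = make_batches(patient_ids, batch_size=2)
with ThreadPoolExecutor(max_workers=2) as pool:
    # Each batch is handled independently, so the order of completion
    # does not matter; pool.map still returns results in batch order.
    partial_results = list(pool.map(lambda b: process_batch(b, admissions), batches))

# Merge the per-batch dictionaries into one result.
counts = {}
for part in partial_results:
    counts.update(part)
```

Splitting by patient ID keeps each worker's slice of the tables independent, which is what makes the batches safe to process concurrently.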

pyhealth.data.expdata_generator module¶

class pyhealth.data.expdata_generator.ecgdata(expdata_id, root_dir='.')[source]¶

Bases: object

get_exp_data(sel_task='diagnose', shuffle=True, split_ratio=[0.64, 0.16, 0.2], data_root='', n_limit=-1)[source]¶

Parameters

sel_task (str, optional, default='diagnose'): name of the current healthcare task
shuffle (bool, optional, default=True): whether to shuffle the data
split_ratio (list, optional, default=[0.64, 0.16, 0.2]): ratios used to split the whole dataset into train/valid/test
data_root (str, optional, default=''): if data_root == '', use the data in ./datasets; otherwise use the data in data_root
n_limit (int, optional, default=-1): number of samples to draw instead of using all data; if n_limit == -1, use all data
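How `shuffle`, `split_ratio`, and `n_limit` might interact can be sketched with a small index splitter. This is a hedged illustration, not the body of `get_exp_data`: the function `split_indices`, its `seed` argument, the rounding rule, and the choice to apply `n_limit` after shuffling are all assumptions.

```python
import random


def split_indices(n_samples, split_ratio=(0.64, 0.16, 0.2), shuffle=True,
                  n_limit=-1, seed=0):
    """Return (train, valid, test) index lists for n_samples items."""
    indices = list(range(n_samples))
    if shuffle:
        # A seeded generator keeps the sketch reproducible (assumption).
        random.Random(seed).shuffle(indices)
    if n_limit != -1:
        # Keep only the first n_limit samples; -1 means "use everything".
        indices = indices[:n_limit]
    n = len(indices)
    n_train = int(round(n * split_ratio[0]))
    n_valid = int(round(n * split_ratio[1]))
    train = indices[:n_train]
    valid = indices[n_train:n_train + n_valid]
    # The test split takes whatever remains, so no sample is dropped.
    test = indices[n_train + n_valid:]
    return train, valid, test


train, valid, test = split_indices(100)
small_train, small_valid, small_test = split_indices(100, n_limit=10)
```

Giving the remainder to the test split means the three parts always cover the selected samples exactly, even when the ratios do not divide the sample count evenly.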

load_exp_data()[source]¶

show_data(k=3)[source]¶

class pyhealth.data.expdata_generator.imagedata(expdata_id, root_dir='.')[source]¶

Bases: object

get_exp_data(sel_task='diagnose', shuffle=True, split_ratio=[0.64, 0.16, 0.2], data_root='', n_limit=-1)[source]¶

Parameters

sel_task (str, optional, default='diagnose'): name of the current healthcare task
shuffle (bool, optional, default=True): whether to shuffle the data
split_ratio (list, optional, default=[0.64, 0.16, 0.2]): ratios used to split the whole dataset into train/valid/test
data_root (str, default=''): use the data in data_root
n_limit (int, optional, default=-1): number of samples to draw instead of using all data; if n_limit == -1, use all data

load_exp_data()[source]¶

show_data(k=3)[source]¶

class pyhealth.data.expdata_generator.sequencedata(expdata_id, root_dir='.')[source]¶

Bases: object

get_exp_data(sel_task='phenotyping', shuffle=True, split_ratio=[0.64, 0.16, 0.2], data_root='', n_limit=-1)[source]¶

Parameters

sel_task (str, optional, default='phenotyping'): name of the current healthcare task
shuffle (bool, optional, default=True): whether to shuffle the data
split_ratio (list, optional, default=[0.64, 0.16, 0.2]): ratios used to split the whole dataset into train/valid/test
data_root (str, optional, default=''): if data_root == '', use the data in ./datasets; otherwise use the data in data_root
n_limit (int, optional, default=-1): number of samples to draw instead of using all data; if n_limit == -1, use all data

load_exp_data()[source]¶

show_data(k=3)[source]¶

class pyhealth.data.expdata_generator.textdata(expdata_id, root_dir='.')[source]¶

Bases: object

get_exp_data(sel_task='diagnose', shuffle=True, split_ratio=[0.64, 0.16, 0.2], data_root='', n_limit=-1)[source]¶

Parameters

sel_task (str, optional, default='diagnose'): name of the current healthcare task
shuffle (bool, optional, default=True): whether to shuffle the data
split_ratio (list, optional, default=[0.64, 0.16, 0.2]): ratios used to split the whole dataset into train/valid/test
data_root (str, default=''): use the data in data_root
n_limit (int, optional, default=-1): number of samples to draw instead of using all data; if n_limit == -1, use all data

load_exp_data()[source]¶

show_data(k=3)[source]¶

pyhealth.data.mimic_clean_methods module¶

MIMIC dataset handling. Adapted and modified from https://github.com/YerevaNN/mimic3-benchmarks/blob/master/mimic3benchmark/preprocessing.py

pyhealth.data.mimic_clean_methods.clean_crr(df)[source]¶

pyhealth.data.mimic_clean_methods.clean_dbp(df)[source]¶

pyhealth.data.mimic_clean_methods.clean_fio2(df)[source]¶

pyhealth.data.mimic_clean_methods.clean_height(df)[source]¶

pyhealth.data.mimic_clean_methods.clean_lab(df)[source]¶

pyhealth.data.mimic_clean_methods.clean_o2sat(df)[source]¶

pyhealth.data.mimic_clean_methods.clean_sbp(df)[source]¶

pyhealth.data.mimic_clean_methods.clean_temperature(df)[source]¶

pyhealth.data.mimic_clean_methods.clean_weight(df)[source]¶
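The general shape of a per-variable cleaning step can be sketched as below. This is not the body of any function listed above: `clean_temperature_sketch` is a hypothetical name, the cutoff of 79 is an assumption chosen for illustration, and a plain list is used in place of the DataFrame argument `df` to keep the example dependency-free.

```python
def clean_temperature_sketch(values, fahrenheit_cutoff=79.0):
    """Convert readings that look like Fahrenheit to Celsius.

    Any reading at or above `fahrenheit_cutoff` is treated as Fahrenheit
    (an assumed heuristic); missing readings (None) are passed through.
    """
    cleaned = []
    for v in values:
        if v is None:
            cleaned.append(None)
        elif v >= fahrenheit_cutoff:
            cleaned.append((v - 32.0) * 5.0 / 9.0)
        else:
            cleaned.append(v)
    return cleaned


result = clean_temperature_sketch([37.0, 98.6, None])
```

A magnitude cutoff like this works only because plausible Celsius and Fahrenheit body temperatures do not overlap; variables without that property would need the recorded unit instead.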

pyhealth.data.rnn_reader module¶

class pyhealth.data.rnn_reader.DatasetReader(data)[source]¶

Bases: BaseDataset

pyhealth.data.rnn_reader.time_series_get(fpath)[source]¶
