pyhealth.data package

Submodules

pyhealth.data.base module

class pyhealth.data.base.Standard_Template(patient_id)[source]

Bases: object

Abstract class that dataset-specific templates can inherit from. Key information and memory-friendly information are saved in the data dictionary; for everything else, the event and sequence locations are saved instead.

abstract parse_admission(pd_df, mapping_dict=None)[source]
parse_icu(pd_df, mapping_dict=None)[source]
abstract parse_patient(pd_series, mapping_dict=None)[source]
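
The inheritance pattern described above can be sketched with Python's abc module. This is a minimal stand-in, not the library's source: the class names TemplateSketch and ToyData, the data attribute, and the dictionary keys are assumptions made for illustration. Only the method names and signatures follow the listing above, and plain dicts and lists stand in for pandas objects.

```python
from abc import ABC, abstractmethod


class TemplateSketch(ABC):
    """Hypothetical stand-in for Standard_Template."""

    def __init__(self, patient_id):
        self.patient_id = patient_id
        self.data = {}  # assumed name for the per-patient data dictionary

    @abstractmethod
    def parse_patient(self, pd_series, mapping_dict=None):
        ...

    @abstractmethod
    def parse_admission(self, pd_df, mapping_dict=None):
        ...

    def parse_icu(self, pd_df, mapping_dict=None):
        # Not marked abstract in the listing, so subclasses may skip it.
        pass


class ToyData(TemplateSketch):
    """Hypothetical dataset-specific template."""

    def parse_patient(self, pd_series, mapping_dict=None):
        # A plain dict stands in for a pandas Series here.
        self.data["gender"] = pd_series["gender"]

    def parse_admission(self, pd_df, mapping_dict=None):
        # A list of dicts stands in for a pandas DataFrame here.
        self.data["n_admissions"] = len(pd_df)


record = ToyData("p001")
record.parse_patient({"gender": "F"})
record.parse_admission([{"admit": "2020-01-01"}, {"admit": "2020-03-05"}])
```

Instantiating TemplateSketch directly raises TypeError, because the two abstract methods have no implementation there.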

pyhealth.data.base_cms module

Base class for CMS dataset

class pyhealth.data.base_cms.CMS_Data(patient_id, procudure_cols, diagnosis_cols)[source]

Bases: Standard_Template

Data template for storing CMS data. Custom fields can be added in each parse_xxx method.

Parameters
patient_id : str

Unique identifier for a patient.

generate_phenotyping(pd_df, diagnosis_mapping_df, diagnosis_codes, diagnosis_dict)[source]
parse_admission(pd_df, mapping_dict=None)[source]
parse_event(pd_df, event_mapping_df, procedure_codes, procedure_dict, save_dir='')[source]
parse_patient(pd_series)[source]

pyhealth.data.base_dataset module

class pyhealth.data.base_dataset.BaseDataset(opt)[source]

Bases: Dataset, ABC

static modify_commandline_options(parser, is_train)[source]

pyhealth.data.base_mimic module

Base class for MIMIC dataset

class pyhealth.data.base_mimic.MIMIC_Data(patient_id, time_duration, selection_method)[source]

Bases: Standard_Template

Data template for storing MIMIC data. Custom fields can be added in each parse_xxx method.

Parameters

patient_id

time_duration

selection_method

generate_episode(pd_df, duration, event_mapping_df, var_list)[source]
generate_episode_headers(var_list)[source]

Generate the header for the episode file.

Parameters

var_list

parse_admission(pd_df)[source]
parse_event(pd_df, save_dir='', event_mapping_df='', var_list=None)[source]
parse_icu(pd_df, mapping_dict=None)[source]
parse_patient(pd_series, mapping_dict=None)[source]
write_record(temp_list, temp_df, var)[source]
pyhealth.data.base_mimic.parallel_parse_tables(patient_id_list, patient_df, admission_df, icu_df, event_df, event_mapping_df, duration, selection_method, var_list, save_dir)[source]

Process patient information in parallel, in batches.

Parameters
  • patient_id_list

  • patient_df

  • admission_df

  • icu_df

  • event_df

  • var_list
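
Batched parallel processing of a patient ID list can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: the source does not say which parallel backend is used, so a thread pool is chosen here only to keep the sketch self-contained, and make_batches, process_batch, and parallel_process are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def make_batches(items, batch_size):
    """Split a list into consecutive batches of at most batch_size items."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def process_batch(patient_ids):
    # Placeholder for the per-patient parsing work (hypothetical).
    return [f"parsed:{pid}" for pid in patient_ids]


def parallel_process(patient_id_list, batch_size=2, max_workers=2):
    batches = make_batches(patient_id_list, batch_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of the input batches.
        results = list(pool.map(process_batch, batches))
    # Flatten the per-batch results back into one list.
    return [item for batch in results for item in batch]


parsed = parallel_process(["p1", "p2", "p3", "p4", "p5"])
```

Batching keeps the number of submitted tasks small relative to the number of patients, which matters when each task carries fixed overhead.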

pyhealth.data.expdata_generator module

class pyhealth.data.expdata_generator.ecgdata(expdata_id, root_dir='.')[source]

Bases: object

get_exp_data(sel_task='diagnose', shuffle=True, split_ratio=[0.64, 0.16, 0.2], data_root='', n_limit=-1)[source]

Parameters


sel_task : str, optional (default='diagnose')

Name of the current healthcare task.

shuffle : bool, optional (default=True)

Whether to shuffle the data.

split_ratio : list, optional (default=[0.64, 0.16, 0.2])

Ratios used to split the whole dataset into train/valid/test.

data_root : str, optional (default='')

If data_root == '', use the data in ./datasets; otherwise use the data in data_root.

n_limit : int, optional (default=-1)

Number of samples to draw instead of using all data; if n_limit == -1, use all data.
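
How shuffle, split_ratio, and n_limit might interact can be sketched on a list of sample indices. This is a minimal illustration, not the library's code: split_indices and its seed argument are hypothetical, and applying n_limit after shuffling and truncating fractional sizes are assumptions.

```python
import random


def split_indices(n_total, split_ratio=(0.64, 0.16, 0.2), shuffle=True,
                  n_limit=-1, seed=0):
    """Return (train, valid, test) index lists for n_total samples."""
    indices = list(range(n_total))
    if shuffle:
        # A seeded generator keeps the sketch reproducible.
        random.Random(seed).shuffle(indices)
    if n_limit != -1:
        # Assumption: the limit is applied after shuffling.
        indices = indices[:n_limit]
    n = len(indices)
    n_train = int(n * split_ratio[0])
    n_valid = int(n * split_ratio[1])
    train = indices[:n_train]
    valid = indices[n_train:n_train + n_valid]
    # The test split takes whatever remains, so no sample is dropped.
    test = indices[n_train + n_valid:]
    return train, valid, test


train, valid, test = split_indices(100)
small = split_indices(100, n_limit=10)
```

With 100 samples and the default ratios, the three splits hold 64, 16, and 20 indices respectively.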

load_exp_data()[source]
show_data(k=3)[source]

Parameters

class pyhealth.data.expdata_generator.imagedata(expdata_id, root_dir='.')[source]

Bases: object

get_exp_data(sel_task='diagnose', shuffle=True, split_ratio=[0.64, 0.16, 0.2], data_root='', n_limit=-1)[source]

Parameters


sel_task : str, optional (default='diagnose')

Name of the current healthcare task.

shuffle : bool, optional (default=True)

Whether to shuffle the data.

split_ratio : list, optional (default=[0.64, 0.16, 0.2])

Ratios used to split the whole dataset into train/valid/test.

data_root : str (default='')

Use the data in data_root.

n_limit : int, optional (default=-1)

Number of samples to draw instead of using all data; if n_limit == -1, use all data.

load_exp_data()[source]
show_data(k=3)[source]

Parameters

class pyhealth.data.expdata_generator.sequencedata(expdata_id, root_dir='.')[source]

Bases: object

get_exp_data(sel_task='phenotyping', shuffle=True, split_ratio=[0.64, 0.16, 0.2], data_root='', n_limit=-1)[source]

Parameters


sel_task : str, optional (default='phenotyping')

Name of the current healthcare task.

shuffle : bool, optional (default=True)

Whether to shuffle the data.

split_ratio : list, optional (default=[0.64, 0.16, 0.2])

Ratios used to split the whole dataset into train/valid/test.

data_root : str, optional (default='')

If data_root == '', use the data in ./datasets; otherwise use the data in data_root.

n_limit : int, optional (default=-1)

Number of samples to draw instead of using all data; if n_limit == -1, use all data.
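
The data_root fallback described above can be sketched as a small helper. The function name resolve_data_root is hypothetical, and resolving ./datasets relative to root_dir is an assumption; the source only states that an empty data_root selects ./datasets.

```python
import os


def resolve_data_root(data_root="", root_dir="."):
    """Pick the directory to read data from."""
    if data_root == "":
        # Empty string means: fall back to the default datasets folder.
        return os.path.join(root_dir, "datasets")
    return data_root


default_dir = resolve_data_root()
custom_dir = resolve_data_root("/tmp/my_data")
```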

load_exp_data()[source]
show_data(k=3)[source]

Parameters

class pyhealth.data.expdata_generator.textdata(expdata_id, root_dir='.')[source]

Bases: object

get_exp_data(sel_task='diagnose', shuffle=True, split_ratio=[0.64, 0.16, 0.2], data_root='', n_limit=-1)[source]

Parameters


sel_task : str, optional (default='diagnose')

Name of the current healthcare task.

shuffle : bool, optional (default=True)

Whether to shuffle the data.

split_ratio : list, optional (default=[0.64, 0.16, 0.2])

Ratios used to split the whole dataset into train/valid/test.

data_root : str (default='')

Use the data in data_root.

n_limit : int, optional (default=-1)

Number of samples to draw instead of using all data; if n_limit == -1, use all data.

load_exp_data()[source]
show_data(k=3)[source]

Parameters

pyhealth.data.mimic_clean_methods module

MIMIC dataset handling. Adapted and modified from https://github.com/YerevaNN/mimic3-benchmarks/blob/master/mimic3benchmark/preprocessing.py

pyhealth.data.mimic_clean_methods.clean_crr(df)[source]
pyhealth.data.mimic_clean_methods.clean_dbp(df)[source]
pyhealth.data.mimic_clean_methods.clean_fio2(df)[source]
pyhealth.data.mimic_clean_methods.clean_height(df)[source]
pyhealth.data.mimic_clean_methods.clean_lab(df)[source]
pyhealth.data.mimic_clean_methods.clean_o2sat(df)[source]
pyhealth.data.mimic_clean_methods.clean_sbp(df)[source]
pyhealth.data.mimic_clean_methods.clean_temperature(df)[source]
pyhealth.data.mimic_clean_methods.clean_weight(df)[source]

pyhealth.data.rnn_reader module

class pyhealth.data.rnn_reader.DatasetReader(data)[source]

Bases: BaseDataset

pyhealth.data.rnn_reader.time_series_get(fpath)[source]

Module contents