pyhealth.datasets.MIMIC4Dataset#
MIMIC-IV is the open Medical Information Mart for Intensive Care database; refer to the doc for more information. We process this database into a well-structured dataset object that gives users flexibility and convenience for modeling and analysis.
- class pyhealth.datasets.MIMIC4Dataset(root, tables, dataset_name=None, code_mapping=None, dev=False, refresh_cache=False)[source]#
Bases:
BaseEHRDataset
Base dataset for MIMIC-IV.
The MIMIC-IV dataset is a large dataset of de-identified health records of ICU patients. The dataset is available at https://mimic.physionet.org/.
- The basic information is stored in the following tables:
patients: defines a patient in the database, subject_id.
admissions: defines a patient’s hospital admission, hadm_id.
- We further support the following tables:
- diagnoses_icd: contains ICD diagnoses (ICD9CM and ICD10CM codes) for patients.
- procedures_icd: contains ICD procedures (ICD9PROC and ICD10PROC codes) for patients.
- prescriptions: contains medication-related order entries (NDC codes) for patients.
- labevents: contains laboratory measurements (MIMIC4_ITEMID codes) for patients.
- Parameters:
root (str) – root directory of the raw data (should contain many csv files).
tables (List[str]) – list of tables to be loaded (e.g., ["DIAGNOSES_ICD", "PROCEDURES_ICD"]).
code_mapping (Optional[Dict[str, Union[str, Tuple[str, Dict]]]]) – a dictionary containing the code mapping information. The key is a str of the source code vocabulary and the value takes one of two formats:
- a str of the target code vocabulary;
- a tuple with two elements. The first element is a str of the target code vocabulary and the second element is a dict with keys "source_kwargs" or "target_kwargs" and values of the corresponding kwargs for the CrossMap.map() method.
Default is empty dict, which means the original code will be used.
dev (bool) – whether to enable dev mode (only use a small subset of the data). Default is False.
refresh_cache (bool) – whether to refresh the cache; if true, the dataset will be processed from scratch and the cache will be updated. Default is False.
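The two value formats accepted by code_mapping can be illustrated with a small sketch. The helper normalize_code_mapping below is hypothetical and not part of PyHealth; it only shows how a plain str target and a (target, kwargs) tuple could be reduced to one common shape, assuming that missing kwargs default to empty dicts. The vocabulary names in the example call are illustrative.

```python
from typing import Dict, Optional, Tuple, Union

# Type of the code_mapping argument as documented above.
CodeMapping = Dict[str, Union[str, Tuple[str, Dict]]]


def normalize_code_mapping(
    code_mapping: Optional[CodeMapping] = None,
) -> Dict[str, Tuple[str, Dict[str, Dict]]]:
    """Hypothetical helper (not a PyHealth function).

    Reduces both documented value formats to
    (target_vocabulary, {"source_kwargs": {...}, "target_kwargs": {...}}).
    """
    normalized = {}
    for source, value in (code_mapping or {}).items():
        if isinstance(value, str):
            # Format 1: only the target vocabulary is given.
            target, kwargs = value, {}
        else:
            # Format 2: (target vocabulary, kwargs dict).
            target, kwargs = value
        normalized[source] = (
            target,
            {
                "source_kwargs": dict(kwargs.get("source_kwargs", {})),
                "target_kwargs": dict(kwargs.get("target_kwargs", {})),
            },
        )
    return normalized


# Illustrative input mixing both formats.
mapping = normalize_code_mapping(
    {
        "ICD9CM": "CCSCM",
        "NDC": ("ATC", {"target_kwargs": {"level": 3}}),
    }
)
print(mapping["NDC"])
```

Passing None or an empty dict yields an empty result, which matches the documented default of keeping the original codes.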
- task#
Optional[str], name of the task (e.g., “mortality prediction”). Default is None.
- samples#
Optional[List[Dict]], a list of samples; each sample is a dict with patient_id, visit_id, and other task-specific attributes as keys. Default is None.
- patient_to_index#
Optional[Dict[str, List[int]]], a dict mapping patient_id to a list of sample indices. Default is None.
- visit_to_index#
Optional[Dict[str, List[int]]], a dict mapping visit_id to a list of sample indices. Default is None.
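The relationship between samples, patient_to_index, and visit_to_index can be sketched without loading any data. The sample dicts and the build_index helper below are made up for illustration (the "conditions" and "label" keys are assumed task-specific attributes), and this is not necessarily how PyHealth builds the indices internally.

```python
from collections import defaultdict
from typing import Dict, List

# Toy samples: each dict has patient_id, visit_id, and task-specific keys.
# The "conditions" and "label" keys are assumptions for illustration.
samples = [
    {"patient_id": "p1", "visit_id": "v1", "conditions": ["4019"], "label": 0},
    {"patient_id": "p1", "visit_id": "v2", "conditions": ["I10"], "label": 1},
    {"patient_id": "p2", "visit_id": "v3", "conditions": ["25000"], "label": 0},
]


def build_index(samples: List[Dict], key: str) -> Dict[str, List[int]]:
    """Map each value of `key` to the positions of the samples that carry it."""
    index = defaultdict(list)
    for position, sample in enumerate(samples):
        index[sample[key]].append(position)
    return dict(index)


patient_to_index = build_index(samples, "patient_id")
visit_to_index = build_index(samples, "visit_id")

print(patient_to_index)
print(visit_to_index)
```

An index of this shape makes it straightforward to split samples by patient so that all samples of one patient land in the same split.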
Examples
>>> from pyhealth.datasets import MIMIC4Dataset
>>> dataset = MIMIC4Dataset(
...     root="/srv/local/data/physionet.org/files/mimiciv/2.0/hosp",
...     tables=["diagnoses_icd", "procedures_icd", "prescriptions", "labevents"],
...     code_mapping={"NDC": ("ATC", {"target_kwargs": {"level": 3}})},
... )
>>> dataset.stat()
>>> dataset.info()
- parse_basic_info(patients)[source]#
Helper function which parses the patients and admissions tables.
Will be called in self.parse_tables()
- Docs:
patients: https://mimic.mit.edu/docs/iv/modules/hosp/patients/
admissions: https://mimic.mit.edu/docs/iv/modules/hosp/admissions/
- parse_diagnoses_icd(patients)[source]#
Helper function which parses diagnoses_icd table.
Will be called in self.parse_tables()
- Docs:
diagnoses_icd: https://mimic.mit.edu/docs/iv/modules/hosp/diagnoses_icd/
- Parameters:
patients (Dict[str, Patient]) – a dict of Patient objects indexed by patient_id.
- Return type:
Dict[str, Patient]
- Returns:
The updated patients dict.
Note
- MIMIC-IV does not provide specific timestamps in diagnoses_icd table, so we set it to None.
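As a rough illustration of what such a parsing helper has to do, the sketch below groups diagnoses_icd rows by patient and admission and records each code with a timestamp of None. It uses only the standard library and an in-memory CSV. The column names follow the MIMIC-IV diagnoses_icd table, while the parse_diagnoses_rows function and the plain-dict event layout are hypothetical stand-ins, not PyHealth's Patient objects or its actual implementation.

```python
import csv
import io
from collections import defaultdict

# Tiny in-memory stand-in for diagnoses_icd.csv (values are made up).
RAW = """subject_id,hadm_id,seq_num,icd_code,icd_version
100,2000,2,I10,10
100,2000,1,E119,10
100,2001,1,4019,9
101,2002,1,25000,9
"""

# icd_version decides which vocabulary a code belongs to.
VOCABULARY = {"9": "ICD9CM", "10": "ICD10CM"}


def parse_diagnoses_rows(text):
    """Hypothetical helper: return {patient_id: {visit_id: [event, ...]}}."""
    # Order rows by patient, admission, and diagnosis priority (seq_num).
    rows = sorted(
        csv.DictReader(io.StringIO(text)),
        key=lambda r: (r["subject_id"], r["hadm_id"], int(r["seq_num"])),
    )
    patients = defaultdict(lambda: defaultdict(list))
    for row in rows:
        event = {
            "code": row["icd_code"],
            "vocabulary": VOCABULARY[row["icd_version"]],
            # The table carries no per-diagnosis time, so none is recorded.
            "timestamp": None,
        }
        patients[row["subject_id"]][row["hadm_id"]].append(event)
    return {pid: dict(visits) for pid, visits in patients.items()}


patients = parse_diagnoses_rows(RAW)
print([e["code"] for e in patients["100"]["2000"]])
```

Downstream code that orders events by time needs to tolerate the None timestamps, for example by falling back to the order in which the events were stored.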
- parse_procedures_icd(patients)[source]#
Helper function which parses procedures_icd table.
Will be called in self.parse_tables()
- Docs:
procedures_icd: https://mimic.mit.edu/docs/iv/modules/hosp/procedures_icd/
- Parameters:
patients (Dict[str, Patient]) – a dict of Patient objects indexed by patient_id.
- Return type:
Dict[str, Patient]
- Returns:
The updated patients dict.
Note
- MIMIC-IV does not provide specific timestamps in procedures_icd table, so we set it to None.
- parse_prescriptions(patients)[source]#
Helper function which parses prescriptions table.
Will be called in self.parse_tables()
- Docs:
prescriptions: https://mimic.mit.edu/docs/iv/modules/hosp/prescriptions/
- parse_labevents(patients)[source]#
Helper function which parses labevents table.
Will be called in self.parse_tables()
- Docs:
- parse_hcpcsevents(patients)[source]#
Helper function which parses hcpcsevents table.
Will be called in self.parse_tables()
- Docs:
- Parameters:
patients (Dict[str, Patient]) – a dict of Patient objects indexed by patient_id.
- Return type:
Dict[str, Patient]
- Returns:
The updated patients dict.
Note
- MIMIC-IV does not provide specific timestamps in hcpcsevents table, so we set it to None.