pyhealth.datasets.MIMIC3Dataset#
The open Medical Information Mart for Intensive Care (MIMIC-III) database; refer to the doc for more information. We process this database into a well-structured dataset object that gives users flexibility and convenience for modeling and analysis.
- class pyhealth.datasets.MIMIC3Dataset(root, tables, dataset_name=None, code_mapping=None, dev=False, refresh_cache=False)[source]#
Bases: BaseEHRDataset
Base dataset for MIMIC-III dataset.
The MIMIC-III dataset is a large dataset of de-identified health records of ICU patients. The dataset is available at https://mimic.physionet.org/.
- The basic information is stored in the following tables:
PATIENTS: defines a patient in the database, SUBJECT_ID.
ADMISSIONS: defines a patient’s hospital admission, HADM_ID.
- We further support the following tables:
DIAGNOSES_ICD: contains ICD-9 diagnoses (ICD9CM code) for patients.
PROCEDURES_ICD: contains ICD-9 procedures (ICD9PROC code) for patients.
PRESCRIPTIONS: contains medication related order entries (NDC code) for patients.
LABEVENTS: contains laboratory measurements (MIMIC3_ITEMID code) for patients.
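Before constructing the dataset, it can help to confirm that the data directory holds a file for every table you plan to load. The following is a minimal sketch, assuming each table is stored as `<TABLE>.csv` directly under the root directory; `find_missing_tables` is a hypothetical helper, not part of PyHealth.

```python
import os
import tempfile

# Tables that the class description says hold the basic information.
BASIC_TABLES = ["PATIENTS", "ADMISSIONS"]


def find_missing_tables(root, tables):
    """Return the table names whose CSV file is not present under root.

    Assumption: each table lives in a file named "<TABLE>.csv" directly
    inside root. This helper is hypothetical and not part of PyHealth.
    """
    missing = []
    for table in BASIC_TABLES + list(tables):
        if not os.path.isfile(os.path.join(root, f"{table}.csv")):
            missing.append(table)
    return missing


# Demonstration with a temporary directory standing in for the data root.
with tempfile.TemporaryDirectory() as demo_root:
    for name in ["PATIENTS", "ADMISSIONS", "DIAGNOSES_ICD"]:
        open(os.path.join(demo_root, f"{name}.csv"), "w").close()
    demo_missing = find_missing_tables(
        demo_root, ["DIAGNOSES_ICD", "PRESCRIPTIONS"]
    )
```

In this demonstration only PRESCRIPTIONS is reported, because no file was created for it.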
- Parameters:
root (str) – root directory of the raw data (should contain many csv files).
tables (List[str]) – list of tables to be loaded (e.g., ["DIAGNOSES_ICD", "PROCEDURES_ICD"]).
code_mapping (Optional[Dict[str, Union[str, Tuple[str, Dict]]]]) – a dictionary containing the code mapping information. The key is a str of the source code vocabulary and the value takes one of two formats:
a str of the target code vocabulary;
a tuple with two elements. The first element is a str of the target code vocabulary and the second element is a dict with keys "source_kwargs" or "target_kwargs" and values of the corresponding kwargs for the CrossMap.map() method.
Default is empty dict, which means the original code will be used.
dev (bool) – whether to enable dev mode (only use a small subset of the data). Default is False.
refresh_cache (bool) – whether to refresh the cache; if true, the dataset will be processed from scratch and the cache will be updated. Default is False.
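The two accepted value formats for code_mapping can be easier to reason about once expanded into a single shape. The sketch below is illustrative only: `normalize_code_mapping` is a hypothetical helper, not PyHealth's internal logic, and the "ICD9CM" to "CCSCM" entry is an assumed example of the plain-string format.

```python
def normalize_code_mapping(code_mapping=None):
    """Expand both accepted value formats into one uniform shape.

    Hypothetical helper for illustration; not part of PyHealth.
    Returns {source_vocab: (target_vocab, source_kwargs, target_kwargs)}.
    """
    normalized = {}
    for source, value in (code_mapping or {}).items():
        if isinstance(value, str):
            # Format 1: only the target vocabulary name is given.
            target, kwargs = value, {}
        else:
            # Format 2: (target vocabulary, dict of kwargs).
            target, kwargs = value
        normalized[source] = (
            target,
            kwargs.get("source_kwargs", {}),
            kwargs.get("target_kwargs", {}),
        )
    return normalized


mapping = normalize_code_mapping(
    {
        "NDC": ("ATC", {"target_kwargs": {"level": 3}}),
        "ICD9CM": "CCSCM",  # assumed example of the plain-string format
    }
)
```

Here `mapping["NDC"]` carries the `level` keyword for the target vocabulary, while `mapping["ICD9CM"]` has empty keyword dicts on both sides.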
- task#
Optional[str], name of the task (e.g., “mortality prediction”). Default is None.
- samples#
Optional[List[Dict]], a list of samples; each sample is a dict with patient_id, visit_id, and other task-specific attributes as keys. Default is None.
- patient_to_index#
Optional[Dict[str, List[int]]], a dict mapping patient_id to a list of sample indices. Default is None.
- visit_to_index#
Optional[Dict[str, List[int]]], a dict mapping visit_id to a list of sample indices. Default is None.
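To make the relationship between samples and the two index attributes concrete, here is a minimal sketch of how such lookup tables could be derived from a list of sample dicts. `build_index` and the sample values are hypothetical; the library's own construction may differ.

```python
from collections import defaultdict


def build_index(samples, key):
    """Map each distinct value of sample[key] to the positions holding it."""
    index = defaultdict(list)
    for position, sample in enumerate(samples):
        index[sample[key]].append(position)
    return dict(index)


# Hypothetical samples; "label" stands in for a task-specific attribute.
samples = [
    {"patient_id": "p1", "visit_id": "v1", "label": 0},
    {"patient_id": "p1", "visit_id": "v2", "label": 1},
    {"patient_id": "p2", "visit_id": "v3", "label": 0},
]

patient_to_index = build_index(samples, "patient_id")
visit_to_index = build_index(samples, "visit_id")
```

With these samples, patient "p1" maps to positions 0 and 1, which is the kind of grouping needed to split data by patient rather than by sample.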
Examples
>>> from pyhealth.datasets import MIMIC3Dataset
>>> dataset = MIMIC3Dataset(
...     root="/srv/local/data/physionet.org/files/mimiciii/1.4",
...     tables=["DIAGNOSES_ICD", "PRESCRIPTIONS"],
...     code_mapping={"NDC": ("ATC", {"target_kwargs": {"level": 3}})},
... )
>>> dataset.stat()
>>> dataset.info()
- parse_basic_info(patients)[source]#
Helper function which parses PATIENTS and ADMISSIONS tables.
Will be called in self.parse_tables()
- Docs:
- parse_diagnoses_icd(patients)[source]#
Helper function which parses DIAGNOSES_ICD table.
Will be called in self.parse_tables()
- Docs:
DIAGNOSES_ICD: https://mimic.mit.edu/docs/iii/tables/diagnoses_icd/
- Parameters:
patients (Dict[str, Patient]) – a dict of Patient objects indexed by patient_id.
- Return type:
Dict[str, Patient]
- Returns:
The updated patients dict.
Note
MIMIC-III does not provide specific timestamps in the DIAGNOSES_ICD table, so the timestamp of each diagnosis event is set to None.
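Because some events carry a None timestamp, downstream code that orders events chronologically has to handle the missing value explicitly. The sketch below shows one possible convention (untimestamped events placed last); the `(code, timestamp)` tuples and the codes themselves are hypothetical stand-ins, not PyHealth event objects.

```python
from datetime import datetime


def sort_events(events):
    """Order events by timestamp, keeping untimestamped ones at the end.

    Each event is a hypothetical (code, timestamp) tuple, not a PyHealth type.
    sorted() is stable, so untimestamped events keep their original order.
    """
    return sorted(events, key=lambda e: (e[1] is None, e[1] or datetime.min))


events = [
    ("4019", None),
    ("LAB_B", datetime(2101, 3, 2, 8, 0)),
    ("25000", None),
    ("LAB_A", datetime(2101, 3, 1, 9, 30)),
]
ordered = [code for code, _ in sort_events(events)]
```

Placing untimestamped events last is only a convention chosen for this sketch; a task may equally treat them as visit-level attributes with no ordering at all.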
- parse_procedures_icd(patients)[source]#
Helper function which parses PROCEDURES_ICD table.
Will be called in self.parse_tables()
- Docs:
PROCEDURES_ICD: https://mimic.mit.edu/docs/iii/tables/procedures_icd/
- Parameters:
patients (Dict[str, Patient]) – a dict of Patient objects indexed by patient_id.
- Return type:
Dict[str, Patient]
- Returns:
The updated patients dict.
Note
MIMIC-III does not provide specific timestamps in the PROCEDURES_ICD table, so the timestamp of each procedure event is set to None.
- parse_prescriptions(patients)[source]#
Helper function which parses PRESCRIPTIONS table.
Will be called in self.parse_tables()
- Docs:
PRESCRIPTIONS: https://mimic.mit.edu/docs/iii/tables/prescriptions/