pyhealth.tasks.dka

class pyhealth.tasks.dka.DKAPredictionMIMIC4(padding=0)

Bases: BaseTask

Task for predicting Diabetic Ketoacidosis (DKA) in the general patient population.

This task creates PATIENT-LEVEL samples from ALL patients in the dataset, predicting whether they will develop DKA. Features are collected from admissions BEFORE the first DKA event to prevent data leakage.

Target Population:
  • ALL patients in the dataset (no filtering)

  • Large pool of negative samples (patients without DKA)

Label Definition:
  • Positive (1): Patient has any DKA diagnosis code (ICD-9 or ICD-10)

  • Negative (0): Patient has no DKA diagnosis codes
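
The label rule above can be sketched as follows. This is a minimal illustration, not the library's implementation: the helpers `is_dka_code` and `label_patient` are hypothetical, and the sketch assumes that ICD-9 codes are matched exactly and ICD-10 codes by prefix, as the attribute names `DKA_ICD9_CODES` and `DKA_ICD10_PREFIXES` suggest.

```python
from typing import Iterable, Tuple

# Values copied from the class attributes documented on this page.
DKA_ICD9_CODES = {"25010", "25011", "25012", "25013"}
DKA_ICD10_PREFIXES = ["E101", "E111", "E131"]


def is_dka_code(code: str, version: int) -> bool:
    """Hypothetical helper: True if an ICD code denotes DKA.

    Dots are stripped so that "E10.10" and "E1010" compare equal
    (an assumption about how codes may be stored).
    """
    code = code.replace(".", "").strip().upper()
    if version == 9:
        return code in DKA_ICD9_CODES  # exact match (assumed)
    if version == 10:
        return any(code.startswith(p) for p in DKA_ICD10_PREFIXES)  # prefix match (assumed)
    return False


def label_patient(codes: Iterable[Tuple[str, int]]) -> int:
    """Return 1 if any (code, icd_version) pair is a DKA code, else 0."""
    return int(any(is_dka_code(c, v) for c, v in codes))
```

For example, `label_patient([("4019", 9), ("E1110", 10)])` yields a positive label because `E1110` starts with the prefix `E111`.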

Data Leakage Prevention:
  • Admissions are sorted chronologically

  • For DKA-positive patients: Only data from admissions BEFORE the first DKA admission is included (no data from the DKA admission itself or from any later admission)

  • For DKA-negative patients: All admissions are included

  • Patients whose first admission has DKA are excluded (no pre-DKA data)
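
The rules above amount to a chronological cutoff, which might look roughly like the sketch below. The function `select_admissions` and the `(admit_time, has_dka)` tuple layout are hypothetical and chosen only for illustration; the actual task operates on PyHealth patient objects.

```python
from datetime import datetime
from typing import List, Optional, Tuple

# Each admission is (admit_time, has_dka); both fields are illustrative.
Admission = Tuple[datetime, bool]


def select_admissions(admissions: List[Admission]) -> Optional[List[Admission]]:
    """Return the admissions usable as features, or None to exclude the patient."""
    ordered = sorted(admissions, key=lambda a: a[0])  # chronological order
    first_dka = next((i for i, a in enumerate(ordered) if a[1]), None)
    if first_dka is None:
        return ordered  # DKA-negative: keep every admission
    if first_dka == 0:
        return None  # first admission already has DKA: no pre-DKA data
    return ordered[:first_dka]  # strictly before the first DKA admission
```

Slicing with `ordered[:first_dka]` drops the DKA admission itself along with everything after it, which is what keeps post-diagnosis information out of the features.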

Features:
  • icd_codes: Combined diagnosis + procedure ICD codes (StageNet format)

  • labs: 10-dimensional vectors, one dimension per lab category in LAB_CATEGORY_ORDER
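
As a rough illustration of the lab feature layout, the sketch below maps raw item IDs to a 10-slot vector using the `LAB_CATEGORIES` and `LAB_CATEGORY_ORDER` class attributes documented on this page. The helper `lab_vector` is hypothetical, and the aggregation rule (first matching item ID wins, `None` for an unmeasured category) is an assumption rather than documented behavior.

```python
from typing import Dict, List, Optional

# Values copied from the class attributes documented on this page.
LAB_CATEGORIES: Dict[str, List[str]] = {
    "Sodium": ["50824", "52455", "50983", "52623"],
    "Potassium": ["50822", "52452", "50971", "52610"],
    "Chloride": ["50806", "52434", "50902", "52535"],
    "Bicarbonate": ["50803", "50804"],
    "Glucose": ["50809", "52027", "50931", "52569"],
    "Calcium": ["50808", "51624"],
    "Magnesium": ["50960"],
    "Anion Gap": ["50868", "52500"],
    "Osmolality": ["52031", "50964", "51701"],
    "Phosphate": ["50970"],
}
LAB_CATEGORY_ORDER = [
    "Sodium", "Potassium", "Chloride", "Bicarbonate", "Glucose",
    "Calcium", "Magnesium", "Anion Gap", "Osmolality", "Phosphate",
]

# Flattening the item IDs in category order reproduces the LABITEMS list.
LABITEMS = [item for cat in LAB_CATEGORY_ORDER for item in LAB_CATEGORIES[cat]]


def lab_vector(values_by_itemid: Dict[str, float]) -> List[Optional[float]]:
    """Hypothetical helper: one slot per category, None when unmeasured.

    Assumption: the first matching item ID listed for a category wins.
    """
    vector: List[Optional[float]] = []
    for cat in LAB_CATEGORY_ORDER:
        value = next(
            (values_by_itemid[i] for i in LAB_CATEGORIES[cat] if i in values_by_itemid),
            None,
        )
        vector.append(value)
    return vector
```

Under these assumptions, a sodium reading stored under item ID `50983` lands in slot 0 and a glucose reading under `50931` lands in slot 4, following `LAB_CATEGORY_ORDER`.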

Parameters:

padding (int) – Additional padding for the StageNet processor. Default: 0.

Example

>>> from pyhealth.datasets import MIMIC4Dataset
>>> from pyhealth.tasks import DKAPredictionMIMIC4
>>>
>>> dataset = MIMIC4Dataset(
...     root="/path/to/mimic4",
...     tables=["diagnoses_icd", "procedures_icd", "labevents", "admissions"],
... )
>>> task = DKAPredictionMIMIC4()
>>> samples = dataset.set_task(task)

task_name: str = 'DKAPredictionMIMIC4'
DKA_ICD9_CODES: ClassVar[Set[str]] = {'25010', '25011', '25012', '25013'}
DKA_ICD10_PREFIXES: ClassVar[List[str]] = ['E101', 'E111', 'E131']
LAB_CATEGORIES: ClassVar[Dict[str, List[str]]] = {'Anion Gap': ['50868', '52500'], 'Bicarbonate': ['50803', '50804'], 'Calcium': ['50808', '51624'], 'Chloride': ['50806', '52434', '50902', '52535'], 'Glucose': ['50809', '52027', '50931', '52569'], 'Magnesium': ['50960'], 'Osmolality': ['52031', '50964', '51701'], 'Phosphate': ['50970'], 'Potassium': ['50822', '52452', '50971', '52610'], 'Sodium': ['50824', '52455', '50983', '52623']}
LAB_CATEGORY_ORDER: ClassVar[List[str]] = ['Sodium', 'Potassium', 'Chloride', 'Bicarbonate', 'Glucose', 'Calcium', 'Magnesium', 'Anion Gap', 'Osmolality', 'Phosphate']
LABITEMS: ClassVar[List[str]] = ['50824', '52455', '50983', '52623', '50822', '52452', '50971', '52610', '50806', '52434', '50902', '52535', '50803', '50804', '50809', '52027', '50931', '52569', '50808', '51624', '50960', '50868', '52500', '52031', '50964', '51701', '50970']
input_schema: Dict[str, Tuple[str, Dict[str, Any]]]
output_schema: Dict[str, str]
pre_filter(df)
Return type:

LazyFrame