Tasks#
We support various real-world healthcare predictive tasks defined by function calls. The following example tasks are collected from top AI/Medical venues, such as:
Drug Recommendation [Yang et al. IJCAI 2021a, Yang et al. IJCAI 2021b, Shang et al. AAAI 2020]
Readmission Prediction [Choi et al. AAAI 2021]
Mortality Prediction [Choi et al. AAAI 2021]
Length of Stay Prediction
Sleep Staging [Yang et al. ArXix 2021]
Getting Started#
New to PyHealth tasks? Start here:
Tutorial: Introduction to pyhealth.tasks - Learn the basics of defining and using tasks
Code Examples: Browse all examples online at https://github.com/sunlabuiuc/PyHealth/tree/master/examples
Pipeline Examples: Check out our Tutorials page for complete end-to-end examples including:
Mortality Prediction Pipeline
Readmission Prediction Pipeline
Medical Coding Pipeline
Chest X-ray Classification Pipeline
These tutorials demonstrate how to load datasets, apply tasks, train models, and evaluate results.
Understanding Tasks and Processors#
Tasks define what data to extract (via input_schema and output_schema), while processors define how to transform that data into tensors for model training.
After you define a task:
Task execution: The task function extracts relevant features from patient records and generates samples
Processor application: Processors automatically transform these samples into model-ready tensors based on the schemas
Example workflow:
# 1. Define a task with input/output schemas
task = MortalityPredictionMIMIC4()
# input_schema = {"conditions": "sequence", "procedures": "sequence"}
# output_schema = {"mortality": "binary"}
# 2. Apply task to dataset
sample_dataset = base_dataset.set_task(task)
# 3. Processors automatically transform samples:
# - "sequence" -> SequenceProcessor (converts codes to indices)
# - "binary" -> BinaryLabelProcessor (converts labels to tensors)
# 4. Get model-ready tensors
sample = sample_dataset[0]
# sample["conditions"] is now a tensor of token indices
# sample["mortality"] is now a binary tensor [0] or [1]
Learn more about processors:
See the Processors documentation for details on all available processors
Learn about string keys (
"sequence","binary", etc.) that map to specific processorsDiscover how to customize processor behavior with kwargs tuples
Understand processor types for different data modalities (text, images, signals, etc.)
Writing a Custom Task#
When a built-in task doesn’t match your cohort or prediction target, you can
define your own by subclassing BaseTask. The class
needs three things: a name, input and output schemas, and a __call__
method that processes one patient at a time.
from pyhealth.tasks import BaseTask
from pyhealth.data import Patient
from typing import List, Dict, Any
class MyMortalityTask(BaseTask):
task_name: str = "MyMortalityTask"
input_schema: Dict[str, str] = {
"conditions": "sequence", # maps to SequenceProcessor
"procedures": "sequence",
}
output_schema: Dict[str, str] = {
"label": "binary" # maps to BinaryLabelProcessor
}
def __call__(self, patient: Patient) -> List[Dict[str, Any]]:
samples = []
for adm in patient.get_events("admissions"):
label = 1 if adm.hospital_expire_flag == "1" else 0
# Fetch historical diagnoses up to this admission
conditions = patient.get_events("diagnoses_icd", end=adm.timestamp)
cond_codes = [e.icd_code for e in conditions]
if not cond_codes:
continue
samples.append({
"conditions": [cond_codes], # wrapped in a list for the sequence processor
"procedures": [[]],
"label": label,
})
return samples
The __call__ method receives one Patient and returns a list of sample
dictionaries. Each dictionary’s keys should match the schemas you declared.
Returning an empty list is fine — PyHealth simply skips that patient. Note
that event attribute names are always lowercase (e.g. e.icd_code rather
than e.ICD_CODE) because PyHealth lowercases all column names at ingest
time. Timestamps are accessed through event.timestamp rather than the
original column name like charttime, since PyHealth normalises them into
a single property.
Processor String Keys#
The string values in your schemas map to specific processor classes. Here is a quick reference:
String key |
Processor |
Typical use |
|---|---|---|
|
|
Diagnosis codes, procedure codes, drug names |
|
|
Cumulative visit history (drug recommendation, readmission) |
|
|
Aggregated numeric values (e.g. last lab value per item) |
|
|
Irregular time-series measurements |
|
|
Demographics, comorbidity flags |
|
|
Clinical notes |
|
|
Binary classification label (0 / 1) |
|
|
Multi-class label |
|
|
Multi-label classification |
|
|
Continuous regression target |
How set_task() Works#
Calling dataset.set_task(task) iterates over every patient in the
dataset, runs your __call__ method on each one, fits all the processors
on the collected samples, then serialises everything to disk as LitData
.ld files. The result is a SampleDataset that
supports len() and index access, ready for a DataLoader.
sample_ds = dataset.set_task(MyMortalityTask(), num_workers=4)
len(sample_ds) # total ML samples across all patients
sample_ds[0] # a single sample dict with tensor values
If you re-run set_task() with the same task and processor configuration,
PyHealth detects the existing cache and skips reprocessing. During
development it is useful to set dev=True on the dataset, which limits
processing to 1 000 patients so iterations are fast.
Note
A note on multiprocessing. set_task() can spawn worker processes
when num_workers > 1. On macOS and Linux this requires the standard
Python multiprocessing guard around your top-level script:
if __name__ == '__main__':
sample_ds = dataset.set_task(task, num_workers=4)
Without this guard, Python may try to re-import and re-run the script in each worker process, leading to infinite recursion. This is a general Python multiprocessing requirement, not specific to PyHealth.
Available Tasks#
- Base Task
- In-Hospital Mortality (MIMIC-IV)
InHospitalMortalityMIMIC4InHospitalMortalityMIMIC4.task_nameInHospitalMortalityMIMIC4.input_schemaInHospitalMortalityMIMIC4.output_schemaInHospitalMortalityMIMIC4.task_nameInHospitalMortalityMIMIC4.input_schemaInHospitalMortalityMIMIC4.output_schemaInHospitalMortalityMIMIC4.LAB_CATEGORIESInHospitalMortalityMIMIC4.LABITEMSInHospitalMortalityMIMIC4.pre_filter()
- MIMIC-III ICD-9 Coding
- Cardiology Detection
- COVID-19 CXR Classification
- DKA Prediction (MIMIC-IV)
DKAPredictionMIMIC4DKAPredictionMIMIC4.task_nameDKAPredictionMIMIC4.DKA_ICD9_CODESDKAPredictionMIMIC4.DKA_ICD10_PREFIXESDKAPredictionMIMIC4.LAB_CATEGORIESDKAPredictionMIMIC4.LAB_CATEGORY_ORDERDKAPredictionMIMIC4.LABITEMSDKAPredictionMIMIC4.input_schemaDKAPredictionMIMIC4.output_schemaDKAPredictionMIMIC4.pre_filter()
- Drug Recommendation
- Length of Stay Prediction
- Medical Transcriptions Classification
MedicalTranscriptionsClassificationMedicalTranscriptionsClassification.task_nameMedicalTranscriptionsClassification.input_schemaMedicalTranscriptionsClassification.output_schemaMedicalTranscriptionsClassification.task_nameMedicalTranscriptionsClassification.input_schemaMedicalTranscriptionsClassification.output_schemaMedicalTranscriptionsClassification.pre_filter()
- Mortality Prediction (Next Visit)
- Mortality Prediction (StageNet MIMIC-IV)
MortalityPredictionStageNetMIMIC4MortalityPredictionStageNetMIMIC4.task_nameMortalityPredictionStageNetMIMIC4.input_schemaMortalityPredictionStageNetMIMIC4.output_schemaMortalityPredictionStageNetMIMIC4.task_nameMortalityPredictionStageNetMIMIC4.input_schemaMortalityPredictionStageNetMIMIC4.output_schemaMortalityPredictionStageNetMIMIC4.LAB_CATEGORIESMortalityPredictionStageNetMIMIC4.LAB_CATEGORY_NAMESMortalityPredictionStageNetMIMIC4.LABITEMSMortalityPredictionStageNetMIMIC4.pre_filter()
- Patient Linkage (MIMIC-III)
- Readmission Prediction
ReadmissionPredictionMIMIC3ReadmissionPredictionMIMIC4ReadmissionPredictionEICUReadmissionPredictionEICU.task_nameReadmissionPredictionEICU.input_schemaReadmissionPredictionEICU.output_schemaReadmissionPredictionEICU.exclude_minorsReadmissionPredictionEICU.task_nameReadmissionPredictionEICU.input_schemaReadmissionPredictionEICU.output_schemaReadmissionPredictionEICU.pre_filter()
ReadmissionPredictionOMOP
- Sleep Staging
- Sleep Staging (SleepEDF)
- Temple University EEG Tasks
- Sleep Staging v2
- Benchmark EHRShot
- ChestX-ray14 Binary Classification
ChestXray14BinaryClassificationChestXray14BinaryClassification.task_nameChestXray14BinaryClassification.input_schemaChestXray14BinaryClassification.output_schemaChestXray14BinaryClassification.diseaseChestXray14BinaryClassification.task_nameChestXray14BinaryClassification.input_schemaChestXray14BinaryClassification.output_schemaChestXray14BinaryClassification.pre_filter()
- De-Identification NER
- ChestX-ray14 Multilabel Classification
ChestXray14MultilabelClassificationChestXray14MultilabelClassification.task_nameChestXray14MultilabelClassification.input_schemaChestXray14MultilabelClassification.output_schemaChestXray14MultilabelClassification.task_nameChestXray14MultilabelClassification.input_schemaChestXray14MultilabelClassification.output_schemaChestXray14MultilabelClassification.pre_filter()
- Variant Classification (ClinVar)
VariantClassificationClinVarVariantClassificationClinVar.task_nameVariantClassificationClinVar.input_schemaVariantClassificationClinVar.output_schemaVariantClassificationClinVar.CLINICAL_SIGNIFICANCE_CATEGORIESVariantClassificationClinVar.task_nameVariantClassificationClinVar.input_schemaVariantClassificationClinVar.output_schemaVariantClassificationClinVar.CLINICAL_SIGNIFICANCE_CATEGORIESVariantClassificationClinVar.pre_filter()
- Mutation Pathogenicity (COSMIC)
MutationPathogenicityPredictionMutationPathogenicityPrediction.task_nameMutationPathogenicityPrediction.input_schemaMutationPathogenicityPrediction.output_schemaMutationPathogenicityPrediction.VALID_FATHMM_PREDICTIONSMutationPathogenicityPrediction.task_nameMutationPathogenicityPrediction.input_schemaMutationPathogenicityPrediction.output_schemaMutationPathogenicityPrediction.VALID_FATHMM_PREDICTIONSMutationPathogenicityPrediction.pre_filter()
- Cancer Survival Prediction (TCGA)
CancerSurvivalPredictionCancerSurvivalPrediction.task_nameCancerSurvivalPrediction.input_schemaCancerSurvivalPrediction.output_schemaCancerSurvivalPrediction.VITAL_STATUS_DEADCancerSurvivalPrediction.VITAL_STATUS_ALIVECancerSurvivalPrediction.task_nameCancerSurvivalPrediction.input_schemaCancerSurvivalPrediction.output_schemaCancerSurvivalPrediction.VITAL_STATUS_DEADCancerSurvivalPrediction.VITAL_STATUS_ALIVECancerSurvivalPrediction.pre_filter()
- Cancer Mutation Burden (TCGA)
CancerMutationBurdenCancerMutationBurden.task_nameCancerMutationBurden.input_schemaCancerMutationBurden.output_schemaCancerMutationBurden.TMB_THRESHOLDCancerMutationBurden.task_nameCancerMutationBurden.input_schemaCancerMutationBurden.output_schemaCancerMutationBurden.TMB_THRESHOLDCancerMutationBurden.pre_filter()