Getting Started with PyHealth#
Welcome to PyHealth! This guide will help you get up and running with healthcare AI development. PyHealth makes it easy to build, test, and deploy healthcare machine learning models with minimal code.
New to PyHealth? Start here. This guide walks you from installation to your first trained model.
Introduction [Video]#
Prefer video? Watch a short introduction to PyHealth before you start.
Installing PyHealth#
Python Version Requirement
PyHealth 2.0 requires Python 3.12 or higher. Verify your version:
python --version # Should be 3.12.x or 3.13.x
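The same check can be done from inside Python, which is handy in notebooks or setup scripts. The sketch below only encodes the 3.12 minimum stated above; the helper name `meets_requirement` is made up for illustration and is not part of PyHealth.

```python
import sys

# Minimum interpreter version stated in this guide for PyHealth 2.0.
REQUIRED = (3, 12)


def meets_requirement(version_info, required=REQUIRED):
    """Return True if the (major, minor, ...) tuple satisfies the minimum."""
    return tuple(version_info[:2]) >= required


ok = meets_requirement(sys.version_info)
status = "OK" if ok else "too old for PyHealth 2.0"
print(f"Python {sys.version_info.major}.{sys.version_info.minor}: {status}")
```

If the check fails, the legacy 1.16 release described below is the fallback for older interpreters.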
Latest Release (Recommended)
Install PyHealth 2.0 from PyPI. This release brings significant performance improvements and new features:
pip install pyhealth
Legacy Version
For backward compatibility, the older stable version (1.16) supports Python 3.9+:
pip install pyhealth==1.16
Development Installation
To install the latest development version from GitHub:
git clone https://github.com/sunlabuiuc/PyHealth.git
cd PyHealth
pip install -e .
See Installation for detailed installation instructions including CUDA setup and platform-specific notes.
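Whichever route you choose, it can be useful to confirm which version ended up in your environment. A minimal sketch using only the standard library follows; the helper name `installed_version` is illustrative, and the distribution name `pyhealth` is taken from the `pip install` commands above.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("pyhealth")
print("pyhealth is not installed" if version is None else f"pyhealth {version}")
```

This reads the package metadata without importing PyHealth itself, so it also works when an import would fail for unrelated reasons such as a missing dependency.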
Overview of ML Pipelines#
All healthcare tasks in PyHealth follow a five-stage pipeline: loading datasets, defining tasks, building models, training, and evaluating.
Each stage is modular, allowing customization based on your needs.
Stage 1: Loading Datasets#
pyhealth.datasets provides structured datasets independent of tasks. PyHealth supports MIMIC-III, MIMIC-IV, eICU, and more.
Example:
from pyhealth.datasets import MIMIC3Dataset
mimic3base = MIMIC3Dataset(
    root="https://storage.googleapis.com/pyhealth/Synthetic_MIMIC-III/",
    tables=["DIAGNOSES_ICD", "PROCEDURES_ICD", "PRESCRIPTIONS"],
)
Stage 2: Defining Tasks#
pyhealth.tasks processes patient data into task-specific samples.
Example:
from pyhealth.tasks.mortality_prediction import MortalityPredictionMIMIC3
mimic3_mortality_prediction = MortalityPredictionMIMIC3()
mimic3sample = mimic3base.set_task(mimic3_mortality_prediction)
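To make "task-specific samples" concrete, here is a toy sketch of what a task function does conceptually: it walks one patient's visits and emits one labeled sample per usable visit. The dictionary layout, the field names, and the function `mortality_samples` are all hypothetical; this is not PyHealth's data schema, nor the logic inside `MortalityPredictionMIMIC3`.

```python
def mortality_samples(patient):
    """Toy task function: one sample per visit that has a following visit.

    The label records whether the patient died during the *next* visit.
    """
    samples = []
    visits = patient["visits"]
    for current, following in zip(visits, visits[1:]):
        if not current["conditions"]:
            continue  # skip visits that carry no features
        samples.append({
            "patient_id": patient["patient_id"],
            "visit_id": current["visit_id"],
            "conditions": current["conditions"],
            "label": int(following["died"]),
        })
    return samples


# Hypothetical patient record, invented for illustration only.
patient = {
    "patient_id": "p1",
    "visits": [
        {"visit_id": "v1", "conditions": ["4019", "25000"], "died": False},
        {"visit_id": "v2", "conditions": ["5849"], "died": False},
        {"visit_id": "v3", "conditions": ["0389"], "died": True},
    ],
}
samples = mortality_samples(patient)
print([(s["visit_id"], s["label"]) for s in samples])
```

The last visit yields no sample because there is no later visit to take a label from.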
To split data and create DataLoaders:
from pyhealth.datasets import split_by_patient, get_dataloader
train_ds, val_ds, test_ds = split_by_patient(mimic3sample, [0.8, 0.1, 0.1])
train_loader = get_dataloader(train_ds, batch_size=32, shuffle=True)
val_loader = get_dataloader(val_ds, batch_size=32, shuffle=False)
test_loader = get_dataloader(test_ds, batch_size=32, shuffle=False)
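Splitting by patient rather than by sample keeps all samples from one patient in the same subset, so the model is never evaluated on a patient it saw during training. The sketch below illustrates that idea in plain Python; `split_samples_by_patient` is a hypothetical stand-in, not the implementation of `split_by_patient`, and the sample dictionaries are invented.

```python
import random


def split_samples_by_patient(samples, ratios, seed=0):
    """Split samples into train/val/test so that no patient spans two subsets."""
    patient_ids = sorted({s["patient_id"] for s in samples})
    random.Random(seed).shuffle(patient_ids)  # seeded for reproducibility
    n_train = int(len(patient_ids) * ratios[0])
    n_val = int(len(patient_ids) * ratios[1])
    groups = (
        set(patient_ids[:n_train]),
        set(patient_ids[n_train:n_train + n_val]),
        set(patient_ids[n_train + n_val:]),  # remainder goes to test
    )
    return tuple([s for s in samples if s["patient_id"] in g] for g in groups)


# 10 hypothetical patients with 3 samples each.
samples = [{"patient_id": f"p{i % 10}", "visit": i} for i in range(30)]
train, val, test = split_samples_by_patient(samples, [0.8, 0.1, 0.1])
print(len(train), len(val), len(test))
```

Because the split is made over patient IDs, the subset sizes follow the ratios only approximately when patients contribute different numbers of samples.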
Stage 3: Building ML Models#
pyhealth.models provides various machine learning models.
Example:
from pyhealth.models import Transformer
model = Transformer(
    dataset=mimic3sample,
)
Stage 4: Training the Model#
pyhealth.trainer allows specifying training parameters such as optimizer, epochs, and learning rate.
Example:
from pyhealth.trainer import Trainer
trainer = Trainer(model=model)
trainer.train(
    train_dataloader=train_loader,
    val_dataloader=val_loader,
    epochs=50,
    monitor="pr_auc",
)
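Monitoring a validation metric usually means scoring the model after each epoch and keeping the epoch that did best on that metric. The sketch below shows that selection step in isolation. It is a generic pattern, not a description of the `Trainer` internals, and the `history` values are invented for illustration.

```python
def best_epoch(history, monitor, mode="max"):
    """Return (epoch_index, score) of the best epoch for the monitored metric."""
    pick = max if mode == "max" else min
    return pick(
        ((epoch, scores[monitor]) for epoch, scores in enumerate(history)),
        key=lambda item: item[1],
    )


# Hypothetical per-epoch validation scores.
history = [
    {"pr_auc": 0.41, "loss": 0.69},
    {"pr_auc": 0.55, "loss": 0.52},
    {"pr_auc": 0.53, "loss": 0.48},
]
print(best_epoch(history, "pr_auc"))       # (1, 0.55)
print(best_epoch(history, "loss", "min"))  # (2, 0.48)
```

Note that the two metrics disagree about the best epoch here, which is why the choice of monitored metric matters.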
Stage 5: Evaluating Model Performance#
pyhealth.metrics provides evaluation metrics.
Example:
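from pyhealth.metrics.binary import binary_metrics_fn

# Option 1: let the trainer compute metrics on the test set
trainer.evaluate(test_loader)

# Option 2: run inference, then compute selected metrics yourself
y_true, y_prob, loss = trainer.inference(test_loader)
binary_metrics_fn(y_true, y_prob, metrics=["pr_auc", "roc_auc"])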
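For intuition about the two metrics requested above, here is a dependency-free sketch. ROC-AUC is computed as the fraction of (positive, negative) pairs ranked correctly, and the area under the precision-recall curve is estimated by average precision, one common estimator. Whether PyHealth's `pr_auc` uses exactly this estimator is an assumption. Both helpers are illustrative, ignore tied scores in the ranking step, and expect at least one positive and one negative label.

```python
def roc_auc(y_true, y_prob):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, y_prob):
    """Mean of the precision values measured at each positive, ranked by score."""
    ranked = sorted(zip(y_prob, y_true), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / hits


y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(y_true, y_prob))                      # 0.75
print(round(average_precision(y_true, y_prob), 4))  # 0.8333
```

On imbalanced clinical labels, average precision is typically the more sensitive of the two, because it is computed from precision at each positive and does not credit the model for ranking the many negatives correctly.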
Essential Guides#
Now that you understand the basics, dive deeper into PyHealth's capabilities:
Core Guides#
Why PyHealth? - Discover why PyHealth is the best choice for healthcare AI
MedCode - Learn how to translate between medical coding systems (ICD, NDC, ATC, CCS)
Tutorials - Interactive Jupyter notebooks with real examples
Advanced Topics#
Community & Support#
How to Contribute - Join our community of healthcare AI developers
Discord Community - Chat with other users and developers