Getting Started with PyHealth#

Welcome to PyHealth! This guide will help you get up and running with healthcare AI development. PyHealth makes it easy to build, test, and deploy healthcare machine learning models with minimal code.

🚀 New to PyHealth? Start here. This guide walks you from installation to your first trained model.

Introduction [Video]#

Prefer video? Watch a short introduction to PyHealth before you start.

Installing PyHealth#

Python Version Requirement

PyHealth 2.0 requires Python 3.12 or higher. Verify your version:

python --version  # Should be 3.12.x or 3.13.x
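The same check can be run from inside Python, which helps when several interpreters are installed and it is unclear which one `python` resolves to. This is a minimal sketch: the helper name `meets_minimum` is hypothetical, and the (3, 12) floor is taken from the requirement stated above.

```python
import sys

# Minimum version taken from the requirement stated above.
MIN_PYTHON = (3, 12)


def meets_minimum(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the given interpreter version is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    status = "OK" if meets_minimum() else "too old for PyHealth 2.0"
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}: {status}")
```

Comparing tuples rather than parsing the version string keeps the check correct for two-digit minor versions such as 3.12 versus 3.9.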

Latest Release (Recommended)

Install PyHealth 2.0 from PyPI. This release brings significant performance improvements and new features:

pip install pyhealth

Legacy Version

For backward compatibility, the older stable version (1.16) supports Python 3.9+:

pip install pyhealth==1.16

Development Installation

To install the latest development version from GitHub:

git clone https://github.com/sunlabuiuc/PyHealth.git
cd PyHealth
pip install -e .

See Installation for detailed installation instructions including CUDA setup and platform-specific notes.
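After installing, it can be useful to confirm which version actually landed in the active environment. The sketch below uses only the standard library's `importlib.metadata`; the helper name `installed_version` is hypothetical and not part of PyHealth.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("pyhealth")
    print(f"pyhealth: {found or 'not installed'}")
```

Reading the package metadata avoids importing PyHealth itself, so the check stays fast and works even if an optional dependency is missing.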

Overview of ML Pipelines#

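All healthcare tasks in PyHealth follow a five-stage pipeline: loading a dataset, defining a task, building a model, training it, and evaluating its performance.

Each stage is modular, so you can customize or replace it to fit your needs.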

Stage 1: Loading Datasets#

pyhealth.datasets provides structured datasets independent of tasks. PyHealth supports MIMIC-III, MIMIC-IV, eICU, and more.

Example:

from pyhealth.datasets import MIMIC3Dataset

# Load three tables from the publicly hosted synthetic MIMIC-III data.
mimic3base = MIMIC3Dataset(
    root="https://storage.googleapis.com/pyhealth/Synthetic_MIMIC-III/",
    tables=["DIAGNOSES_ICD", "PROCEDURES_ICD", "PRESCRIPTIONS"],
)

Stage 2: Defining Tasks#

pyhealth.tasks processes patient data into task-specific samples.

Example:

from pyhealth.tasks.mortality_prediction import MortalityPredictionMIMIC3

mimic3_mortality_prediction = MortalityPredictionMIMIC3()
mimic3sample = mimic3base.set_task(mimic3_mortality_prediction)

To split data and create DataLoaders:

from pyhealth.datasets import split_by_patient, get_dataloader

train_ds, val_ds, test_ds = split_by_patient(mimic3sample, [0.8, 0.1, 0.1])
train_loader = get_dataloader(train_ds, batch_size=32, shuffle=True)
val_loader = get_dataloader(val_ds, batch_size=32, shuffle=False)
test_loader = get_dataloader(test_ds, batch_size=32, shuffle=False)
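As its name suggests, `split_by_patient` partitions at the patient level, so every sample from a given patient ends up in the same subset and no patient leaks from training into validation or test. The plain-Python sketch below illustrates that idea only; it is not PyHealth's implementation, and the sample layout (dicts with a `patient_id` key), the function name `split_samples_by_patient`, and the `seed` argument are assumptions made for the example.

```python
import random


def split_samples_by_patient(samples, ratios, seed=0):
    """Split samples into len(ratios) subsets with no patient in two subsets."""
    # Collect unique patient IDs in a stable order, then shuffle reproducibly.
    patient_ids = sorted({s["patient_id"] for s in samples})
    random.Random(seed).shuffle(patient_ids)

    # Turn the ratios into cut points over the shuffled patient list.
    n = len(patient_ids)
    bounds, total = [], 0.0
    for r in ratios[:-1]:
        total += r
        bounds.append(round(n * total))
    bounds.append(n)  # the last subset takes whatever remains

    # Assign each patient to a subset index.
    subset_of = {}
    start = 0
    for idx, end in enumerate(bounds):
        for pid in patient_ids[start:end]:
            subset_of[pid] = idx
        start = end

    # Route every sample to its patient's subset.
    subsets = [[] for _ in ratios]
    for s in samples:
        subsets[subset_of[s["patient_id"]]].append(s)
    return subsets


# Toy data: 10 patients with 3 visits each (hypothetical layout).
samples = [{"patient_id": f"p{i}", "visit": v} for i in range(10) for v in range(3)]
train, val, test = split_samples_by_patient(samples, [0.8, 0.1, 0.1])
```

Because the ratios apply to patients rather than samples, subset sizes in samples can drift from 80/10/10 when patients have very different numbers of visits.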

Stage 3: Building ML Models#

pyhealth.models provides ready-made machine learning models, such as the Transformer used below, that are configured from a sample dataset.

Example:

from pyhealth.models import Transformer

model = Transformer(
    dataset=mimic3sample,
)

Stage 4: Training the Model#

pyhealth.trainer runs the training loop and lets you specify training parameters such as the optimizer, number of epochs, and learning rate.

Example:

from pyhealth.trainer import Trainer

trainer = Trainer(model=model)
trainer.train(
    train_dataloader=train_loader,
    val_dataloader=val_loader,
    epochs=50,
    monitor="pr_auc",
)
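The `monitor` argument names the validation metric used to judge which epoch performed best. The sketch below shows that selection logic in isolation, under the assumption that a higher PR AUC is better; the function `best_epoch`, its `criterion` parameter, and the score values are hypothetical and are not taken from PyHealth.

```python
def best_epoch(val_scores, criterion="max"):
    """Return (epoch_index, score) of the best validation score."""
    if not val_scores:
        raise ValueError("val_scores must not be empty")
    pick = max if criterion == "max" else min
    # Ties resolve to the earliest epoch because max/min return the first match.
    idx = pick(range(len(val_scores)), key=lambda i: val_scores[i])
    return idx, val_scores[idx]


# Hypothetical validation PR AUC values, one per epoch.
history = [0.41, 0.48, 0.53, 0.52, 0.50]
epoch, score = best_epoch(history)
```

For a metric where lower is better, such as a loss, the same helper would be called with `criterion="min"`.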

Stage 5: Evaluating Model Performance#

pyhealth.metrics provides evaluation metrics, such as PR AUC and ROC AUC for binary prediction tasks.

Example:

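from pyhealth.metrics.binary import binary_metrics_fn

# Option 1: evaluate directly through the trainer.
trainer.evaluate(test_loader)

# Option 2: run inference yourself and choose the metrics explicitly.
y_true, y_prob, loss = trainer.inference(test_loader)
binary_metrics_fn(y_true, y_prob, metrics=["pr_auc", "roc_auc"])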
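To make the two metric names concrete, here is a pure-Python sketch of what they measure: ROC AUC as the probability that a random positive sample is scored above a random negative one, and PR AUC computed as average precision. These are reference definitions for illustration, not PyHealth's code. It is an assumption that `binary_metrics_fn` uses the same definitions, and the function names `roc_auc` and `average_precision` are hypothetical.

```python
def roc_auc(y_true, y_prob):
    """Probability that a random positive scores above a random negative."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    # Count pairwise wins; ties count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, y_prob):
    """Mean of the precision values at the rank of each positive sample."""
    # Rank samples from highest to lowest predicted probability.
    ranked = sorted(zip(y_prob, y_true), key=lambda pair: -pair[0])
    hits, precisions = 0, []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    if not precisions:
        raise ValueError("need at least one positive label")
    return sum(precisions) / len(precisions)


# Toy labels and predicted probabilities (hypothetical values).
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(y_true, y_prob)
ap = average_precision(y_true, y_prob)
```

On imbalanced clinical outcomes such as mortality, PR AUC is often the more informative of the two because it ignores true negatives. Note that this simple ranking does not treat tied scores specially, so results may differ from a library implementation when predictions repeat.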

Essential Guides#

Now that you understand the basics, dive deeper into PyHealth’s capabilities:

📚 Core Guides#

Why PyHealth? - Discover why PyHealth is the best choice for healthcare AI
MedCode - Learn how to translate between medical coding systems (ICD, NDC, ATC, CCS)
Tutorials - Interactive Jupyter notebooks with real examples

🛠️ Advanced Topics#

Models - Complete documentation of all available models
Datasets - Working with healthcare datasets
Tasks - Defining custom healthcare prediction tasks

🤝 Community & Support#

How to Contribute - Join our community of healthcare AI developers
Discord Community - Chat with other users and developers