Why PyHealth?#

PyHealth is the comprehensive Python library for healthcare AI that makes building, testing, and deploying healthcare machine learning models easier than ever before. Whether you’re a researcher, data scientist, or healthcare practitioner, PyHealth provides the tools you need to develop robust healthcare AI applications.

Build healthcare AI pipelines in ~7 lines of code


What Makes PyHealth 2.0 Powerful?#

PyHealth provides comprehensive end-to-end capabilities in a single package:

Unified API for all data types (EHR, images, signals, text, genomics)

Single dependency with no environment conflicts

One consistent workflow across all healthcare data modalities

Scales dynamically from laptop to cluster—adapts to your resources

Research to deployment in the same codebase with the same API

PyHealth 2.0 democratizes healthcare AI—making it accessible, reproducible, and deployable for researchers, data scientists, and healthcare practitioners alike.


Key Features#

Dramatically simpler code#

PyHealth 2.0 reduces the complexity of healthcare AI development from hundreds of lines to single digits:

Code Reduction Across Tasks#

Task

Pandas

PyHealth 1.16

PyHealth 2.0

Patient data exploration

16 lines

14 lines

10 lines

Mortality prediction

51 lines

24 lines

7 lines

Length of stay prediction

22 lines

7 lines

7 lines

Drug recommendation

24 lines

7 lines

7 lines

The same 7 lines of code work for any clinical prediction task. Define your task once, and PyHealth’s optimized backend handles all the complex data processing automatically.

Exceptional performance that scales#

PyHealth 2.0 delivers exceptional performance that makes healthcare AI research accessible on standard hardware:

Breakthrough speed improvements:

  • Up to 39× faster task processing compared to typical pandas-based approaches

  • Dramatically reduced processing time for common clinical prediction tasks

  • Optimized data loaders with smart caching and lazy evaluation

  • Efficient multi-core scaling without memory overflow

Memory efficiency:

  • Dynamically scales to fit consumer-grade hardware (16GB laptops)

  • Handles large-scale datasets like MIMIC-IV without requiring workstation-grade resources

  • Intelligent memory management adapts to available system resources

  • Enables research on complex healthcare datasets without expensive infrastructure

PyHealth 2.0 performance benchmarks showing speed and memory efficiency

Note

What this means for researchers: PyHealth 2.0 enables you to run sophisticated healthcare AI analyses on a standard laptop that previously required high-end workstations. The platform adapts to your available resources while maintaining high performance.

Healthcare-specific design#

PyHealth is built specifically for healthcare, not adapted from general ML libraries:

Medical domain features:

  • Built-in support for medical coding standards: ICD-9/10, CPT, NDC, ATC, RXNorm, CCS

  • Automatic code translation between different ontology systems

  • Native understanding of patient timelines and visit sequences

  • Clinical safety tools: drug-drug interaction checking, model calibration

Healthcare datasets out-of-the-box:

Supported Datasets#

Dataset

Description

Modality

MIMIC-III/IV

Critical care database (300K+ patients)

EHR, Text, Images

eICU

Multi-center ICU database (200K+ stays)

EHR

OMOP-CDM

Standardized healthcare data format

EHR

EHRShot

Few-shot benchmarking (15 tasks)

EHR

COVID19-CXR

COVID-19 chest X-rays

Images

SleepEDF, SHHS, ISRUC

Sleep studies with EEG

Biosignals

ClinVar, COSMIC, TCGA

Genomics and cancer mutations

Genomics

State-of-the-art model library#

Access 33+ pre-built models from recent research papers:

Healthcare-specific models:

  • RETAIN (2016): Interpretable attention for clinical decisions

  • StageNet (2020): Disease progression stage modeling

  • SafeDrug (2021): Safe drug combinations with molecular graphs

  • GAMENet (2019): Graph-augmented memory for medication recommendation

  • AdaCare (2020): Adaptive feature extraction for EHR

  • ConCare (2020): Context-aware patient representation

  • GRASP (2021): Graph neural networks for patient similarity

  • MoleRec (2023): Molecular substructure-aware recommendations

Foundation models:

  • Transformers, RNN/LSTM/GRU, CNN, TCN, MLP

  • Pre-trained vision models (ResNet, ViT) via torchvision

  • Pre-trained language models (BERT, ClinicalBERT) via HuggingFace

Specialized models:

  • ContraWR (2021): Contrastive learning for biosignals (EEG, ECG)

  • SparcNet (2023): Sparse CNNs for seizure detection and sleep staging

  • Deepr (2017): CNNs optimized for medical records

  • Dr. Agent (2020): Multi-agent reinforcement learning for clinical decisions

Production-ready evaluation tools#

Go beyond standard metrics with comprehensive model assessment:

Interpretability methods:

  • Gradient-based: Integrated Gradients, DeepLift, Saliency Maps, GIM

  • Perturbation-based: LIME, SHAP (with healthcare-optimized implementations)

  • Attention-based: Chefer relevance propagation for transformers

  • Visualization tools for clinical decision support

Uncertainty quantification:

  • Probability calibration: Temperature scaling, histogram binning, Dirichlet calibration, KCal

  • Conformal prediction: LABEL, SCRIB, FavMac, with covariate shift support

  • Statistical coverage guarantees for high-stakes clinical decisions

Clinical metrics:

  • Drug-drug interaction (DDI) rates

  • Clinical accuracy metrics

  • Fairness and bias assessment

  • Healthcare-specific performance measures

All integrated in one unified interface for comprehensive model evaluation and validation.

Common Use Cases#

PyHealth excels at these healthcare AI applications:

Clinical prediction tasks#

  • Mortality prediction: ICU and hospital mortality risk assessment

  • Readmission prediction: 30-day and general readmission risk

  • Length of stay: Hospital and ICU duration prediction

  • Disease progression: Track patient condition changes over time

Drug and treatment recommendation#

  • Medication recommendation: Suggest optimal drug combinations

  • Drug safety: Identify dangerous drug-drug interactions

  • Treatment optimization: Personalized therapy selection

  • Dosage prediction: Optimal medication dosing strategies

Medical coding and NLP#

  • Code translation: Convert between ICD-9/10, CPT, NDC, ATC, RXNorm systems

  • Code prediction: Auto-suggest medical codes from clinical notes

  • Specialty classification: Identify medical specialties from transcriptions

  • Clinical information extraction: Extract structured data from text

Biosignal analysis#

  • Sleep staging: Automatic sleep phase classification (Wake, N1, N2, N3, REM)

  • Seizure detection: EEG abnormality identification

  • Cardiac monitoring: ECG analysis and arrhythmia detection

  • Heart sound analysis: Phonocardiogram classification for valve diseases

Medical imaging#

  • Disease classification: Multi-label chest X-ray diagnosis

  • COVID-19 detection: Pneumonia and COVID classification from X-rays

  • Integration with vision models: Easy fine-tuning of pre-trained models

Genomics and precision medicine#

  • Variant pathogenicity: Classify genetic variants (ClinVar)

  • Cancer mutation analysis: Predict mutation impact (COSMIC)

  • Survival prediction: Multi-omics cancer survival models (TCGA)

Flexible and Modular Architecture#

PyHealth’s design philosophy: Use only what you need, customize what you want

Modular components:

  • pyhealth.data: Flexible patient-event data structures (no assumptions on format)

  • pyhealth.datasets: 15+ datasets with lazy loading and smart caching

  • pyhealth.tasks: 40+ pre-defined tasks, easily create custom tasks

  • pyhealth.models: 33+ models, compatible with any PyTorch model

  • pyhealth.processors: Handle sequences, images, signals, text, and tabular data

  • pyhealth.metrics: Clinical performance metrics

  • pyhealth.interpret: Model interpretability methods

  • pyhealth.calib: Uncertainty quantification and calibration

  • pyhealth.medcode: Medical coding standard translation

Easy to extend:

  • Add custom datasets by inheriting BaseDataset (see tutorials/custom_dataset)

  • Define custom tasks with simple input/output schemas (see tutorials/custom_task)

  • Use any PyTorch model with PyHealth’s data loaders

  • Integrate with PyTorch Lightning for distributed training

  • Compatible with HuggingFace, torchvision, and other ecosystems

Reproducible Research Infrastructure#

PyHealth 2.0 addresses the reproducibility crisis in healthcare AI:

Standardized implementations:

  • Eliminate “works on my machine” problems with tested, version-controlled components

  • All models, tasks, and datasets follow peer-reviewed implementations

  • Extensive documentation with 50+ tutorials and examples

Multi-language support:

  • RHealth: Brings PyHealth capabilities to R users and bioinformatics community

  • Expands accessibility beyond Python-centric machine learning

Standards and integration:

  • Compatible with healthcare data standards: OMOP, FHIR

  • Integrates seamlessly with PyTorch, PyTorch Lightning, HuggingFace

  • Works with your existing Python data science stack

Growing Community#

Join our active healthcare AI community:

  • 400+ members in PyHealth Research Initiative pairing researchers with mentors

  • Active Development: Regular updates with new models, datasets, and features

  • Research Collaboration: Direct connection to cutting-edge academic research

  • Industry Partnerships: Collaborations with academic health systems

  • Open Source: Transparent, auditable, and customizable

  • Support: Active Discord community and GitHub discussions - Join our Discord

Get Started Today#

Ready to begin? Explore these key resources:

Jump in with the guides above, or use the navigation on the left for more.