Why PyHealth?#

PyHealth is the comprehensive Python library for healthcare AI that makes building, testing, and deploying healthcare machine learning models easier than ever before. Whether you’re a researcher, data scientist, or healthcare practitioner, PyHealth provides the tools you need to develop robust healthcare AI applications.

Build healthcare AI pipelines in ~7 lines of code

Note

📄 Read the PyHealth 2.0 paper: PyHealth 2.0: A Comprehensive Open-Source Toolkit for Accessible and Reproducible Clinical Deep Learning

What Makes PyHealth 2.0 Powerful?#

PyHealth provides comprehensive end-to-end capabilities in a single package:

✅ Unified API for all data types (EHR, images, signals, text, genomics)

✅ Single dependency with no environment conflicts

✅ One consistent workflow across all healthcare data modalities

✅ Scales dynamically from laptop to cluster—adapts to your resources

✅ Research to deployment in the same codebase with the same API

PyHealth 2.0 democratizes healthcare AI—making it accessible, reproducible, and deployable for researchers, data scientists, and healthcare practitioners alike.

Key Features#

Dramatically simpler code#

PyHealth 2.0 reduces the complexity of healthcare AI development from hundreds of lines to single digits:

Code Reduction Across Tasks#
Task	Pandas	PyHealth 1.16	PyHealth 2.0
Patient data exploration	16 lines	14 lines	10 lines
Mortality prediction	51 lines	24 lines	7 lines
Length of stay prediction	22 lines	7 lines	7 lines
Drug recommendation	24 lines	7 lines	7 lines

The same 7 lines of code work for any clinical prediction task. Define your task once, and PyHealth’s optimized backend handles all the complex data processing automatically.

Exceptional performance that scales#

PyHealth 2.0 delivers exceptional performance that makes healthcare AI research accessible on standard hardware:

Breakthrough speed improvements:

Up to 39× faster task processing compared to typical pandas-based approaches
Dramatically reduced processing time for common clinical prediction tasks
Optimized data loaders with smart caching and lazy evaluation
Efficient multi-core scaling without memory overflow

Memory efficiency:

Dynamically scales to fit consumer-grade hardware (16GB laptops)
Handles large-scale datasets like MIMIC-IV without requiring workstation-grade resources
Intelligent memory management adapts to available system resources
Enables research on complex healthcare datasets without expensive infrastructure

Note

What this means for researchers: PyHealth 2.0 enables you to run sophisticated healthcare AI analyses on a standard laptop that previously required high-end workstations. The platform adapts to your available resources while maintaining high performance.

Healthcare-specific design#

PyHealth is built specifically for healthcare, not adapted from general ML libraries:

Medical domain features:

Built-in support for medical coding standards: ICD-9/10, CPT, NDC, ATC, RXNorm, CCS
Automatic code translation between different ontology systems
Native understanding of patient timelines and visit sequences
Clinical safety tools: drug-drug interaction checking, model calibration

Healthcare datasets out-of-the-box:

Supported Datasets#
Dataset	Description	Modality
MIMIC-III/IV	Critical care database (300K+ patients)	EHR, Text, Images
eICU	Multi-center ICU database (200K+ stays)	EHR
OMOP-CDM	Standardized healthcare data format	EHR
EHRShot	Few-shot benchmarking (15 tasks)	EHR
COVID19-CXR	COVID-19 chest X-rays	Images
SleepEDF, SHHS, ISRUC	Sleep studies with EEG	Biosignals
ClinVar, COSMIC, TCGA	Genomics and cancer mutations	Genomics

State-of-the-art model library#

Access 33+ pre-built models from recent research papers:

Healthcare-specific models:

RETAIN (2016): Interpretable attention for clinical decisions
StageNet (2020): Disease progression stage modeling
SafeDrug (2021): Safe drug combinations with molecular graphs
GAMENet (2019): Graph-augmented memory for medication recommendation
AdaCare (2020): Adaptive feature extraction for EHR
ConCare (2020): Context-aware patient representation
GRASP (2021): Graph neural networks for patient similarity
MoleRec (2023): Molecular substructure-aware recommendations

Foundation models:

Transformers, RNN/LSTM/GRU, CNN, TCN, MLP
Pre-trained vision models (ResNet, ViT) via torchvision
Pre-trained language models (BERT, ClinicalBERT) via HuggingFace

Specialized models:

ContraWR (2021): Contrastive learning for biosignals (EEG, ECG)
SparcNet (2023): Sparse CNNs for seizure detection and sleep staging
Deepr (2017): CNNs optimized for medical records
Dr. Agent (2020): Multi-agent reinforcement learning for clinical decisions

Production-ready evaluation tools#

Go beyond standard metrics with comprehensive model assessment:

Interpretability methods:

Gradient-based: Integrated Gradients, DeepLift, Saliency Maps, GIM
Perturbation-based: LIME, SHAP (with healthcare-optimized implementations)
Attention-based: Chefer relevance propagation and attention rollout for transformers
Visualization tools for clinical decision support

Uncertainty quantification:

Probability calibration: Temperature scaling, histogram binning, Dirichlet calibration, KCal
Conformal prediction: LABEL, SCRIB, FavMac, with covariate shift support
Statistical coverage guarantees for high-stakes clinical decisions

Clinical metrics:

Drug-drug interaction (DDI) rates
Clinical accuracy metrics
Fairness and bias assessment
Healthcare-specific performance measures

All integrated in one unified interface for comprehensive model evaluation and validation.

Common Use Cases#

PyHealth excels at these healthcare AI applications:

Clinical prediction tasks#

Mortality prediction: ICU and hospital mortality risk assessment
Readmission prediction: 30-day and general readmission risk
Length of stay: Hospital and ICU duration prediction
Disease progression: Track patient condition changes over time

Drug and treatment recommendation#

Medication recommendation: Suggest optimal drug combinations
Drug safety: Identify dangerous drug-drug interactions
Treatment optimization: Personalized therapy selection
Dosage prediction: Optimal medication dosing strategies

Medical coding and NLP#

Code translation: Convert between ICD-9/10, CPT, NDC, ATC, RXNorm systems
Code prediction: Auto-suggest medical codes from clinical notes
Specialty classification: Identify medical specialties from transcriptions
Clinical information extraction: Extract structured data from text

Biosignal analysis#

Sleep staging: Automatic sleep phase classification (Wake, N1, N2, N3, REM)
Seizure detection: EEG abnormality identification
Cardiac monitoring: ECG analysis and arrhythmia detection
Heart sound analysis: Phonocardiogram classification for valve diseases

Medical imaging#

Disease classification: Multi-label chest X-ray diagnosis
COVID-19 detection: Pneumonia and COVID classification from X-rays
Integration with vision models: Easy fine-tuning of pre-trained models

Genomics and precision medicine#

Variant pathogenicity: Classify genetic variants (ClinVar)
Cancer mutation analysis: Predict mutation impact (COSMIC)
Survival prediction: Multi-omics cancer survival models (TCGA)

Flexible and Modular Architecture#

PyHealth’s design philosophy: Use only what you need, customize what you want

Modular components:

pyhealth.data: Flexible patient-event data structures (no assumptions on format)
pyhealth.datasets: 15+ datasets with lazy loading and smart caching
pyhealth.tasks: 40+ pre-defined tasks, easily create custom tasks
pyhealth.models: 33+ models, compatible with any PyTorch model
pyhealth.processors: Handle sequences, images, signals, text, and tabular data
pyhealth.metrics: Clinical performance metrics
pyhealth.interpret: Model interpretability methods
pyhealth.calib: Uncertainty quantification and calibration
pyhealth.medcode: Medical coding standard translation

Easy to extend:

Add custom datasets by inheriting BaseDataset (see tutorials/custom_dataset)
Define custom tasks with simple input/output schemas (see tutorials/custom_task)
Use any PyTorch model with PyHealth’s data loaders
Integrate with PyTorch Lightning for distributed training
Compatible with HuggingFace, torchvision, and other ecosystems

Reproducible Research Infrastructure#

PyHealth 2.0 addresses the reproducibility crisis in healthcare AI:

Standardized implementations:

Eliminate “works on my machine” problems with tested, version-controlled components
All models, tasks, and datasets follow peer-reviewed implementations
Extensive documentation with 50+ tutorials and examples

Multi-language support:

RHealth: Brings PyHealth capabilities to R users and bioinformatics community
Expands accessibility beyond Python-centric machine learning

Standards and integration:

Compatible with healthcare data standards: OMOP, FHIR
Integrates seamlessly with PyTorch, PyTorch Lightning, HuggingFace
Works with your existing Python data science stack

Growing Community#

Join our active healthcare AI community:

400+ members in PyHealth Research Initiative pairing researchers with mentors
Active Development: Regular updates with new models, datasets, and features
Research Collaboration: Direct connection to cutting-edge academic research
Industry Partnerships: Collaborations with academic health systems
Open Source: Transparent, auditable, and customizable
Support: Active Discord community and GitHub discussions - Join our Discord

Get Started Today#

Ready to begin? Explore these key resources:

Getting Started with PyHealth — Quickstart guide
Installation — Install PyHealth
Tutorials — Interactive tutorials
Models — Model API docs
Datasets — Datasets
Tasks — Tasks

Jump in with the guides above, or use the navigation on the left for more.