Why PyHealth?#
PyHealth is a comprehensive Python library for healthcare AI that simplifies building, testing, and deploying healthcare machine learning models. Whether you’re a researcher, data scientist, or healthcare practitioner, PyHealth provides the tools you need to develop robust healthcare AI applications.
Build healthcare AI pipelines in ~7 lines of code
Note
📄 Read the PyHealth 2.0 paper: PyHealth 2.0: A Comprehensive Open-Source Toolkit for Accessible and Reproducible Clinical Deep Learning
What Makes PyHealth 2.0 Powerful?#
PyHealth provides comprehensive end-to-end capabilities in a single package:
✅ Unified API for all data types (EHR, images, signals, text, genomics)
✅ Single dependency with no environment conflicts
✅ One consistent workflow across all healthcare data modalities
✅ Scales dynamically from laptop to cluster—adapts to your resources
✅ Research to deployment in the same codebase with the same API
PyHealth 2.0 democratizes healthcare AI—making it accessible, reproducible, and deployable for researchers, data scientists, and healthcare practitioners alike.
Key Features#
Dramatically simpler code#
PyHealth 2.0 reduces the code needed for common healthcare AI tasks from dozens of lines to ten or fewer:
| Task | Pandas | PyHealth 1.16 | PyHealth 2.0 |
|---|---|---|---|
| Patient data exploration | 16 lines | 14 lines | 10 lines |
| Mortality prediction | 51 lines | 24 lines | 7 lines |
| Length of stay prediction | 22 lines | 7 lines | 7 lines |
| Drug recommendation | 24 lines | 7 lines | 7 lines |
The same 7 lines of code work for any clinical prediction task. Define your task once, and PyHealth’s optimized backend handles all the complex data processing automatically.
Exceptional performance that scales#
PyHealth 2.0 delivers exceptional performance that makes healthcare AI research accessible on standard hardware:
Breakthrough speed improvements:
Up to 39× faster task processing compared to typical pandas-based approaches
Dramatically reduced processing time for common clinical prediction tasks
Optimized data loaders with smart caching and lazy evaluation
Efficient multi-core scaling without memory overflow
Memory efficiency:
Dynamically scales to fit consumer-grade hardware (16GB laptops)
Handles large-scale datasets like MIMIC-IV without requiring workstation-grade resources
Intelligent memory management adapts to available system resources
Enables research on complex healthcare datasets without expensive infrastructure
Note
What this means for researchers: PyHealth 2.0 enables you to run sophisticated healthcare AI analyses on a standard laptop that previously required high-end workstations. The platform adapts to your available resources while maintaining high performance.
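To make the lazy-evaluation and caching idea concrete, here is a minimal plain-Python sketch. It does not use PyHealth's API: `RAW_EVENTS`, `iter_events`, and `codes_for_patient` are hypothetical names invented for illustration, and a real loader would stream from disk rather than from an in-memory list.

```python
from functools import lru_cache
from typing import Dict, Iterator, List, Tuple

# Hypothetical stand-in for an event table; a real pipeline would read from disk.
RAW_EVENTS: List[Dict[str, str]] = [
    {"patient_id": "p1", "code": "428.0"},
    {"patient_id": "p2", "code": "401.9"},
    {"patient_id": "p1", "code": "250.00"},
]


def iter_events() -> Iterator[Dict[str, str]]:
    """Yield events one at a time instead of building a second full copy."""
    for event in RAW_EVENTS:
        yield event


@lru_cache(maxsize=128)
def codes_for_patient(patient_id: str) -> Tuple[str, ...]:
    """Scan the events lazily; cache the result so repeat lookups skip the scan."""
    return tuple(e["code"] for e in iter_events() if e["patient_id"] == patient_id)


first = codes_for_patient("p1")   # triggers one pass over the events
second = codes_for_patient("p1")  # served from the cache, no second pass
cache_hits = codes_for_patient.cache_info().hits
```

The generator keeps only one event in flight at a time, and the bounded cache (`maxsize=128`) trades a fixed amount of memory for repeated lookups, which is the general pattern behind loaders that adapt to limited RAM.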
Healthcare-specific design#
PyHealth is built specifically for healthcare, not adapted from general ML libraries:
Medical domain features:
Built-in support for medical coding standards: ICD-9/10, CPT, NDC, ATC, RxNorm, CCS
Automatic code translation between different ontology systems
Native understanding of patient timelines and visit sequences
Clinical safety tools: drug-drug interaction checking, model calibration
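As an illustration of what code translation between ontology systems involves, the sketch below maps a visit's ICD-9-CM codes to ICD-10-CM using a lookup table. It is plain Python rather than PyHealth's own translation API; the three-entry table is a hand-picked sample for illustration only, and `translate` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

# Tiny hand-picked ICD-9-CM -> ICD-10-CM sample; real mapping tables are far
# larger and often one-to-many, which is why each value is a list.
ICD9_TO_ICD10: Dict[str, List[str]] = {
    "428.0": ["I50.9"],   # heart failure, unspecified
    "401.9": ["I10"],     # essential hypertension
    "250.00": ["E11.9"],  # type 2 diabetes without complications
}


def translate(codes: List[str], table: Dict[str, List[str]]) -> Tuple[List[str], List[str]]:
    """Return (mapped target codes, source codes that had no mapping)."""
    mapped: List[str] = []
    unmapped: List[str] = []
    for code in codes:
        targets = table.get(code)
        if targets is None:
            unmapped.append(code)  # keep track instead of silently dropping
        else:
            mapped.extend(targets)
    return mapped, unmapped


# "XXX.X" is a placeholder for a code missing from the table.
visit_codes = ["428.0", "401.9", "XXX.X"]
mapped, unmapped = translate(visit_codes, ICD9_TO_ICD10)
```

Returning the unmapped codes separately is a deliberate choice: in clinical data, silently discarding untranslatable codes can bias downstream features.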
Healthcare datasets out-of-the-box:
| Dataset | Description | Modality |
|---|---|---|
| MIMIC-III/IV | Critical care database (300K+ patients) | EHR, Text, Images |
| eICU | Multi-center ICU database (200K+ stays) | EHR |
| OMOP-CDM | Standardized healthcare data format | EHR |
| EHRShot | Few-shot benchmarking (15 tasks) | EHR |
| COVID19-CXR | COVID-19 chest X-rays | Images |
| SleepEDF, SHHS, ISRUC | Sleep studies with EEG | Biosignals |
| ClinVar, COSMIC, TCGA | Genomics and cancer mutations | Genomics |
State-of-the-art model library#
Access 33+ pre-built models from recent research papers:
Healthcare-specific models:
RETAIN (2016): Interpretable attention for clinical decisions
StageNet (2020): Disease progression stage modeling
SafeDrug (2021): Safe drug combinations with molecular graphs
GAMENet (2019): Graph-augmented memory for medication recommendation
AdaCare (2020): Adaptive feature extraction for EHR
ConCare (2020): Context-aware patient representation
GRASP (2021): Graph neural networks for patient similarity
MoleRec (2023): Molecular substructure-aware recommendations
Foundation models:
Transformers, RNN/LSTM/GRU, CNN, TCN, MLP
Pre-trained vision models (ResNet, ViT) via torchvision
Pre-trained language models (BERT, ClinicalBERT) via HuggingFace
Specialized models:
ContraWR (2021): Contrastive learning for biosignals (EEG, ECG)
SparcNet (2023): Sparse CNNs for seizure detection and sleep staging
Deepr (2017): CNNs optimized for medical records
Dr. Agent (2020): Multi-agent reinforcement learning for clinical decisions
Production-ready evaluation tools#
Go beyond standard metrics with comprehensive model assessment:
Interpretability methods:
Gradient-based: Integrated Gradients, DeepLIFT, Saliency Maps, GIM
Perturbation-based: LIME, SHAP (with healthcare-optimized implementations)
Attention-based: Chefer relevance propagation for transformers
Visualization tools for clinical decision support
Uncertainty quantification:
Probability calibration: Temperature scaling, histogram binning, Dirichlet calibration, KCal
Conformal prediction: LABEL, SCRIB, FavMac, with covariate shift support
Statistical coverage guarantees for high-stakes clinical decisions
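The coverage guarantee behind conformal prediction can be sketched with generic split conformal classification. This is not the LABEL, SCRIB, or FavMac method and not PyHealth's implementation; `conformal_threshold` and `prediction_set` are hypothetical helpers, and the calibration data are made up. Assuming calibration and test examples are exchangeable, the resulting sets contain the true label with probability at least 1 - alpha on average.

```python
import math
from typing import List, Sequence


def conformal_threshold(cal_probs: Sequence[Sequence[float]],
                        cal_labels: Sequence[int],
                        alpha: float) -> float:
    """Compute the nonconformity threshold from a held-out calibration set."""
    # Nonconformity score: one minus the probability given to the true class.
    scores = sorted(1.0 - probs[label] for probs, label in zip(cal_probs, cal_labels))
    n = len(scores)
    rank = math.ceil((n + 1) * (1.0 - alpha))  # 1-indexed, finite-sample corrected
    if rank > n:
        return float("inf")  # too few calibration points: include every class
    return scores[rank - 1]


def prediction_set(probs: Sequence[float], threshold: float) -> List[int]:
    """Return every class whose nonconformity score is within the threshold."""
    return [k for k, p in enumerate(probs) if 1.0 - p <= threshold]


# Made-up calibration predictions over three classes, with their true labels.
cal_probs = [
    [0.75, 0.125, 0.125], [0.25, 0.5, 0.25], [0.0625, 0.0625, 0.875],
    [0.625, 0.25, 0.125], [0.5, 0.25, 0.25], [0.25, 0.25, 0.5],
    [0.125, 0.75, 0.125],
]
cal_labels = [0, 1, 2, 0, 1, 2, 1]

threshold = conformal_threshold(cal_probs, cal_labels, alpha=0.25)
confident = prediction_set([0.5, 0.375, 0.125], threshold)
ambiguous = prediction_set([0.5, 0.5, 0.0], threshold)
```

A single-class set signals a confident prediction, while a larger set flags a case that may warrant clinician review.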
Clinical metrics:
Drug-drug interaction (DDI) rates
Clinical accuracy metrics
Fairness and bias assessment
Healthcare-specific performance measures
All integrated in one unified interface for comprehensive model evaluation and validation.
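As one example of a clinical metric, a drug-drug interaction (DDI) rate can be computed as the fraction of co-recommended drug pairs that appear in a known-interaction list. The sketch below uses that definition, which is common in the medication recommendation literature; PyHealth's own metric may differ in detail. `ddi_rate` is a hypothetical helper, and the drug names are placeholders, not real interactions.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Set


def ddi_rate(recommendations: Iterable[List[str]],
             interacting_pairs: Set[FrozenSet[str]]) -> float:
    """Fraction of co-recommended drug pairs that are known to interact."""
    total = 0
    flagged = 0
    for drugs in recommendations:
        # Deduplicate, then enumerate each unordered pair exactly once.
        for a, b in combinations(sorted(set(drugs)), 2):
            total += 1
            if frozenset((a, b)) in interacting_pairs:
                flagged += 1
    return flagged / total if total else 0.0


# Placeholder identifiers; "A" and "B" are assumed to interact.
known_interactions = {frozenset(("A", "B"))}
recommended = [["A", "B", "C"], ["C", "D"]]

rate = ddi_rate(recommended, known_interactions)
```

Here the pairs are A-B, A-C, B-C, and C-D, of which one is flagged. Storing pairs as `frozenset` objects makes the lookup independent of drug order.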
Common Use Cases#
PyHealth excels at these healthcare AI applications:
Clinical prediction tasks#
Mortality prediction: ICU and hospital mortality risk assessment
Readmission prediction: 30-day and general readmission risk
Length of stay: Hospital and ICU duration prediction
Disease progression: Track patient condition changes over time
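To show how such a prediction target is derived from a patient timeline, the sketch below labels each visit by whether the next admission falls within 30 days of discharge. It is a plain-Python illustration under assumed definitions, not PyHealth's task implementation; `readmission_labels` and the dates are hypothetical.

```python
from datetime import date
from typing import List, Tuple

Visit = Tuple[date, date]  # (admission date, discharge date)


def readmission_labels(visits: List[Visit], window_days: int = 30) -> List[int]:
    """Label each visit 1 if the next admission occurs within the window.

    Assumes visits are sorted by admission date. The last visit gets no
    label because its follow-up is unknown.
    """
    labels: List[int] = []
    for current, following in zip(visits, visits[1:]):
        gap_days = (following[0] - current[1]).days
        labels.append(1 if gap_days <= window_days else 0)
    return labels


# Made-up timeline: readmitted 15 days after the first discharge, then
# not again for several months.
timeline = [
    (date(2020, 1, 1), date(2020, 1, 5)),
    (date(2020, 1, 20), date(2020, 1, 25)),
    (date(2020, 6, 1), date(2020, 6, 3)),
]
labels = readmission_labels(timeline)
```

Three visits yield two labels here. Measuring the gap from discharge rather than admission is one convention; studies differ on this point, so the window definition should be stated with any reported result.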
Drug and treatment recommendation#
Medication recommendation: Suggest optimal drug combinations
Drug safety: Identify dangerous drug-drug interactions
Treatment optimization: Personalized therapy selection
Dosage prediction: Optimal medication dosing strategies
Medical coding and NLP#
Code translation: Convert between ICD-9/10, CPT, NDC, ATC, RxNorm systems
Code prediction: Auto-suggest medical codes from clinical notes
Specialty classification: Identify medical specialties from transcriptions
Clinical information extraction: Extract structured data from text
Biosignal analysis#
Sleep staging: Automatic sleep phase classification (Wake, N1, N2, N3, REM)
Seizure detection: EEG abnormality identification
Cardiac monitoring: ECG analysis and arrhythmia detection
Heart sound analysis: Phonocardiogram classification for valve diseases
Medical imaging#
Disease classification: Multi-label chest X-ray diagnosis
COVID-19 detection: Pneumonia and COVID classification from X-rays
Integration with vision models: Easy fine-tuning of pre-trained models
Genomics and precision medicine#
Variant pathogenicity: Classify genetic variants (ClinVar)
Cancer mutation analysis: Predict mutation impact (COSMIC)
Survival prediction: Multi-omics cancer survival models (TCGA)
Flexible and Modular Architecture#
PyHealth’s design philosophy: Use only what you need, customize what you want
Modular components:
pyhealth.data: Flexible patient-event data structures (no assumptions on format)
pyhealth.datasets: 15+ datasets with lazy loading and smart caching
pyhealth.tasks: 40+ pre-defined tasks, easily create custom tasks
pyhealth.models: 33+ models, compatible with any PyTorch model
pyhealth.processors: Handle sequences, images, signals, text, and tabular data
pyhealth.metrics: Clinical performance metrics
pyhealth.interpret: Model interpretability methods
pyhealth.calib: Uncertainty quantification and calibration
pyhealth.medcode: Medical coding standard translation
Easy to extend:
Add custom datasets by inheriting from BaseDataset (see tutorials/custom_dataset)
Define custom tasks with simple input/output schemas (see tutorials/custom_task)
Use any PyTorch model with PyHealth’s data loaders
Integrate with PyTorch Lightning for distributed training
Compatible with HuggingFace, torchvision, and other ecosystems
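The idea of a task defined by input and output schemas can be sketched as a callable that turns one patient record into training samples. The class below is a standalone illustration and does not inherit from or import any PyHealth class; the attribute names `input_schema` and `output_schema`, the patient dictionary layout, and `check_schema` are assumptions made for this example. See tutorials/custom_task for the actual interface.

```python
from typing import Any, Dict, List


class HypotheticalMortalityTask:
    """Illustrative task object; NOT a PyHealth class."""

    task_name = "hypothetical_mortality"
    input_schema = {"conditions": "sequence"}  # feature name -> feature type
    output_schema = {"mortality": "binary"}    # label name -> label type

    def __call__(self, patient: Dict[str, Any]) -> List[Dict[str, Any]]:
        """Use each visit's codes to predict the outcome of the next visit."""
        samples: List[Dict[str, Any]] = []
        visits = patient["visits"]
        for current, following in zip(visits, visits[1:]):
            if not current["conditions"]:
                continue  # skip visits with no usable features
            samples.append({
                "patient_id": patient["patient_id"],
                "conditions": current["conditions"],
                "mortality": int(following["died"]),
            })
        return samples


def check_schema(task: HypotheticalMortalityTask, samples: List[Dict[str, Any]]) -> bool:
    """Verify that every sample carries all declared input and output keys."""
    required = set(task.input_schema) | set(task.output_schema)
    return all(required <= set(sample) for sample in samples)


# Made-up patient record in an assumed dictionary layout.
patient = {
    "patient_id": "p1",
    "visits": [
        {"conditions": ["428.0"], "died": False},
        {"conditions": ["401.9"], "died": False},
        {"conditions": [], "died": True},
    ],
}
task = HypotheticalMortalityTask()
samples = task(patient)
schema_ok = check_schema(task, samples)
```

Declaring the schemas next to the sample-building logic lets a framework choose how to process each field without the task author writing any tensor code.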
Reproducible Research Infrastructure#
PyHealth 2.0 addresses the reproducibility crisis in healthcare AI:
Standardized implementations:
Eliminate “works on my machine” problems with tested, version-controlled components
All models, tasks, and datasets follow peer-reviewed implementations
Extensive documentation with 50+ tutorials and examples
Multi-language support:
RHealth: Brings PyHealth capabilities to R users and the bioinformatics community
Expands accessibility beyond Python-centric machine learning
Standards and integration:
Compatible with healthcare data standards: OMOP, FHIR
Integrates seamlessly with PyTorch, PyTorch Lightning, HuggingFace
Works with your existing Python data science stack
Growing Community#
Join our active healthcare AI community:
400+ members in the PyHealth Research Initiative, which pairs researchers with mentors
Active Development: Regular updates with new models, datasets, and features
Research Collaboration: Direct connection to cutting-edge academic research
Industry Partnerships: Collaborations with academic health systems
Open Source: Transparent, auditable, and customizable
Support: Active Discord community and GitHub discussions (join our Discord)
Get Started Today#
Ready to begin? Explore these key resources:
Getting Started with PyHealth — Quickstart guide
Installation — Install PyHealth
Tutorials — Interactive tutorials
Models — Model API docs
Datasets — Datasets
Tasks — Tasks
Jump in with the guides above, or use the navigation on the left for more.