pyhealth.metrics.multilabel.multilabel_metrics_fn(y_true, y_prob, metrics=None, threshold=0.5, y_predset=None)

Computes metrics for multilabel classification.

Users can specify which metrics to compute by passing a list of metric names. The accepted metric names are:

  • roc_auc_micro: area under the receiver operating characteristic curve, micro averaged

  • roc_auc_macro: area under the receiver operating characteristic curve, macro averaged

  • roc_auc_weighted: area under the receiver operating characteristic curve, weighted averaged

  • roc_auc_samples: area under the receiver operating characteristic curve, samples averaged

  • pr_auc_micro: area under the precision recall curve, micro averaged

  • pr_auc_macro: area under the precision recall curve, macro averaged

  • pr_auc_weighted: area under the precision recall curve, weighted averaged

  • pr_auc_samples: area under the precision recall curve, samples averaged

  • accuracy: accuracy score

  • f1_micro: f1 score, micro averaged

  • f1_macro: f1 score, macro averaged

  • f1_weighted: f1 score, weighted averaged

  • f1_samples: f1 score, samples averaged

  • precision_micro: precision score, micro averaged

  • precision_macro: precision score, macro averaged

  • precision_weighted: precision score, weighted averaged

  • precision_samples: precision score, samples averaged

  • recall_micro: recall score, micro averaged

  • recall_macro: recall score, macro averaged

  • recall_weighted: recall score, weighted averaged

  • recall_samples: recall score, samples averaged

  • jaccard_micro: Jaccard similarity coefficient score, micro averaged

  • jaccard_macro: Jaccard similarity coefficient score, macro averaged

  • jaccard_weighted: Jaccard similarity coefficient score, weighted averaged

  • jaccard_samples: Jaccard similarity coefficient score, samples averaged

  • ddi: drug-drug interaction score (specifically for drug-related tasks, such as drug recommendation)

  • hamming_loss: Hamming loss

  • cwECE: classwise ECE (with 20 equal-width bins). Check pyhealth.metrics.calibration.ece_classwise().

  • cwECE_adapt: classwise adaptive ECE (with 20 equal-size bins). Check pyhealth.metrics.calibration.ece_classwise().

The following metrics, related to prediction sets, are accepted as well, but will be ignored if y_predset is None:

  • fp: Number of false positives.

  • tp: Number of true positives.
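For intuition, fp and tp can be sketched as counts over the (sample, label) entries of a binary prediction-set array. This is an illustrative sketch only: the y_predset values below are hypothetical, and the library's exact aggregation may differ.

```python
import numpy as np

y_true = np.array([[0, 1, 1], [1, 0, 1]])
# hypothetical binary prediction sets: 1 means the label is in the set
y_predset = np.array([[0, 1, 1], [1, 1, 1]])

# labels included in the set that are truly positive / truly negative
tp = int(((y_predset == 1) & (y_true == 1)).sum())
fp = int(((y_predset == 1) & (y_true == 0)).sum())
print(tp, fp)
```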

If no metrics are specified, pr_auc_samples is computed by default.

This function calls sklearn.metrics functions to compute the metrics. For more information on the metrics, please refer to the documentation of the corresponding sklearn.metrics functions.
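As a hedged illustration of that mapping (not the library's internal code), the micro-averaged variants correspond to sklearn.metrics calls with average="micro". AUC-style metrics are ranking metrics and consume the raw probabilities, while F1/precision/recall/Jaccard need hard predictions, which are obtained by binarizing at the threshold (here assumed to map probabilities at or above the threshold to 1):

```python
import numpy as np
from sklearn.metrics import f1_score, roc_auc_score

y_true = np.array([[0, 1, 1], [1, 0, 1]])
y_prob = np.array([[0.1, 0.9, 0.8], [0.05, 0.95, 0.6]])

# AUC metrics are computed directly on the probabilities
roc_auc_micro = roc_auc_score(y_true, y_prob, average="micro")

# threshold-based metrics are computed on binarized predictions
y_pred = (y_prob >= 0.5).astype(int)
f1_micro = f1_score(y_true, y_pred, average="micro")

print(roc_auc_micro, f1_micro)
```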

Parameters:

  • y_true (ndarray) – True target values of shape (n_samples, n_labels).

  • y_prob (ndarray) – Predicted probabilities of shape (n_samples, n_labels).

  • metrics (Optional[List[str]]) – List of metrics to compute. Default is [“pr_auc_samples”].

  • threshold (float) – Threshold to binarize the predicted probabilities. Default is 0.5.

  • y_predset (Optional[ndarray]) – Prediction set indicators; used only by the prediction-set metrics (fp, tp). Default is None.

Return type:

Dict[str, float]


Returns:

Dictionary of metrics whose keys are the metric names and values are the metric values.


Examples:

>>> import numpy as np
>>> from pyhealth.metrics import multilabel_metrics_fn
>>> y_true = np.array([[0, 1, 1], [1, 0, 1]])
>>> y_prob = np.array([[0.1, 0.9, 0.8], [0.05, 0.95, 0.6]])
>>> multilabel_metrics_fn(y_true, y_prob, metrics=["accuracy"])
{'accuracy': 0.5}
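For reference, the 0.5 above is exact-match (subset) accuracy: in the multilabel setting, a sample counts as correct only if every label matches after binarization. A minimal sketch of that computation, assuming probabilities at or above the threshold are mapped to 1:

```python
import numpy as np

y_true = np.array([[0, 1, 1], [1, 0, 1]])
y_prob = np.array([[0.1, 0.9, 0.8], [0.05, 0.95, 0.6]])

y_pred = (y_prob >= 0.5).astype(int)  # [[0, 1, 1], [0, 1, 1]]
# a sample is correct only when all of its labels match
accuracy = float((y_pred == y_true).all(axis=1).mean())
print(accuracy)
```

Only the first sample matches exactly, so the accuracy is 0.5, in agreement with the example output.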