pyhealth.metrics.multiclass#
- pyhealth.metrics.multiclass.multiclass_metrics_fn(y_true, y_prob, metrics=None, y_predset=None)[source]#
Computes metrics for multiclass classification.
Users can specify which metrics to compute by passing a list of metric names. The accepted metric names are:
- roc_auc_macro_ovo: area under the receiver operating characteristic curve, macro averaged over one-vs-one multiclass classification
- roc_auc_macro_ovr: area under the receiver operating characteristic curve, macro averaged over one-vs-rest multiclass classification
- roc_auc_weighted_ovo: area under the receiver operating characteristic curve, weighted averaged over one-vs-one multiclass classification
- roc_auc_weighted_ovr: area under the receiver operating characteristic curve, weighted averaged over one-vs-rest multiclass classification
- accuracy: accuracy score
- balanced_accuracy: balanced accuracy score (usually used for imbalanced datasets)
- f1_micro: f1 score, micro averaged
- f1_macro: f1 score, macro averaged
- f1_weighted: f1 score, weighted averaged
- jaccard_micro: Jaccard similarity coefficient score, micro averaged
- jaccard_macro: Jaccard similarity coefficient score, macro averaged
- jaccard_weighted: Jaccard similarity coefficient score, weighted averaged
- cohen_kappa: Cohen's kappa score
- brier_top1: Brier score between the top prediction and the true label
- ECE: Expected Calibration Error (with 20 equal-width bins). Check pyhealth.metrics.calibration.ece_confidence_multiclass().
- ECE_adapt: adaptive ECE (with 20 equal-size bins). Check pyhealth.metrics.calibration.ece_confidence_multiclass().
- cwECEt: classwise ECE with threshold=min(0.01, 1/K). Check pyhealth.metrics.calibration.ece_classwise().
- cwECEt_adapt: classwise adaptive ECE with threshold=min(0.01, 1/K). Check pyhealth.metrics.calibration.ece_classwise().
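To make the ECE metric concrete, here is a minimal NumPy sketch of equal-width-bin expected calibration error on top-1 confidence. This is illustrative only; pyhealth's `ece_confidence_multiclass()` may differ in details such as bin-edge handling and weighting.

```python
import numpy as np

def ece_confidence(y_true, y_prob, n_bins=20):
    """Equal-width-bin ECE on top-1 confidence (illustrative sketch,
    not pyhealth's implementation)."""
    conf = y_prob.max(axis=1)                                # top-1 confidence
    correct = (y_prob.argmax(axis=1) == y_true).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    n = len(y_true)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        # include the left edge only for the first bin
        mask = (conf > lo) & (conf <= hi) if lo > 0 else (conf >= lo) & (conf <= hi)
        if mask.any():
            # weighted gap between mean accuracy and mean confidence in the bin
            ece += mask.sum() / n * abs(correct[mask].mean() - conf[mask].mean())
    return ece
```

A perfectly calibrated, always-correct classifier yields an ECE of 0; a classifier that is fully confident but right only half the time yields 0.5.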
- The following metrics related to prediction sets are accepted as well, but will be ignored if y_predset is None:
  - rejection_rate: frequency of rejection, where a sample is rejected when its prediction set has cardinality other than 1. Check pyhealth.metrics.prediction_set.rejection_rate().
  - set_size: average size of the prediction sets. Check pyhealth.metrics.prediction_set.size().
  - miscoverage_ps: Prob(k not in prediction set). Check pyhealth.metrics.prediction_set.miscoverage_ps().
  - miscoverage_mean_ps: the average (across different classes k) of miscoverage_ps.
  - miscoverage_overall_ps: Prob(Y not in prediction set). Check pyhealth.metrics.prediction_set.miscoverage_overall_ps().
  - error_ps: same as miscoverage_ps, but restricted to un-rejected samples. Check pyhealth.metrics.prediction_set.error_ps().
  - error_mean_ps: the average (across different classes k) of error_ps.
  - error_overall_ps: same as miscoverage_overall_ps, but restricted to un-rejected samples. Check pyhealth.metrics.prediction_set.error_overall_ps().
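Two of the prediction-set metrics above can be sketched in a few lines of NumPy, assuming y_predset is a boolean indicator matrix of shape (n_samples, n_classes). This representation and the helper names are illustrative; pyhealth's `prediction_set` module may implement them differently.

```python
import numpy as np

def rejection_rate(y_predset):
    """Fraction of samples whose prediction set has size != 1
    (illustrative sketch, not pyhealth's implementation)."""
    return float((y_predset.sum(axis=1) != 1).mean())

def miscoverage_overall(y_true, y_predset):
    """Prob(Y not in prediction set): fraction of samples whose true
    label is missing from the set (illustrative sketch)."""
    covered = y_predset[np.arange(len(y_true)), y_true]
    return float(1.0 - covered.mean())
```

For example, with sets of size 1, 2, and 0, two of the three samples are rejected; if one of the three true labels falls outside its set, overall miscoverage is 1/3.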
If no metrics are specified, accuracy, f1_macro, and f1_micro are computed by default.
This function calls sklearn.metrics functions to compute the metrics. For more information on the metrics, please refer to the documentation of the corresponding sklearn.metrics functions.
- Parameters:
  - y_true – true labels of shape (n_samples,).
  - y_prob – predicted probabilities of shape (n_samples, n_classes).
  - metrics – list of metric names to compute; if None, defaults to accuracy, f1_macro, and f1_micro.
  - y_predset – prediction sets; the prediction-set metrics above are ignored if this is None.
- Return type:
  Dict[str, float]
- Returns:
  Dictionary of metrics whose keys are the metric names and values are the metric values.
Examples
>>> from pyhealth.metrics import multiclass_metrics_fn
>>> import numpy as np
>>> y_true = np.array([0, 1, 2, 2])
>>> y_prob = np.array([[0.9, 0.05, 0.05],
...                    [0.05, 0.9, 0.05],
...                    [0.05, 0.05, 0.9],
...                    [0.6, 0.2, 0.2]])
>>> multiclass_metrics_fn(y_true, y_prob, metrics=["accuracy"])
{'accuracy': 0.75}
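Metrics such as accuracy require hard labels, which are obtained from y_prob by taking the argmax per row. The example's accuracy of 0.75 can be reproduced with plain NumPy (a sketch of the thresholding step, not pyhealth code):

```python
import numpy as np

y_true = np.array([0, 1, 2, 2])
y_prob = np.array([[0.9, 0.05, 0.05],
                   [0.05, 0.9, 0.05],
                   [0.05, 0.05, 0.9],
                   [0.6, 0.2, 0.2]])

# hard predictions from probabilities: the last sample is misclassified as 0
y_pred = y_prob.argmax(axis=1)
accuracy = float((y_pred == y_true).mean())
print(accuracy)  # 0.75
```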