pyhealth.metrics.fairness

pyhealth.metrics.fairness.fairness_metrics_fn(y_true, y_prob, sensitive_attributes, favorable_outcome=1, metrics=None, threshold=0.5)[source]

Computes fairness metrics for binary classification.

User can specify which metrics to compute by passing a list of metric names. The accepted metric names are:

  • disparate_impact: the ratio of favorable-outcome rates between the protected and unprotected groups.

  • statistical_parity_difference: the difference in favorable-outcome rates between the protected and unprotected groups.

If no metrics are specified, disparate_impact and statistical_parity_difference are computed by default.

Parameters:
  • y_true (ndarray) – True target values of shape (n_samples,).

  • y_prob (ndarray) – Predicted probabilities of shape (n_samples,).

  • sensitive_attributes (ndarray) – Sensitive attributes of shape (n_samples,) where 1 is the protected group and 0 is the unprotected group.

  • favorable_outcome (int) – Label value which is considered favorable (i.e. “positive”).

  • metrics (Optional[List[str]]) – List of metrics to compute. Default is [“disparate_impact”, “statistical_parity_difference”].

  • threshold (float) – Threshold for binary classification. Default is 0.5.

Return type:

Dict[str, float]

Returns:

Dictionary of metrics whose keys are the metric names and values are the metric values.
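
A minimal usage sketch with toy arrays; the returned dictionary keys are assumed to match the metric names listed above:

```python
import numpy as np

from pyhealth.metrics.fairness import fairness_metrics_fn

# Toy data: sensitive_attributes uses 1 for the protected group, 0 otherwise.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_prob = np.array([0.9, 0.2, 0.7, 0.4, 0.6, 0.1, 0.8, 0.3])
sensitive_attributes = np.array([1, 1, 1, 1, 0, 0, 0, 0])

metrics = fairness_metrics_fn(
    y_true,
    y_prob,
    sensitive_attributes,
    metrics=["disparate_impact", "statistical_parity_difference"],
    threshold=0.5,  # probabilities >= 0.5 are treated as positive predictions
)
print(metrics)  # {'disparate_impact': ..., 'statistical_parity_difference': ...}
```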

pyhealth.metrics.fairness_utils.disparate_impact(sensitive_attributes, y_pred, favorable_outcome=1, allow_zero_division=False, epsilon=1e-08)[source]

Computes the disparate impact between the protected and unprotected group.

disparate_impact = P(y_pred = favorable_outcome | P) / P(y_pred = favorable_outcome | U)

Parameters:
  • sensitive_attributes (ndarray) – Sensitive attributes of shape (n_samples,) where 1 is the protected group and 0 is the unprotected group.

  • y_pred (ndarray) – Predicted target values of shape (n_samples,).

  • favorable_outcome (int) – Label value which is considered favorable (i.e. “positive”).

  • allow_zero_division (bool) – If True, substitute epsilon for a zero denominator. Otherwise, raise a ValueError when the denominator is 0.

  • epsilon (float) – Small value used in place of a zero denominator when allow_zero_division is True. Default is 1e-08.

Return type:

float

Returns:

The disparate impact between the protected and unprotected group.
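
The ratio above maps directly onto NumPy; a minimal sketch of the computation, for illustration only and not the library's actual implementation:

```python
import numpy as np

def disparate_impact_sketch(sensitive_attributes, y_pred,
                            favorable_outcome=1,
                            allow_zero_division=False,
                            epsilon=1e-8):
    # P(y_pred = favorable_outcome | protected group)
    p_protected = np.mean(y_pred[sensitive_attributes == 1] == favorable_outcome)
    # P(y_pred = favorable_outcome | unprotected group)
    p_unprotected = np.mean(y_pred[sensitive_attributes == 0] == favorable_outcome)
    if p_unprotected == 0:
        if not allow_zero_division:
            raise ValueError("P(y_pred = favorable_outcome | unprotected) is 0.")
        p_unprotected = epsilon  # substitute epsilon for the zero denominator
    return p_protected / p_unprotected
```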

pyhealth.metrics.fairness_utils.statistical_parity_difference(sensitive_attributes, y_pred, favorable_outcome=1)[source]

Computes the statistical parity difference between the protected and unprotected group.

statistical_parity_difference = P(y_pred = favorable_outcome | P) - P(y_pred = favorable_outcome | U)

Parameters:
  • sensitive_attributes (ndarray) – Sensitive attributes of shape (n_samples,) where 1 is the protected group and 0 is the unprotected group.

  • y_pred (ndarray) – Predicted target values of shape (n_samples,).

  • favorable_outcome (int) – Label value which is considered favorable (i.e. “positive”).

Return type:

float

Returns:

The statistical parity difference between the protected and unprotected group.
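
As with disparate impact, the difference translates into a few lines of NumPy; a minimal sketch, not the library's actual implementation:

```python
import numpy as np

def statistical_parity_difference_sketch(sensitive_attributes, y_pred,
                                         favorable_outcome=1):
    # P(y_pred = favorable_outcome | protected group)
    p_protected = np.mean(y_pred[sensitive_attributes == 1] == favorable_outcome)
    # P(y_pred = favorable_outcome | unprotected group)
    p_unprotected = np.mean(y_pred[sensitive_attributes == 0] == favorable_outcome)
    return p_protected - p_unprotected
```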

pyhealth.metrics.fairness_utils.sensitive_attributes_from_patient_ids(dataset, patient_ids, sensitive_attribute, protected_group)[source]

Returns the desired sensitive attribute array from patient_ids.

Parameters:
  • dataset (BaseEHRDataset) – Dataset object.

  • patient_ids (List[str]) – List of patient IDs.

  • sensitive_attribute (str) – Sensitive attribute to extract.

  • protected_group (str) – Value of the protected group.

Return type:

ndarray

Returns:

Sensitive attribute array of shape (n_samples,), where 1 marks the protected group and 0 the unprotected group.
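
An end-to-end sketch tying the helpers together; dataset, patient_ids, y_true, and y_prob are assumed to already exist, and the attribute name "gender" and group value "F" are illustrative placeholders, not values prescribed by the API:

```python
from pyhealth.metrics.fairness import fairness_metrics_fn
from pyhealth.metrics.fairness_utils import sensitive_attributes_from_patient_ids

# Assumes `dataset` is a BaseEHRDataset and `patient_ids` is ordered to match
# `y_true` / `y_prob`. "gender" / "F" are placeholder attribute and group values.
sensitive_attributes = sensitive_attributes_from_patient_ids(
    dataset, patient_ids, sensitive_attribute="gender", protected_group="F"
)
metrics = fairness_metrics_fn(y_true, y_prob, sensitive_attributes)
```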