pyhealth.models.MoleRec

The separate callable MoleRecLayer and the complete MoleRec model.

class pyhealth.models.MoleRecLayer(hidden_size, coef=2.5, target_ddi=0.08, GNN_layers=4, dropout=0.5, multiloss_weight=0.05, **kwargs)

Bases: Module

MoleRec layer.

Paper: Nianzu Yang et al. MoleRec: Combinatorial Drug Recommendation with Substructure-Aware Molecular Representation Learning. WWW 2023.

This layer is used in the MoleRec model, but it can also be used as a standalone layer.

Parameters:
  • hidden_size (int) – hidden feature size.

  • coef (float) – coefficient of the DDI loss weight annealing. A larger coefficient means a higher penalty on drug-drug interactions. Default is 2.5.

  • target_ddi (float) – DDI acceptance rate. Default is 0.08.

  • GNN_layers (int) – the number of layers of GNNs encoding molecule and substructures. Default is 4.

  • dropout (float) – the dropout ratio of the model. Default is 0.5.

  • multiloss_weight (float) – the weight of multilabel_margin_loss for multilabel classification. The value should lie in [0, 1]. Default is 0.05.
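The way coef, target_ddi, and multiloss_weight interact is easier to see numerically. The parameter descriptions above do not give the formula, so the following is only a sketch under an assumed form: an exponential annealing weight that drops below 1 once the current DDI rate exceeds target_ddi. The function name combine_losses and all of its inputs are hypothetical and are not part of the PyHealth API.

```python
import math


def combine_losses(bce_loss, margin_loss, ddi_loss, current_ddi_rate,
                   coef=2.5, target_ddi=0.08, multiloss_weight=0.05):
    """Hypothetical sketch of how the three hyperparameters could interact."""
    # Blend the two classification losses; multiloss_weight should lie in [0, 1].
    cls_loss = multiloss_weight * margin_loss + (1 - multiloss_weight) * bce_loss
    if current_ddi_rate <= target_ddi:
        # The DDI rate is acceptable, so only the classification loss is used.
        return cls_loss
    # Assumed annealing: beta falls below 1 as the DDI rate overshoots the
    # target, and a larger coef makes it fall faster (a higher DDI penalty).
    beta = min(math.exp(coef * (1 - current_ddi_rate / target_ddi)), 1.0)
    return beta * cls_loss + (1 - beta) * ddi_loss


# Below the target rate the DDI term is ignored entirely.
print(combine_losses(1.0, 1.0, 0.2, current_ddi_rate=0.05))
# Above the target rate the DDI term takes over part of the weight.
print(combine_losses(1.0, 1.0, 0.2, current_ddi_rate=0.16))
```

Under this assumed form, doubling coef while the DDI rate is above target shifts more of the total weight from the classification loss to the DDI loss.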

calc_loss(logits, y_prob, ddi_adj, labels, label_index=None)
Return type:

Tensor

forward(patient_emb, drugs, average_projection, ddi_adj, substructure_mask, substructure_graph, molecule_graph, mask=None, drug_indexes=None)

Forward propagation.

Parameters:
  • patient_emb (Tensor) – a tensor of shape [patient, visit, num_substructures], representing the relation between each patient visit and each substructure.

  • drugs (Tensor) – a multihot tensor of shape [patient, num_labels].

  • mask (Optional[Tensor]) – an optional tensor of shape [patient, visit] where 1 indicates valid visits and 0 indicates invalid visits.

  • substructure_mask (Tensor) – a tensor of shape [num_drugs, num_substructures], indicating whether each substructure appears in any molecule of each drug.

  • average_projection (Tensor) – a tensor of shape [num_drugs, num_molecules] representing the average projection for aggregating multiple molecules of the same drug into one vector.

  • substructure_graph (Union[StaticParaDict, Dict[str, Union[int, Tensor]]]) – a dictionary representing a graph batch of all substructures, where each graph is extracted via the ‘smiles2graph’ API of the OGB library.

  • molecule_graph (Union[StaticParaDict, Dict[str, Union[int, Tensor]]]) – a dictionary in the same form as substructure_graph, representing the graph batch of all molecules.

  • ddi_adj (Tensor) – an adjacency tensor for drug-drug interactions of shape [num_drugs, num_drugs].

  • drug_indexes (Optional[Tensor]) – the index version of drugs (ground truth) of shape [patient, num_labels], padded with -1.
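The relation between the multihot drugs tensor and its index version drug_indexes can be illustrated with plain lists. This is a minimal sketch; the helper name multihot_to_padded_indexes is hypothetical, and real inputs would be tensors rather than lists. Padding with -1 is the same target layout that torch.nn.functional.multilabel_margin_loss expects.

```python
def multihot_to_padded_indexes(multihot_rows, pad_value=-1):
    """Convert multi-hot label rows into index rows padded to the same width."""
    padded = []
    for row in multihot_rows:
        # Positions of the active labels, in ascending order.
        indexes = [i for i, flag in enumerate(row) if flag == 1]
        # Pad on the right so the result keeps the shape [patient, num_labels].
        padded.append(indexes + [pad_value] * (len(row) - len(indexes)))
    return padded


drugs = [[0, 1, 0, 1], [1, 0, 0, 0]]  # toy multihot labels, [patient, num_labels]
drug_indexes = multihot_to_padded_indexes(drugs)
print(drug_indexes)  # [[1, 3, -1, -1], [0, -1, -1, -1]]
```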

Returns:
  • loss – a scalar tensor representing the loss.

  • y_prob – a tensor of shape [patient, num_labels] representing the probability of each drug.

training: bool

class pyhealth.models.MoleRec(dataset, embedding_dim=64, hidden_dim=64, num_rnn_layers=1, num_gnn_layers=4, dropout=0.5, **kwargs)

Bases: BaseModel

MoleRec model.

Paper: Nianzu Yang et al. MoleRec: Combinatorial Drug Recommendation with Substructure-Aware Molecular Representation Learning. WWW 2023.

Note

This model only supports medication prediction, with conditions and procedures as feature_keys and drugs as label_key. It operates only at the visit level.

Note

This model only accepts ATC level 3 as medication codes.

Parameters:
  • dataset (SampleEHRDataset) – the dataset to train the model. It is used to query certain information such as the set of all tokens.

  • embedding_dim (int) – the embedding dimension. Default is 64.

  • hidden_dim (int) – the hidden dimension. Default is 64.

  • num_rnn_layers (int) – the number of layers used in RNN. Default is 1.

  • num_gnn_layers (int) – the number of layers used in GNN. Default is 4.

  • dropout (float) – the dropout rate. Default is 0.5.

  • **kwargs – other parameters for the MoleRec layer.

generate_ddi_adj()

Generates the DDI graph adjacency matrix.

Return type:

FloatTensor
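For orientation, a DDI adjacency matrix of shape [num_drugs, num_drugs] can be pictured as a symmetric 0/1 matrix over the drug vocabulary. The sketch below is not the method's implementation: the helper build_ddi_adjacency is hypothetical, it uses nested lists instead of a FloatTensor, and the interacting pairs are toy placeholders rather than real interaction data.

```python
def build_ddi_adjacency(drug_vocab, interacting_pairs):
    """Build a symmetric 0/1 adjacency matrix over the drug vocabulary."""
    index = {code: i for i, code in enumerate(drug_vocab)}
    n = len(drug_vocab)
    adj = [[0.0] * n for _ in range(n)]
    for a, b in interacting_pairs:
        # Skip pairs that mention a drug outside the vocabulary.
        if a in index and b in index:
            adj[index[a]][index[b]] = 1.0
            adj[index[b]][index[a]] = 1.0  # interactions are undirected
    return adj


vocab = ["A01A", "B01A", "C01A"]              # toy ATC level 3 codes
pairs = [("A01A", "C01A"), ("B01A", "Z99Z")]  # placeholder pairs, not real DDIs
ddi_adj = build_ddi_adjacency(vocab, pairs)
print(ddi_adj)  # [[0.0, 0.0, 1.0], [0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
```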

generate_substructure_mask()
Return type:

Tuple[Tensor, List[str]]

generate_smiles_list()

Generates the list of SMILES strings.

Return type:

List[List[str]]

generate_average_projection()
Return type:

Tuple[Tensor, List[str]]
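The idea of an average projection of shape [num_drugs, num_molecules] can be sketched with plain lists: each row gives equal weight to the molecules that belong to one drug, so multiplying it by the molecule embeddings yields one averaged vector per drug. This is only an illustration. The helper names are hypothetical, and the assumption that the returned List[str] is the flattened list of SMILES strings is not stated on this page.

```python
def build_average_projection(smiles_per_drug):
    """Return (projection, flat_smiles) for a List[List[str]] of SMILES."""
    # Flatten the per-drug groups into one molecule list (the column order).
    flat_smiles = [smiles for group in smiles_per_drug for smiles in group]
    projection = []
    column = 0
    for group in smiles_per_drug:
        row = [0.0] * len(flat_smiles)
        for _ in group:
            # Each molecule of a drug gets equal weight 1 / (molecules in drug).
            row[column] = 1.0 / len(group)
            column += 1
        projection.append(row)
    return projection, flat_smiles


def matmul(a, b):
    """Plain-Python matrix product, standing in for a tensor matmul."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


smiles_per_drug = [["CCO", "CC(=O)O"], ["c1ccccc1"]]  # toy input
projection, flat_smiles = build_average_projection(smiles_per_drug)
molecule_emb = [[1.0, 3.0], [3.0, 5.0], [7.0, 9.0]]   # [num_molecules, hidden]
drug_emb = matmul(projection, molecule_emb)            # [num_drugs, hidden]
print(drug_emb)  # [[2.0, 4.0], [7.0, 9.0]]
```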

encode_patient(feature_key, raw_values)
Return type:

Tensor

forward(conditions, procedures, drugs, **kwargs)

Forward propagation.

Parameters:
  • conditions (List[List[List[str]]]) – a nested list in three levels with shape [patient, visit, condition].

  • procedures (List[List[List[str]]]) – a nested list in three levels with shape [patient, visit, procedure].

  • drugs (List[List[str]]) – a nested list in two levels with shape [patient, drug].
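The nesting of the three inputs can be checked with a small batch of two patients. The codes below are placeholders chosen for illustration (the drug codes are only shaped like ATC level 3 codes), and nesting_depth is a hypothetical helper, not part of PyHealth.

```python
conditions = [
    [["cond-1", "cond-2"], ["cond-3"]],  # patient 0: two visits
    [["cond-4"]],                        # patient 1: one visit
]
procedures = [
    [["proc-1"], ["proc-2", "proc-3"]],  # patient 0: two visits
    [["proc-4"]],                        # patient 1: one visit
]
drugs = [["A01A", "B01A"], ["C01A"]]     # one drug list per patient


def nesting_depth(value):
    """Count list levels by following the first element at each level."""
    depth = 0
    while isinstance(value, list):
        depth += 1
        if not value:
            break
        value = value[0]
    return depth


print(nesting_depth(conditions), nesting_depth(procedures), nesting_depth(drugs))
# 3 3 2
```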

Returns:

A dictionary with the following keys:
  • loss – a scalar tensor representing the loss.

  • y_prob – a tensor of shape [patient, visit, num_labels] representing the probability of each drug.

  • y_true – a tensor of shape [patient, visit, num_labels] representing the ground truth of each drug.

training: bool