pyhealth.processors.GraphProcessor
Overview
Processor that converts medical codes into patient-level PyG subgraphs
using a provided KnowledgeGraph. Registered as "graph" in the
processor registry.
API Reference
- class pyhealth.processors.graph_processor.Any(*args, **kwargs)
Bases: object
Special type indicating an unconstrained type.
Any is compatible with every type.
Any assumed to have all methods.
All values assumed to be instances of Any.
Note that all the above statements are true from the point of view of static type checkers. At runtime, Any should not be used with instance checks.
- class pyhealth.processors.graph_processor.FeatureProcessor
Bases: Processor
Processor for individual fields (features).
Example: Tokenization, image loading, normalization.
- is_token()
Returns whether the processor's output (in particular, the value tensor) represents discrete token indices (True) or continuous values (False). This determines whether to apply token-based transformations (e.g. nn.Embedding) or value-based transformations (e.g. nn.Linear).
- Return type: bool
- Returns:
True if the output of the processor represents discrete token indices, False otherwise.
- schema()
Returns the schema of the processed feature. For a processor that emits a single tensor, this should just return [“value”]. For a processor that emits a tuple of tensors, this should return a tuple of the same length as the tuple, with the semantic name of each tensor, such as [“time”, “value”], [“value”, “mask”], etc.
- Typical semantic names include:
“value”: the main processed tensor output of the processor
“time”: the time tensor output of the processor (mostly for StageNet)
“mask”: the mask tensor output of the processor (if applicable)
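As a rough illustration of how the semantic names from schema() might be used downstream, the sketch below pairs each emitted tensor with its name. The helper name_outputs is hypothetical and not part of PyHealth; plain lists stand in for tensors.

```python
def name_outputs(schema, output):
    """Pair each object emitted by a processor with its semantic name.

    Hypothetical helper for illustration only; not a PyHealth function.
    """
    # A processor that emits a single object is treated as a 1-tuple.
    if not isinstance(output, tuple):
        output = (output,)
    if len(schema) != len(output):
        raise ValueError(
            f"schema has {len(schema)} names but output has {len(output)} items"
        )
    return dict(zip(schema, output))


# Plain lists stand in for the time and value tensors.
named = name_outputs(("time", "value"), ([0.0, 1.5], [3, 7]))
single = name_outputs(("value",), [1, 2])
```

Keeping the names in the same order as the output tuple is what makes this kind of positional pairing safe.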
- dim()
Number of dimensions (Tensor.dim()) for each output tensor, in the same order as the output tuple.
- spatial()
Whether each dimension (axis) of the value tensor is spatial (i.e. corresponds to a spatial axis like time, height, width, etc.) or not. This is used to determine how to apply augmentations and other transformations that should only be applied to spatial dimensions.
E.g. for CNN or RNN features, this would help determine which dimensions to apply spatial augmentations to, and which dimensions to treat as channels or features.
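One way a consumer could act on the flags returned by spatial() is to split axis indices into spatial and non-spatial groups. This is a minimal sketch; the helper split_axes and the example flag tuples are assumptions for illustration, not PyHealth code.

```python
def split_axes(spatial_flags):
    """Split axis indices into (spatial, non_spatial) lists.

    Hypothetical helper: spatial_flags is assumed to hold one boolean
    per axis of the value tensor, as described for spatial().
    """
    spatial = [axis for axis, flag in enumerate(spatial_flags) if flag]
    non_spatial = [axis for axis, flag in enumerate(spatial_flags) if not flag]
    return spatial, non_spatial


# Assumed layout (channel, height, width): only the last two axes are spatial.
image_axes = split_axes((False, True, True))

# A single non-spatial entry yields no spatial axes at all.
flat_axes = split_axes((False,))
```

An augmentation that should only touch spatial axes could then iterate over the first list and leave the channel or feature axes alone.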
- load(path)
Optional: Load processor state from disk.
- class pyhealth.processors.graph_processor.GraphProcessor(knowledge_graph, num_hops=2, max_nodes=None)
Bases: FeatureProcessor
Processor that converts medical codes into patient-level subgraphs.
Takes a list of medical codes from a patient visit, looks them up in a provided KnowledgeGraph, and extracts the relevant k-hop subgraph as a PyG Data object.
This processor enables graph-based models (GraphCare, G-BERT, KAME) to consume standard PyHealth EHR data by bridging medical codes to knowledge graph structures.
- Parameters:
knowledge_graph (KnowledgeGraph) – A KnowledgeGraph instance containing the medical knowledge graph.
num_hops (int) – Number of hops for subgraph extraction. Default is 2.
max_nodes (Optional[int]) – Maximum number of nodes in the extracted subgraph. If exceeded, nodes are pruned by distance from seeds (seeds are always kept). Default is None (no limit).
Example
>>> from pyhealth.graph import KnowledgeGraph
>>> from pyhealth.processors import GraphProcessor
>>> kg = KnowledgeGraph(triples=[
...     ("aspirin", "treats", "headache"),
...     ("headache", "symptom_of", "migraine"),
... ])
>>> processor = GraphProcessor(knowledge_graph=kg, num_hops=2)
>>> codes = ["aspirin", "headache"]
>>> graph = processor.process(codes)
>>> print(graph.num_nodes, graph.num_edges)
- Example in task schema:
>>> from pyhealth.graph import KnowledgeGraph
>>> kg = KnowledgeGraph(triples="path/to/triples.csv")
>>> input_schema = {
...     "conditions": ("graph", {
...         "knowledge_graph": kg,
...         "num_hops": 2,
...         "max_nodes": 500,
...     }),
... }
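To make the roles of num_hops and max_nodes concrete, the following pure-Python sketch extracts a k-hop neighbourhood with breadth-first search and prunes the farthest nodes first. It is not the GraphProcessor implementation: the function k_hop_subgraph, the undirected traversal, and the alphabetical tie-breaking are all assumptions made for illustration.

```python
from collections import deque


def k_hop_subgraph(triples, seeds, num_hops=2, max_nodes=None):
    """Return (nodes, edges) within num_hops of the seed nodes.

    Illustrative sketch only. Assumes edges can be traversed in both
    directions; the real KnowledgeGraph may behave differently.
    """
    # Build an undirected adjacency map from (head, relation, tail) triples.
    neighbours = {}
    for head, _relation, tail in triples:
        neighbours.setdefault(head, set()).add(tail)
        neighbours.setdefault(tail, set()).add(head)

    # Breadth-first search records each node's hop distance from the nearest seed.
    distance = {seed: 0 for seed in seeds if seed in neighbours}
    queue = deque(distance)
    while queue:
        node = queue.popleft()
        if distance[node] == num_hops:
            continue  # do not expand beyond the hop limit
        for nxt in sorted(neighbours[node]):
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)

    # Sort by distance so that pruning drops the farthest nodes first.
    nodes = sorted(distance, key=lambda n: (distance[n], n))
    if max_nodes is not None and len(nodes) > max_nodes:
        num_seeds = sum(1 for n in nodes if distance[n] == 0)
        nodes = nodes[:max(max_nodes, num_seeds)]  # seeds are always kept

    kept = set(nodes)
    edges = [(h, r, t) for h, r, t in triples if h in kept and t in kept]
    return nodes, edges


triples = [
    ("aspirin", "treats", "headache"),
    ("headache", "symptom_of", "migraine"),
    ("migraine", "treated_by", "sumatriptan"),  # hypothetical extra triple
]
nodes, edges = k_hop_subgraph(triples, ["aspirin"], num_hops=2)
pruned_nodes, pruned_edges = k_hop_subgraph(triples, ["aspirin"], num_hops=2, max_nodes=2)
```

With a single seed and num_hops=2, the node three hops away is never reached; setting max_nodes=2 additionally drops the two-hop node while keeping the seed.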
- process(value)
Convert a list of medical codes to a PyG subgraph.
- Parameters:
value (Any) – A list of medical code strings from a patient visit (e.g., ICD codes, ATC codes, CPT codes). Can also be a list of lists of codes (multi-visit), which will be flattened.
- Return type: Data
- Returns:
PyG Data object with subgraph around the patient’s codes, containing: x, edge_index, edge_type, node_ids, seed_mask.
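The sketch below illustrates, with plain Python lists in place of tensors, how multi-visit input could be flattened and how fields such as edge_index, edge_type, node_ids and seed_mask relate to one another. The helpers flatten_codes and to_index_form are hypothetical, node names stand in for whatever identifiers the real node_ids holds, and node features (x) are omitted because the source does not describe how they are computed.

```python
def flatten_codes(value):
    """Flatten a list of codes, or a list of per-visit code lists, into one list."""
    flat = []
    for item in value:
        if isinstance(item, (list, tuple)):
            flat.extend(item)  # multi-visit input: one inner list per visit
        else:
            flat.append(item)
    return flat


def to_index_form(nodes, edges, seeds):
    """Convert named nodes and triples into index-based fields.

    Illustrative only: lists stand in for tensors, and the relation
    numbering (alphabetical) is an assumption.
    """
    node_index = {name: i for i, name in enumerate(nodes)}
    relation_index = {r: i for i, r in enumerate(sorted({r for _, r, _ in edges}))}
    seed_set = set(seeds)
    return {
        "node_ids": list(nodes),
        # PyG convention: row 0 holds source indices, row 1 holds targets.
        "edge_index": [
            [node_index[h] for h, _, _ in edges],
            [node_index[t] for _, _, t in edges],
        ],
        "edge_type": [relation_index[r] for _, r, _ in edges],
        # True for nodes that correspond to the patient's own codes.
        "seed_mask": [name in seed_set for name in nodes],
    }


codes = flatten_codes([["aspirin"], ["headache", "aspirin"]])
fields = to_index_form(
    nodes=["aspirin", "headache", "migraine"],
    edges=[("aspirin", "treats", "headache"), ("headache", "symptom_of", "migraine")],
    seeds=codes,
)
```

The seed_mask lets a model distinguish the patient's own codes from neighbours that were pulled in by the k-hop expansion.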
- is_token()
Graph outputs are not discrete token indices.
- Return type: bool
- Returns:
False, since graph Data objects are not token-based.
- schema()
Returns the schema of the processed feature.
- Return type:
- Returns:
Tuple with single element “graph” indicating PyG Data output.
- dim()
Graph Data objects don’t have a fixed dimensionality.
- Return type:
- Returns:
Tuple with 0 indicating variable structure.
- spatial()
Graph structures are inherently non-spatial in the grid sense.
- Return type:
- Returns:
Tuple with False.
- load(path)
Optional: Load processor state from disk.