pyhealth.graph.KnowledgeGraph#

Overview#

Knowledge graph data structure for healthcare code systems. Stores (head, relation, tail) triples and provides k-hop subgraph extraction for patient-level graph construction.

API Reference#

class pyhealth.graph.knowledge_graph.Path(*args, **kwargs)[source]#

Bases: PurePath

PurePath subclass that can make system calls.

Path represents a filesystem path but unlike PurePath, also offers methods to do system calls on path objects. Depending on your system, instantiating a Path will return either a PosixPath or a WindowsPath object. You can also instantiate a PosixPath or WindowsPath directly, but cannot instantiate a WindowsPath on a POSIX system or vice versa.

stat(*, follow_symlinks=True)[source]#

Return the result of the stat() system call on this path, like os.stat() does.

lstat()[source]#

Like stat(), except if the path points to a symlink, the symlink’s status information is returned, rather than its target’s.

exists(*, follow_symlinks=True)[source]#

Whether this path exists.

This method normally follows symlinks; to check whether a symlink exists, add the argument follow_symlinks=False.

is_dir()[source]#

Whether this path is a directory.

is_file()[source]#

Whether this path is a regular file (also True for symlinks pointing to regular files).

is_mount()[source]#

Check if this path is a mount point.

is_symlink()[source]#

Whether this path is a symbolic link.

is_junction()[source]#

Whether this path is a junction.

is_block_device()[source]#

Whether this path is a block device.

is_char_device()[source]#

Whether this path is a character device.

is_fifo()[source]#

Whether this path is a FIFO.

is_socket()[source]#

Whether this path is a socket.

samefile(other_path)[source]#

Return whether other_path refers to the same file as this path (as determined by os.path.samefile()).

open(mode='r', buffering=-1, encoding=None, errors=None, newline=None)[source]#

Open the file pointed to by this path and return a file object, as the built-in open() function does.

read_bytes()[source]#

Open the file in bytes mode, read it, and close the file.

read_text(encoding=None, errors=None)[source]#

Open the file in text mode, read it, and close the file.

write_bytes(data)[source]#

Open the file in bytes mode, write to it, and close the file.

write_text(data, encoding=None, errors=None, newline=None)[source]#

Open the file in text mode, write to it, and close the file.

iterdir()[source]#

Yield path objects of the directory contents.

The children are yielded in arbitrary order, and the special entries ‘.’ and ‘..’ are not included.

glob(pattern, *, case_sensitive=None)[source]#

Iterate over this subtree and yield all existing files (of any kind, including directories) matching the given relative pattern.

rglob(pattern, *, case_sensitive=None)[source]#

Recursively yield all existing files (of any kind, including directories) matching the given relative pattern, anywhere in this subtree.

walk(top_down=True, on_error=None, follow_symlinks=False)[source]#

Walk the directory tree from this directory, similar to os.walk().

classmethod cwd()[source]#

Return a new path pointing to the current working directory.

classmethod home()[source]#

Return a new path pointing to the user’s home directory (as returned by os.path.expanduser(‘~’)).

absolute()[source]#

Return an absolute version of this path by prepending the current working directory. No normalization or symlink resolution is performed.

Use resolve() to get the canonical path to a file.

resolve(strict=False)[source]#

Make the path absolute, resolving all symlinks on the way and also normalizing it.

owner()[source]#

Return the login name of the file owner.

group()[source]#

Return the group name of the file gid.

readlink()[source]#

Return the path to which the symbolic link points.

touch(mode=438, exist_ok=True)[source]#

Create this file with the given access mode, if it doesn’t exist.

mkdir(mode=511, parents=False, exist_ok=False)[source]#

Create a new directory at this given path.

chmod(mode, *, follow_symlinks=True)[source]#

Change the permissions of the path, like os.chmod().

lchmod(mode)[source]#

Like chmod(), except if the path points to a symlink, the symlink’s permissions are changed, rather than its target’s.

unlink(missing_ok=False)[source]#

Remove this file or link. If the path is a directory, use rmdir() instead.

rmdir()[source]#

Remove this directory. The directory must be empty.

rename(target)[source]#

Rename this path to the target path.

The target path may be absolute or relative. Relative paths are interpreted relative to the current working directory, not the directory of the Path object.

Returns the new Path instance pointing to the target path.

replace(target)[source]#

Rename this path to the target path, overwriting if that path exists.

The target path may be absolute or relative. Relative paths are interpreted relative to the current working directory, not the directory of the Path object.

Returns the new Path instance pointing to the target path.

symlink_to(target, target_is_directory=False)[source]#

Make this path a symlink pointing to the target path. Note the order of arguments (link, target) is the reverse of os.symlink’s.

hardlink_to(target)[source]#

Make this path a hard link pointing to the same file as target.

Note the order of arguments (self, target) is the reverse of os.link’s.

expanduser()[source]#

Return a new path with expanded ~ and ~user constructs (as returned by os.path.expanduser).

property anchor#

The concatenation of the drive and root, or ‘’.

as_posix()#

Return the string representation of the path with forward (/) slashes.

as_uri()#

Return the path as a ‘file’ URI.

property drive#

The drive prefix (letter or UNC path), if any.

is_absolute()#

True if the path is absolute (has both a root and, if applicable, a drive).

is_relative_to(other, /, *_deprecated)#

Return True if the path is relative to another path, otherwise False.

is_reserved()#

Return True if the path contains one of the special names reserved by the system, if any.

joinpath(*pathsegments)#

Combine this path with one or several arguments, and return a new path representing either a subpath (if all arguments are relative paths) or a totally different path (if one of the arguments is anchored).

match(path_pattern, *, case_sensitive=None)#

Return True if this path matches the given pattern.

property name#

The final path component, if any.

property parent#

The logical parent of the path.

property parents#

A sequence of this path’s logical parents.

property parts#

An object providing sequence-like access to the components in the filesystem path.

relative_to(other, /, *_deprecated, walk_up=False)#

Return the relative path to another path identified by the passed arguments. If the operation is not possible (because this is not related to the other path), raise ValueError.

The walk_up parameter controls whether .. may be used to resolve the path.

property root#

The root of the path, if any.

property stem#

The final path component, minus its last suffix.

property suffix#

The final component’s last suffix, if any.

This includes the leading period. For example: ‘.txt’

property suffixes#

A list of the final component’s suffixes, if any.

These include the leading periods. For example: [‘.tar’, ‘.gz’]

with_name(name)#

Return a new path with the file name changed.

with_segments(*pathsegments)#

Construct a new path object from any number of path-like objects. Subclasses may override this method to customize how new path objects are created from methods like iterdir().

with_stem(stem)#

Return a new path with the stem changed.

with_suffix(suffix)#

Return a new path with the file suffix changed. If the path has no suffix, add given suffix. If the given suffix is an empty string, remove the suffix from the path.
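The name, suffix, and joining rules described above can be checked without touching the filesystem. The following is a small sketch using PurePosixPath, so the results are the same on every platform; the file names are hypothetical and chosen only for illustration.

```python
from pathlib import PurePosixPath

# Hypothetical archive path, used only to show the name/suffix accessors.
archive = PurePosixPath("/data/kg/triples.tar.gz")

name = archive.name          # final component
stem = archive.stem          # final component minus its last suffix
suffix = archive.suffix      # last suffix, including the leading period
suffixes = archive.suffixes  # all suffixes of the final component

# with_suffix replaces only the last suffix ...
as_csv = archive.with_suffix(".csv")
# ... adds one when the path has none ...
added = PurePosixPath("triples").with_suffix(".tsv")
# ... and removes it when given an empty string.
stripped = PurePosixPath("triples.csv").with_suffix("")

# relative_to raises ValueError if the path is not under the given base.
relative = archive.relative_to("/data")

# joinpath appends relative segments, but an anchored segment replaces the path.
joined = PurePosixPath("/data").joinpath("kg", "triples.csv")
replaced = PurePosixPath("/data").joinpath("/etc/hosts")
```

Path inherits all of these from PurePath, so the same calls work on concrete Path objects.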

class pyhealth.graph.knowledge_graph.KnowledgeGraph(triples, entity2id=None, relation2id=None, node_features=None)[source]#

Bases: object

A knowledge graph for healthcare code systems.

Stores (head, relation, tail) triples and provides subgraph extraction for patient-level graph construction.

The user provides the KG; PyHealth does not generate it.

Supported input formats:
  • List of (head, relation, tail) string tuples

  • Path to a CSV/TSV file with head, relation, tail columns

Parameters:
  • triples (Union[List[Tuple[str, str, str]], str, Path]) – List of (head, relation, tail) string tuples, OR path to a CSV/TSV file with head/relation/tail columns.

  • entity2id (Optional[Dict[str, int]]) – Optional pre-built entity-to-ID mapping. If None, built automatically from triples.

  • relation2id (Optional[Dict[str, int]]) – Optional pre-built relation-to-ID mapping. If None, built automatically from triples.

  • node_features (Optional[Tensor]) – Optional tensor of shape (num_entities, feat_dim). Pre-computed node embeddings (e.g., from TransE or LLM).
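To make the file-based input format concrete, here is a minimal sketch of how a CSV/TSV file with head, relation, and tail columns could be turned into the list-of-tuples form. It uses only the standard library and is not PyHealth's own loader: the helper name read_triples and the rule for choosing the delimiter from the file extension are assumptions made for illustration.

```python
import csv
import tempfile
from pathlib import Path
from typing import List, Tuple


def read_triples(path: Path) -> List[Tuple[str, str, str]]:
    """Hypothetical helper: read (head, relation, tail) rows from a CSV/TSV file."""
    # Assumption: ".tsv" files are tab-delimited, everything else is comma-delimited.
    delimiter = "\t" if path.suffix.lower() == ".tsv" else ","
    with path.open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter=delimiter)
        return [(row["head"], row["relation"], row["tail"]) for row in reader]


# Write a small example file to a temporary directory and read it back.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "triples.csv"
    csv_path.write_text(
        "head,relation,tail\n"
        "aspirin,treats,headache\n"
        "headache,symptom_of,migraine\n",
        encoding="utf-8",
    )
    loaded = read_triples(csv_path)
```

The resulting list has the same shape as the in-memory triples argument, so either form describes the same graph.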

entity2id#

Dict[str, int] mapping entity names to integer IDs.

relation2id#

Dict[str, int] mapping relation names to integer IDs.

id2entity#

Dict[int, str] reverse mapping.

id2relation#

Dict[int, str] reverse mapping.

edge_index#

Tensor of shape (2, num_triples) in PyG COO format.

edge_type#

Tensor of shape (num_triples,) with relation IDs.

num_entities#

Total number of unique entities.

num_relations#

Total number of unique relation types.

num_triples#

Total number of triples (edges).
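The relationship between these attributes can be sketched in plain Python. The snippet below builds the two mappings, the reverse mapping, and COO-style edge lists from a list of triples. It is an illustration only: IDs are assigned in first-seen order, which is an assumption and may not match the ordering PyHealth uses, and plain lists stand in for tensors.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def build_index(triples: List[Triple]):
    """Hypothetical helper: derive ID mappings and COO edge lists from triples."""
    entity2id: Dict[str, int] = {}
    relation2id: Dict[str, int] = {}
    heads: List[int] = []
    tails: List[int] = []
    edge_type: List[int] = []
    for head, relation, tail in triples:
        # Assumption: IDs are handed out in order of first appearance.
        for name in (head, tail):
            entity2id.setdefault(name, len(entity2id))
        relation2id.setdefault(relation, len(relation2id))
        heads.append(entity2id[head])
        tails.append(entity2id[tail])
        edge_type.append(relation2id[relation])
    # Row 0 holds head IDs and row 1 holds tail IDs: shape (2, num_triples).
    edge_index = [heads, tails]
    return entity2id, relation2id, edge_index, edge_type


triples = [
    ("aspirin", "treats", "headache"),
    ("headache", "symptom_of", "migraine"),
    ("ibuprofen", "treats", "headache"),
]
entity2id, relation2id, edge_index, edge_type = build_index(triples)
id2entity = {idx: name for name, idx in entity2id.items()}
```

Column i of edge_index together with edge_type[i] encodes the i-th triple, which is why edge_type has one entry per triple.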

Example

>>> from pyhealth.graph import KnowledgeGraph
>>> triples = [
...     ("aspirin", "treats", "headache"),
...     ("headache", "symptom_of", "migraine"),
...     ("ibuprofen", "treats", "headache"),
... ]
>>> kg = KnowledgeGraph(triples=triples)
>>> kg.num_entities
4
>>> kg.num_relations
2
>>> kg.stat()
KnowledgeGraph: 4 entities, 2 relations, 3 triples
>>>
>>> # From a CSV file
>>> kg = KnowledgeGraph(triples="path/to/triples.csv")
>>>
>>> # Extract 2-hop subgraph around seed entities
>>> subgraph = kg.subgraph(seed_entities=["aspirin", "headache"], num_hops=2)

property num_entities: int#

Total number of unique entities.

Return type:

int

property num_relations: int#

Total number of unique relation types.

Return type:

int

property num_triples: int#

Total number of triples (edges).

Return type:

int

subgraph(seed_entities, num_hops=2)[source]#

Extract a k-hop subgraph around seed entities.

Uses PyG’s k_hop_subgraph to find all nodes within num_hops of the seed entities, then returns the induced subgraph.

Parameters:
  • seed_entities (List[str]) – List of entity names (e.g., medical codes). Entities not found in the KG are silently skipped.

  • num_hops (int) – Number of hops to expand from seed nodes. Default is 2.

Returns:

PyG Data object with the following attributes:

  • x: Node features if available, else zeros of shape (num_nodes, 1).

  • edge_index: Subgraph edges, reindexed to [0, num_nodes).

  • edge_type: Relation type for each edge.

  • node_ids: Original entity IDs for mapping back.

  • seed_mask: Boolean mask, True for seed nodes.

Return type:

torch_geometric.data.Data

Raises:

ImportError – If torch-geometric is not installed.
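The extraction steps described above (skip unknown seeds, expand by hops, keep the induced edges, reindex) can be sketched in plain Python without PyG. This is not the library's implementation: it returns a dict instead of a Data object, uses lists instead of tensors, and treats edges as undirected during expansion, which is an assumption. PyG's k_hop_subgraph follows edge direction according to its flow argument, so the node sets may differ.

```python
from typing import Dict, List


def k_hop_subgraph_sketch(
    seed_entities: List[str],
    num_hops: int,
    entity2id: Dict[str, int],
    edge_index: List[List[int]],
    edge_type: List[int],
) -> dict:
    """Hypothetical pure-Python stand-in for a k-hop induced subgraph."""
    heads, tails = edge_index
    # Seeds that are not in the mapping are skipped without raising.
    seeds = {entity2id[name] for name in seed_entities if name in entity2id}
    reached = set(seeds)
    frontier = set(seeds)
    for _ in range(num_hops):
        next_frontier = set()
        for h, t in zip(heads, tails):
            # Assumption: edge direction is ignored while expanding.
            if h in frontier and t not in reached:
                next_frontier.add(t)
            if t in frontier and h not in reached:
                next_frontier.add(h)
        reached |= next_frontier
        frontier = next_frontier
    node_ids = sorted(reached)
    # Map original entity IDs to local indices in [0, num_nodes).
    local = {gid: i for i, gid in enumerate(node_ids)}
    # Induced subgraph: keep edges whose endpoints were both reached.
    kept = [
        i for i, (h, t) in enumerate(zip(heads, tails))
        if h in reached and t in reached
    ]
    return {
        "x": [[0.0] for _ in node_ids],  # zero features of shape (num_nodes, 1)
        "edge_index": [
            [local[heads[i]] for i in kept],
            [local[tails[i]] for i in kept],
        ],
        "edge_type": [edge_type[i] for i in kept],
        "node_ids": node_ids,
        "seed_mask": [gid in seeds for gid in node_ids],
    }


# Same toy graph as the class example; "unknown_code" is not in the graph.
entity2id = {"aspirin": 0, "headache": 1, "migraine": 2, "ibuprofen": 3}
edge_index = [[0, 1, 3], [1, 2, 1]]
edge_type = [0, 1, 0]
sub = k_hop_subgraph_sketch(
    ["aspirin", "unknown_code"], 1, entity2id, edge_index, edge_type
)
```

The node_ids list is what allows results computed on the reindexed subgraph to be mapped back to entities in the full graph.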

has_entity(entity)[source]#

Check if an entity exists in the KG.

Parameters:

entity (str) – Entity name string.

Return type:

bool

Returns:

True if entity is in the KG.

neighbors(entity, num_hops=1)[source]#

Get neighbor entity names within num_hops.

Parameters:
  • entity (str) – Entity name string.

  • num_hops (int) – Number of hops. Default is 1.

Return type:

List[str]

Returns:

List of neighbor entity name strings.

stat()[source]#

Print statistics of the knowledge graph.