pyhealth.processors.StageNetProcessor

Processor for StageNet categorical code inputs with coupled value/time data.

class pyhealth.processors.StageNetProcessor(padding=0)

Bases: TemporalFeatureProcessor, TokenProcessorInterface

Feature processor for StageNet CODE inputs with coupled value/time data.

This processor handles categorical code sequences (flat or nested). For numeric features, use StageNetTensorProcessor instead.

Input Format (tuple):

(time, values) where:

- time: List of scalars, e.g. [0.0, 2.0, 1.3], or None
- values: ["code1", "code2"] or [["A", "B"], ["C"]]

The processor automatically detects:

- List of strings -> flat code sequences
- List of lists of strings -> nested code sequences
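The detection rule above can be illustrated with a minimal sketch. The helper `detect_structure` is hypothetical and is not part of PyHealth; it only mirrors the documented rule that a list of strings is flat and a list of lists is nested.

```python
from typing import List, Union

Codes = Union[List[str], List[List[str]]]


def detect_structure(values: Codes) -> str:
    """Return "nested" if any element is itself a list or tuple, else "flat".

    Hypothetical helper for illustration; the library's own check may differ.
    """
    for item in values:
        if isinstance(item, (list, tuple)):
            return "nested"
    return "flat"


flat_kind = detect_structure(["code1", "code2"])
nested_kind = detect_structure([["A", "B"], ["C"]])
print(flat_kind, nested_kind)
```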

Parameters:

padding (int) – Additional padding to add on top of the observed maximum nested sequence length. The actual padding length will be observed_max + padding. This ensures the processor can handle sequences longer than those in the training data. Default: 0 (no extra padding). Only applies to nested sequences.

Returns:

Tuple of (time_tensor, value_tensor) where time_tensor can be None
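As a rough illustration of the padding parameter, the sketch below pads already-encoded nested sequences to observed_max + padding using plain Python lists. The helper `pad_nested` is hypothetical, and truncating rows longer than the target length is an assumption of this sketch rather than documented behavior.

```python
from typing import List

PAD = 0  # padding index, matching the PAD attribute documented for this class


def pad_nested(encoded: List[List[int]], observed_max: int, padding: int = 0) -> List[List[int]]:
    """Pad every inner list with PAD up to observed_max + padding.

    Hypothetical helper; rows longer than the target are truncated,
    which is an assumption made for this sketch.
    """
    target = observed_max + padding
    padded = []
    for row in encoded:
        row = row[:target]
        padded.append(row + [PAD] * (target - len(row)))
    return padded


visits = [[2, 3], [4]]                      # two time steps of code indices
observed_max = max(len(v) for v in visits)  # longest inner sequence seen
padded = pad_nested(visits, observed_max, padding=3)
print(padded)
```

With padding=0 the inner length equals the observed maximum; a positive value leaves headroom for longer inner sequences seen later.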

Examples

>>> from pyhealth.processors import StageNetProcessor
>>> # The calls below assume the processor has already been fitted with fit().

>>> # Case 1: Code sequence with time
>>> processor = StageNetProcessor()
>>> data = ([0.0, 1.5, 2.3], ["code1", "code2", "code3"])
>>> time, values = processor.process(data)
>>> values.shape  # (3,) - sequence of code indices
>>> time.shape    # (3,) - time intervals

>>> # Case 2: Nested codes with time (with custom padding for extra capacity)
>>> processor = StageNetProcessor(padding=20)
>>> data = ([0.0, 1.5], [["A", "B"], ["C"]])
>>> time, values = processor.process(data)
>>> values.shape  # (2, observed_max + 20) - padded nested sequences
>>> time.shape    # (2,)

>>> # Case 3: Codes without time
>>> data = (None, ["code1", "code2"])
>>> time, values = processor.process(data)
>>> values.shape  # (2,)
>>> time          # None

fit(samples, field)

Build vocabulary and determine input structure.

Parameters:
  • samples (Iterable[Dict[str, Any]]) – Iterable of sample dictionaries

  • field (str) – The key in each sample whose value is the (time, values) tuple

Return type:

None
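What fit() is described as doing (building a vocabulary and determining the input structure) can be sketched in plain Python as follows. This is not PyHealth's implementation: the helper `build_vocab` and the token names "<pad>" and "<unk>" are assumptions; only the reserved indices 0 and 1 come from the PAD and UNK attributes documented on this page.

```python
from typing import Any, Dict, Iterable, Tuple

PAD, UNK = 0, 1  # reserved indices, as given by the PAD and UNK attributes


def build_vocab(samples: Iterable[Dict[str, Any]], field: str) -> Tuple[Dict[str, int], bool, int]:
    """Scan samples and return (vocab, is_nested, max_inner_len).

    Hypothetical helper; "<pad>" and "<unk>" are placeholder token names.
    """
    vocab: Dict[str, int] = {"<pad>": PAD, "<unk>": UNK}
    is_nested = False
    max_inner_len = 0
    for sample in samples:
        if field not in sample:
            continue
        _time, values = sample[field]  # each field holds a (time, values) tuple
        for item in values:
            if isinstance(item, (list, tuple)):
                is_nested = True
                max_inner_len = max(max_inner_len, len(item))
                codes = item
            else:
                codes = [item]
            for code in codes:
                if code not in vocab:
                    vocab[code] = len(vocab)  # next free index
    return vocab, is_nested, max_inner_len


samples = [
    {"codes": ([0.0, 1.5], [["A", "B"], ["C"]])},
    {"codes": (None, [["B", "D", "E"]])},
]
vocab, is_nested, max_inner_len = build_vocab(samples, "codes")
print(vocab, is_nested, max_inner_len)
```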

remove(tokens)

Remove the specified tokens from the processor's vocabulary.

retain(tokens)

Retain only the specified tokens in the processor's vocabulary.

add(tokens)

Add the specified tokens to the processor's vocabulary.

tokens()

Return the set of tokens in the processor’s vocabulary.

Return type:

set[str]
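The vocabulary-editing methods above read naturally as set operations. The sketch below shows that reading on a plain Python set. The function names, the reserved token names, and the choice to always keep the reserved tokens are assumptions; how the library reassigns indices afterwards is not covered here.

```python
from typing import Iterable, Set

RESERVED = {"<pad>", "<unk>"}  # hypothetical names for the PAD/UNK entries


def remove_tokens(vocab: Set[str], tokens: Iterable[str]) -> Set[str]:
    """Drop the given tokens; reserved tokens are kept (assumption)."""
    return vocab - (set(tokens) - RESERVED)


def retain_tokens(vocab: Set[str], tokens: Iterable[str]) -> Set[str]:
    """Keep only the given tokens that exist, plus reserved tokens (assumption)."""
    return (vocab & set(tokens)) | (vocab & RESERVED)


def add_tokens(vocab: Set[str], tokens: Iterable[str]) -> Set[str]:
    """Add the given tokens to the vocabulary."""
    return vocab | set(tokens)


vocab = {"<pad>", "<unk>", "A", "B", "C"}
after_remove = remove_tokens(vocab, ["B"])
after_retain = retain_tokens(vocab, ["A", "Z"])
after_add = add_tokens(vocab, ["D"])
print(sorted(after_remove), sorted(after_retain), sorted(after_add))
```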

process(value)

Process tuple format data into tensors.

Parameters:

value (Tuple[Optional[List], List]) – Tuple of (time, values) where values are codes

Return type:

Tuple[Optional[Tensor], Tensor]

Returns:

Tuple of (time_tensor, value_tensor), time can be None
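A minimal sketch of the tuple-in, tuple-out contract for flat codes, using plain lists in place of tensors. The helper `encode_flat` is hypothetical, and mapping unseen codes to the UNK index is an assumption suggested by the UNK attribute, not something this page states.

```python
from typing import Dict, List, Optional, Tuple

UNK = 1  # index of the unknown token, matching the documented UNK attribute


def encode_flat(
    value: Tuple[Optional[List[float]], List[str]],
    vocab: Dict[str, int],
) -> Tuple[Optional[List[float]], List[int]]:
    """Map a (time, codes) tuple to (time_values, code_indices).

    Hypothetical helper; unseen codes fall back to UNK (assumption).
    """
    time, codes = value
    time_out = None if time is None else [float(t) for t in time]
    indices = [vocab.get(code, UNK) for code in codes]
    return time_out, indices


vocab = {"<pad>": 0, "<unk>": 1, "code1": 2, "code2": 3}
time_out, indices = encode_flat(([0, 1.5, 2.3], ["code1", "code2", "unseen"]), vocab)
no_time, only_codes = encode_flat((None, ["code2"]), vocab)
print(time_out, indices, no_time, only_codes)
```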

vocab_size()

Return the size of the processor’s vocabulary.

Return type:

int

size()

Return vocabulary size.

Return type:

int

is_token()

Code indices are discrete token indices.

Return type:

bool

schema()

Output is a tuple of (time_tensor, value_tensor).

Return type:

tuple[str, ...]

dim()

Number of dimensions for each output tensor.

Time tensor is 1D. Value tensor is 1D (flat) or 2D (nested). Must be called after fit().

Return type:

tuple[int, ...]

Returns:

(1, 1) for flat codes or (1, 2) for nested codes.

spatial()

Whether each dimension of the value tensor is spatial.

Return type:

tuple[bool, ...]

modality()

Discrete EHR codes → CODE modality.

Return type:

ModalityType

value_dim()

Vocabulary size (used with nn.Embedding in UnifiedMultimodalEmbeddingModel). Must be called after fit().

Return type:

int

process_temporal(value)

Return dict output for UnifiedMultimodalEmbeddingModel.

Calls the existing process() (backward-compatible tuple) and wraps the result as a dict with ‘value’ and ‘time’ keys.

Returns:

{"value": LongTensor (S,), "time": FloatTensor (S,) or None}

Return type:

dict
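The wrapping step described above amounts to repackaging the (time, value) tuple as a dict. A minimal sketch, with plain lists standing in for tensors and a hypothetical helper name `to_temporal_dict`:

```python
from typing import Any, Dict, Optional, Tuple


def to_temporal_dict(processed: Tuple[Optional[Any], Any]) -> Dict[str, Any]:
    """Wrap a (time, value) tuple as {"value": ..., "time": ...}.

    Hypothetical helper; it mirrors the documented keys only.
    """
    time, value = processed
    return {"value": value, "time": time}


with_time = to_temporal_dict(([0.0, 1.5], [2, 3]))
without_time = to_temporal_dict((None, [2, 3]))
print(with_time, without_time)
```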

PAD = 0
UNK = 1
load(path)

Optional: Load processor state from disk.

Parameters:

path (str) – File path to load processor state from.

Return type:

None

save(path)

Optional: Save processor state to disk.

Parameters:

path (str) – File path to save processor state.

Return type:

None