pyhealth.processors.StageNetProcessor
Processor for StageNet categorical code inputs with coupled value/time data.
- class pyhealth.processors.StageNetProcessor(padding=0)
Bases: TemporalFeatureProcessor, TokenProcessorInterface
Feature processor for StageNet CODE inputs with coupled value/time data.
This processor handles categorical code sequences (flat or nested). For numeric features, use StageNetTensorProcessor instead.
- Input Format (tuple):
(time, values) where:
- time: List of scalars, e.g. [0.0, 2.0, 1.3], or None
- values: ["code1", "code2"] or [["A", "B"], ["C"]]
The processor automatically detects:
- List of strings -> flat code sequences
- List of lists of strings -> nested code sequences
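The flat versus nested detection described above can be illustrated with a small standalone sketch. It does not use PyHealth: `detect_code_layout` is a hypothetical helper written for this example, and the library's actual detection logic may differ.

```python
def detect_code_layout(values):
    """Classify a code sequence as "flat" or "nested" (hypothetical helper)."""
    # A list of strings is treated as a flat code sequence.
    if all(isinstance(v, str) for v in values):
        return "flat"
    # A list of lists of strings is treated as a nested code sequence.
    if all(
        isinstance(v, (list, tuple)) and all(isinstance(c, str) for c in v)
        for v in values
    ):
        return "nested"
    raise ValueError("expected a list of strings or a list of lists of strings")


flat_layout = detect_code_layout(["code1", "code2"])
nested_layout = detect_code_layout([["A", "B"], ["C"]])
```

In this sketch an empty list is classified as flat, which is an arbitrary choice made for the example.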
- Parameters:
padding (int) – Additional padding to add on top of the observed maximum nested sequence length. The actual padding length will be observed_max + padding. This ensures the processor can handle sequences longer than those in the training data. Default: 0 (no extra padding). Only applies to nested sequences.
- Returns:
Tuple of (time_tensor, value_tensor), where time_tensor can be None.
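To make the observed_max + padding rule concrete, here is a minimal pure-Python sketch of padding nested code lists. It is not the PyHealth implementation: `pad_nested_codes` and the example vocabulary are hypothetical, and right-padding, truncation of overlong rows, and mapping unseen codes to UNK are assumptions made for illustration. Only the reserved indices PAD = 0 and UNK = 1 come from this page.

```python
PAD, UNK = 0, 1  # reserved indices, as documented for this class


def pad_nested_codes(nested, vocab, observed_max, padding=0):
    """Index each inner list and right-pad it to observed_max + padding.

    Hypothetical helper: the padding side, truncation, and UNK fallback
    are assumptions, not confirmed library behavior.
    """
    target_len = observed_max + padding
    rows = []
    for codes in nested:
        # Map codes to indices, falling back to UNK for unseen codes.
        ids = [vocab.get(code, UNK) for code in codes][:target_len]
        # Fill the remainder of the row with the PAD index.
        rows.append(ids + [PAD] * (target_len - len(ids)))
    return rows


vocab = {"A": 2, "B": 3, "C": 4}  # hypothetical vocabulary built during fitting
nested = [["A", "B"], ["C"]]
observed_max = max(len(codes) for codes in nested)  # longest inner list: 2
padded = pad_nested_codes(nested, vocab, observed_max, padding=1)
# Each row now has observed_max + padding = 3 entries.
```

With padding=0 every row would have exactly observed_max entries. A positive padding leaves spare capacity for visits with more codes than any seen while fitting.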
Examples
>>> # Case 1: Code sequence with time
>>> processor = StageNetProcessor()
>>> data = ([0.0, 1.5, 2.3], ["code1", "code2", "code3"])
>>> time, values = processor.process(data)
>>> values.shape  # (3,) - sequence of code indices
>>> time.shape  # (3,) - time intervals

>>> # Case 2: Nested codes with time (with custom padding for extra capacity)
>>> processor = StageNetProcessor(padding=20)
>>> data = ([0.0, 1.5], [["A", "B"], ["C"]])
>>> time, values = processor.process(data)
>>> values.shape  # (2, observed_max + 20) - padded nested sequences
>>> time.shape  # (2,)

>>> # Case 3: Codes without time
>>> data = (None, ["code1", "code2"])
>>> time, values = processor.process(data)
>>> values.shape  # (2,)
>>> time  # None
- dim()
Number of dimensions for each output tensor.
Time tensor is 1D. Value tensor is 1D (flat) or 2D (nested). Must be called after fit().
- value_dim()
Vocabulary size (used with nn.Embedding in UnifiedMultimodalEmbeddingModel). Must be called after fit().
- Return type:
- process_temporal(value)
Return dict output for UnifiedMultimodalEmbeddingModel.
Calls the existing process() (backward-compatible tuple) and wraps the result as a dict with ‘value’ and ‘time’ keys.
- Returns:
{"value": LongTensor (S,), "time": FloatTensor (S,) or None}
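The tuple-to-dict wrapping described for process_temporal() can be sketched without PyHealth as follows. `wrap_as_temporal_dict` is a hypothetical stand-in, and plain Python lists take the place of the tensors returned by process().

```python
def wrap_as_temporal_dict(processed):
    """Wrap a (time, value) tuple as a dict with "value" and "time" keys.

    Hypothetical helper mirroring the documented behavior; plain lists
    stand in for the LongTensor and FloatTensor outputs.
    """
    time, value = processed
    return {"value": value, "time": time}


# Tuple shaped like the output of process(): (time, value).
with_time = wrap_as_temporal_dict(([0.0, 1.5, 2.3], [2, 3, 4]))
# When no time information is available, "time" stays None.
without_time = wrap_as_temporal_dict((None, [2, 3]))
```

Note that the tuple order is (time, value), while the dict lists "value" first. Accessing the result by key avoids relying on either ordering.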
- PAD = 0
- UNK = 1
- load(path)
Optional: Load processor state from disk.