pyhealth.processors.NestedFloatsProcessor
Processor for nested numerical sequence data without vocabulary.
Handles nested sequences of floats/numerical values where each sample contains a list of visits, and each visit contains a list of values. For example: [[1.5, 2.3], [4.1], [0.9, 1.2, 3.4]]
Supports forward-fill for missing values across time steps.
- class pyhealth.processors.NestedFloatsProcessor(forward_fill=True, padding=0)
Bases: FeatureProcessor
Feature processor for nested numerical sequences without vocabulary.
Handles nested sequences of floats/numerical values where each sample contains a list of visits, and each visit contains a list of values: [[1.5, 2.3], [4.1], [0.9, 1.2, 3.4]]
The processor:
1. Determines the maximum inner sequence length during fit.
2. Optionally applies forward fill to missing values.
3. Returns a 2D tensor of shape (num_visits, max_values_per_visit).
- Parameters:
forward_fill (bool) – If True, fills NaN values and empty visits forward from earlier time steps. If False, sets null (missing) values to 0. Default: True.
padding (int) – Extra length added on top of the maximum inner sequence length observed during fit, so the padded inner length is observed_max + padding. This lets the processor handle visits longer than any seen in the training data. Default: 0 (no extra padding).
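A minimal sketch of the fit step and the padding arithmetic, in plain Python for illustration. `fit_max_inner_len` is a hypothetical helper, not part of the PyHealth API; it assumes each sample stores its visits as a list of lists under `field`, and it treats a `None` visit as empty.

```python
from typing import Any, Dict, Iterable


def fit_max_inner_len(samples: Iterable[Dict[str, Any]], field: str,
                      padding: int = 0) -> int:
    """Hypothetical helper: longest visit seen during fit, plus extra padding."""
    observed_max = 0
    for sample in samples:
        for visit in sample[field]:
            # A None visit counts as empty (assumption for this sketch).
            observed_max = max(observed_max, len(visit or []))
    return observed_max + padding


samples = [
    {"values": [[1.0, 2.0], [3.0, 4.0, 5.0]]},
    {"values": [[6.0]]},
]
print(fit_max_inner_len(samples, "values"))             # observed_max = 3
print(fit_max_inner_len(samples, "values", padding=2))  # 3 + 2 = 5
```

With padding=2, a visit of up to five values fits in the inner dimension even though no training visit exceeded three.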
Examples
>>> processor = NestedFloatsProcessor()
>>> # During fit, determines max inner sequence length
>>> samples = [
...     {"values": [[1.0, 2.0], [3.0, 4.0, 5.0]]},
...     {"values": [[6.0]]}
... ]
>>> processor.fit(samples, "values")
>>> # Process nested sequence (observed_max=3, default padding=0, total=3)
>>> result = processor.process([[1.0, 2.0], [3.0]])
>>> result.shape  # (2, 3) - 2 visits, padded to observed_max
- process(value)
Process nested numerical sequence with optional forward fill.
For missing values (None or empty visits):
- If forward_fill=True: uses forward fill from the last valid visit.
- If forward_fill=False: sets null values to 0.0 (for masking).
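The missing-value rules above can be sketched in plain Python. `process_nested` is hypothetical and returns nested lists rather than a tensor. Its handling of padded positions (filled with 0.0, not treated as missing) and of over-long visits (truncated to `max_len`) is an assumption for illustration, not confirmed PyHealth behavior.

```python
import math
from typing import List, Optional, Sequence

Visit = Optional[Sequence[Optional[float]]]


def _is_missing(x: Optional[float]) -> bool:
    # None and NaN both count as missing entries.
    return x is None or (isinstance(x, float) and math.isnan(x))


def process_nested(visits: Sequence[Visit], max_len: int,
                   forward_fill: bool = True) -> List[List[float]]:
    """Hypothetical sketch: build a (num_visits, max_len) grid of floats."""
    rows: List[List[float]] = []
    prev: Optional[List[float]] = None
    for visit in visits:
        if not visit:
            # None or empty visit: copy the previous row, else zeros.
            row = list(prev) if forward_fill and prev is not None else [0.0] * max_len
        else:
            row = []
            for j in range(max_len):  # positions past max_len are truncated
                if j >= len(visit):
                    row.append(0.0)  # padding (assumed not to be "missing")
                elif _is_missing(visit[j]):
                    # Missing entry: same column of the previous row, else 0.0.
                    row.append(prev[j] if forward_fill and prev is not None else 0.0)
                else:
                    row.append(float(visit[j]))
        rows.append(row)
        prev = row
    return rows


print(process_nested([[1.0, None], [], [None, 4.0]], max_len=2))
print(process_nested([[1.0, None], [], [None, 4.0]], max_len=2, forward_fill=False))
```

Passing the result to `torch.tensor(...)` would give a float tensor of shape (num_visits, max_len), matching the 2D output described for this processor.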
- schema()
Returns the schema of the processed feature. For a processor that emits a single tensor, this should return [“value”]. For a processor that emits a tuple of tensors, it should return a tuple of the same length as the output, giving the semantic name of each tensor, such as [“time”, “value”] or [“value”, “mask”].
- Typical semantic names include:
“value”: the main processed tensor output of the processor
“time”: the time tensor output of the processor (mostly for StageNet)
“mask”: the mask tensor output of the processor (if applicable)
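To illustrate how a schema pairs with a tuple output, here is a toy processor. `ValueMaskProcessor` is hypothetical and does not subclass PyHealth's `FeatureProcessor`; it emits a `(value, mask)` pair, so `schema()` returns two names in the same order, and a consumer can zip the two together.

```python
from typing import List, Optional, Sequence, Tuple


class ValueMaskProcessor:
    """Hypothetical processor emitting (value, mask); not a PyHealth class."""

    def schema(self) -> Tuple[str, ...]:
        # One semantic name per element of the tuple returned by process().
        return ("value", "mask")

    def process(self, values: Sequence[Optional[float]]) -> Tuple[List[float], List[int]]:
        value = [0.0 if v is None else float(v) for v in values]
        mask = [0 if v is None else 1 for v in values]  # 1 = observed
        return value, mask


proc = ValueMaskProcessor()
named = dict(zip(proc.schema(), proc.process([1.5, None, 2.0])))
print(named)
```

Keeping the schema order identical to the output order is what makes the zip safe; a mismatch would silently swap tensors.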
- load(path)
Optional: Load processor state from disk.
- save(path)
Optional: Save processor state to disk.
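Because load and save are optional, a processor with fitted state has to choose how to persist it. A minimal sketch, assuming the only state worth saving is the fitted inner length plus the constructor arguments; `NestedFloatsState` and its attribute names are hypothetical and do not describe PyHealth's on-disk format.

```python
import json
import os
import tempfile
from typing import Optional


class NestedFloatsState:
    """Hypothetical holder for a nested-floats processor's fitted state."""

    def __init__(self, forward_fill: bool = True, padding: int = 0,
                 max_inner_len: Optional[int] = None) -> None:
        self.forward_fill = forward_fill
        self.padding = padding
        self.max_inner_len = max_inner_len  # set after fitting

    def save(self, path: str) -> None:
        # Persist all attributes as a small JSON object.
        with open(path, "w", encoding="utf-8") as f:
            json.dump(vars(self), f)

    def load(self, path: str) -> None:
        # Restore the attributes written by save().
        with open(path, encoding="utf-8") as f:
            state = json.load(f)
        self.forward_fill = state["forward_fill"]
        self.padding = state["padding"]
        self.max_inner_len = state["max_inner_len"]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "processor.json")
    NestedFloatsState(padding=2, max_inner_len=5).save(path)
    restored = NestedFloatsState()
    restored.load(path)
    print(restored.padding, restored.max_inner_len)
```

JSON keeps the saved state human-readable; a processor with large fitted arrays would likely need a binary format instead.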
- spatial()
Returns, for each dimension (axis) of the value tensor, whether it is spatial, i.e. whether it corresponds to a spatial axis such as time, height, or width. This is used to decide which dimensions receive augmentations and other transformations that should only be applied to spatial dimensions.
For example, for CNN or RNN features, it helps determine which dimensions to apply spatial augmentations to and which to treat as channels or features.
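A toy example of how spatial flags might steer an augmentation. `flip_spatial_axes` is hypothetical, and the flags `(True, False)` (visit axis spatial, value axis not) are an assumption about what this processor's `spatial()` returns, not something confirmed by the source.

```python
from typing import List, Sequence


def flip_spatial_axes(grid: List[List[float]],
                      spatial: Sequence[bool]) -> List[List[float]]:
    """Toy augmentation: reverse a 2D grid only along axes flagged as spatial."""
    assert len(spatial) == 2, "this sketch handles 2D (num_visits, values) grids only"
    out = [list(row) for row in grid]  # copy so the input is untouched
    if spatial[0]:
        out = out[::-1]  # axis 0: reverse visit (time) order
    if spatial[1]:
        out = [row[::-1] for row in out]  # axis 1: reverse within each visit
    return out


grid = [[1.0, 2.0], [3.0, 4.0]]
print(flip_spatial_axes(grid, (True, False)))  # only the visit axis is flipped
```

Reversing visit order is only a stand-in; a real temporal augmentation (cropping or jittering along time, say) would use the same flags to pick its axes while leaving feature axes alone.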