pyhealth.processors.TemporalTimeseriesProcessor

Temporal-aware wrapper around the classic TimeseriesProcessor that preserves timestamps in a dict output for multimodal temporal alignment.

class pyhealth.processors.TemporalTimeseriesProcessor(sampling_rate=datetime.timedelta(seconds=3600), impute_strategy='forward_fill')

Bases: TemporalFeatureProcessor

Temporal-aware wrapper around the classic TimeseriesProcessor.

Applies the same processing as TimeseriesProcessor (uniform resampling followed by forward-fill imputation), but returns a dict {"value": Tensor, "time": Tensor} instead of a bare tensor, which makes the output compatible with UnifiedMultimodalEmbeddingModel.

Input tuple format:

(timestamps: List[datetime], values: np.ndarray[T, F])

Output dict:

{"value": FloatTensor (S, F), "time": FloatTensor (S,)}

S is determined by sampling_rate and the observation window.
"time" contains the hours elapsed since the first observation.

Parameters:
  • sampling_rate (timedelta) – Uniform re-sampling interval. Defaults to 1 hour.

  • impute_strategy (str) – Currently only "forward_fill" is supported.
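The resampling and forward-fill behaviour described above can be illustrated with a small pure-Python sketch. This is not the library's implementation: resample_forward_fill is a hypothetical helper, and it assumes that the grid starts at the first observation, includes the last one when it falls on a grid point, and carries the most recent observation forward.

```python
from datetime import datetime, timedelta
from typing import List, Sequence, Tuple


def resample_forward_fill(
    timestamps: Sequence[datetime],
    values: Sequence[Sequence[float]],
    sampling_rate: timedelta = timedelta(hours=1),
) -> Tuple[List[List[float]], List[float]]:
    """Hypothetical helper: uniform resampling with forward-fill imputation."""
    if not timestamps:
        return [], []

    # Sort observations by time so the forward fill sees them in order.
    order = sorted(range(len(timestamps)), key=lambda i: timestamps[i])
    ts = [timestamps[i] for i in order]
    vals = [list(values[i]) for i in order]

    start, end = ts[0], ts[-1]
    # Number of grid points in the window (assumption: both ends included).
    n_steps = int((end - start) / sampling_rate) + 1

    grid_values: List[List[float]] = []
    grid_hours: List[float] = []
    j = 0  # index of the latest observation at or before the grid point
    for k in range(n_steps):
        t = start + k * sampling_rate
        while j + 1 < len(ts) and ts[j + 1] <= t:
            j += 1
        grid_values.append(list(vals[j]))  # carry the last observation forward
        grid_hours.append((t - start).total_seconds() / 3600.0)
    return grid_values, grid_hours


ts = [datetime(2023, 1, 1, 0), datetime(2023, 1, 1, 4), datetime(2023, 1, 1, 8)]
vals = [[120.0, 80.0], [115.0, 78.0], [118.0, 82.0]]
grid_values, grid_hours = resample_forward_fill(ts, vals, timedelta(hours=2))
# grid_hours  -> [0.0, 2.0, 4.0, 6.0, 8.0]
# grid_values -> rows at hours 2 and 6 repeat the preceding observation
```

With a 2-hour sampling_rate over an 8-hour window this yields S = 5 steps. The actual processor returns tensors rather than lists.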

Example:

from datetime import datetime, timedelta
import numpy as np
from pyhealth.processors import TemporalTimeseriesProcessor

# Resample onto a 2-hour grid.
proc = TemporalTimeseriesProcessor(sampling_rate=timedelta(hours=2))
# Three observations, 4 hours apart, with two features each.
ts  = [datetime(2023, 1, 1, 0), datetime(2023, 1, 1, 4), datetime(2023, 1, 1, 8)]
val = np.array([[120.0, 80.0], [115.0, 78.0], [118.0, 82.0]])
out = proc.process_temporal((ts, val))
# out["value"].shape  → (5, 2)   ← 5 two-hour steps over 8 h
# out["time"].shape   → (5,)     ← [0., 2., 4., 6., 8.] hours

fit(samples, field)

Infer feature dimension from the first valid sample.

Return type:

None
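
fit() is documented only as inferring the feature dimension from the first valid sample. The sketch below shows one way that inference could work. It assumes that each sample is a dict whose field holds a (timestamps, values) tuple, that missing or empty entries are skipped, and that a 1-D series counts as a single feature. infer_feature_dim is a hypothetical helper, not part of pyhealth.

```python
from typing import Any, Dict, Iterable, Optional


def infer_feature_dim(samples: Iterable[Dict[str, Any]], field: str) -> Optional[int]:
    """Hypothetical helper: feature count F of the first usable sample."""
    for sample in samples:
        entry = sample.get(field)
        if entry is None:
            continue  # field missing in this sample, keep looking
        _timestamps, values = entry
        if len(values) == 0:
            continue  # no observations, keep looking
        first_row = values[0]
        # Assumption: a 1-D series of shape (T,) is treated as one feature.
        if isinstance(first_row, (int, float)):
            return 1
        return len(first_row)
    return None  # no valid sample found


samples = [
    {"vitals": None},
    {"vitals": ([0, 1], [[120.0, 80.0], [115.0, 78.0]])},
]
feature_dim = infer_feature_dim(samples, "vitals")
```

Here the first sample is skipped and feature_dim is taken from the second one, whose rows have two features.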

process(value)

Process and return a dict compatible with TemporalFeatureProcessor.

Parameters:

value (Tuple[List[datetime], ndarray]) – (timestamps, values) where timestamps is a list of datetime objects and values is a np.ndarray of shape (T, F) or (T,).

Return type:

dict

Returns:

{"value": FloatTensor (S, F), "time": FloatTensor (S,)}

process_temporal(value)

Return type:

dict

is_token()

Returns whether the output of the processor (in particular, the value tensor) represents discrete token indices (True) or continuous values (False). This determines whether downstream code applies a token-based transformation (e.g. nn.Embedding) or a value-based transformation (e.g. nn.Linear).

Return type:

bool

Returns:

True if the output of the processor represents discrete token indices, False otherwise.
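
A consumer of this flag might branch on it as in the sketch below. Everything here is illustrative: ContinuousProcessorStub is a hypothetical stand-in that exposes only is_token() and value_dim(), its return values are assumptions chosen to resemble a continuous timeseries processor, and the function returns a description string instead of building a real layer so that the example stays dependency-free.

```python
class ContinuousProcessorStub:
    """Hypothetical stand-in for a processor that emits continuous values."""

    def is_token(self) -> bool:
        return False  # assumption: continuous values, not token indices

    def value_dim(self) -> int:
        return 2  # assumption: two features per time step


def describe_input_layer(processor) -> str:
    """Pick the kind of input layer a model might build for this processor."""
    if processor.is_token():
        # Discrete indices would be looked up in an embedding table.
        return "nn.Embedding"
    # Continuous values would be projected with a linear layer.
    return f"nn.Linear(in_features={processor.value_dim()})"


layer_description = describe_input_layer(ContinuousProcessorStub())
```

In a real model the two branches would construct the corresponding torch.nn modules.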

schema()

Standardised schema: at minimum ('value', 'time').

Return type:

tuple[str, ...]

dim()

Number of dimensions (Tensor.dim()) for each output tensor, in the same order as the output tuple.

Return type:

tuple[int, ...]

Returns:

Tuple of integers corresponding to the number of dimensions of each output tensor.

spatial()

Whether each dimension (axis) of the value tensor is spatial (i.e. corresponds to a spatial axis like time, height, width, etc.) or not. This is used to determine how to apply augmentations and other transformations that should only be applied to spatial dimensions.

E.g. for CNN or RNN features, this would help determine which dimensions to apply spatial augmentations to, and which dimensions to treat as channels or features.

Return type:

tuple[bool, ...]

Returns:

Tuple of booleans corresponding to whether each axis of the value tensor is spatial or not.

modality()

Continuous vitals / lab timeseries → NUMERIC modality.

Return type:

ModalityType

value_dim()

Number of features per time-step (used with nn.Linear). Must be called after fit().

Return type:

int

size()

Alias for value_dim(); mirrors the TimeseriesProcessor API.

Return type:

Optional[int]

load(path)

Optional: Load processor state from disk.

Parameters:

path (str) – File path to load processor state from.

Return type:

None

save(path)

Optional: Save processor state to disk.

Parameters:

path (str) – File path to save processor state.

Return type:

None