pyhealth.processors.TimeImageProcessor

Processor for time-aware image data.

class pyhealth.processors.TimeImageProcessor(image_size=224, to_tensor=True, normalize=False, mean=None, std=None, mode=None, max_images=None)

Bases: TemporalFeatureProcessor

Feature processor that loads images and pairs them with timestamps.

Takes a tuple of (image_paths, time_differences) and returns a tuple of (stacked_image_tensor, timestamp_tensor, "image") suitable for the unified multimodal embedding model.

The processor sorts images chronologically by timestamp and optionally caps the number of images per patient, keeping the most recent observations.

Input:

image_paths: List[str | Path]
time_diffs: List[float] (e.g., days from first admission)

Processing:

Sort (path, time) pairs chronologically.
Truncate to max_images most recent if set.
Load, resize, and transform each image.
Stack into a single tensor.
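
The sort-and-truncate steps above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: sort_and_truncate is a hypothetical helper name, and the handling of ties and of max_images=0 is an assumption.

```python
from typing import List, Optional, Sequence, Tuple


def sort_and_truncate(
    paths: Sequence[str],
    times: Sequence[float],
    max_images: Optional[int] = None,
) -> Tuple[List[str], List[float]]:
    """Hypothetical helper: order pairs by time, keep the latest max_images."""
    # Sort (path, time) pairs chronologically; sorted() is stable,
    # so pairs with equal timestamps keep their input order.
    pairs = sorted(zip(paths, times), key=lambda pair: pair[1])
    # Keep only the most recent entries when a cap is set.
    if max_images is not None:
        pairs = pairs[max(len(pairs) - max_images, 0):]
    return [p for p, _ in pairs], [t for _, t in pairs]


paths, times = sort_and_truncate(
    ["c.png", "a.png", "b.png"], [5.0, 0.0, 2.5], max_images=2
)
print(paths, times)  # ['b.png', 'c.png'] [2.5, 5.0]
```

Truncating after sorting means the earliest observations are the ones dropped, which matches the "most recent" behaviour described above.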

Output:

Tuple of (images, timestamps, "image") where:
- images: torch.Tensor of shape (N, C, H, W)
- timestamps: torch.Tensor of shape (N,)
- "image": str literal for modality routing

Parameters:

image_size (int) – Resize images to (image_size, image_size). Defaults to 224.
to_tensor (bool) – Whether to convert images to tensors. Defaults to True.
normalize (bool) – Whether to normalize pixel values. Defaults to False.
mean (Optional[List[float]]) – Per-channel means for normalization. Required if normalize is True.
std (Optional[List[float]]) – Per-channel standard deviations for normalization. Required if normalize is True.
mode (Optional[str]) – PIL image mode conversion (e.g., "RGB", "L"). If None, keeps the original mode. Defaults to None.
max_images (Optional[int]) – Maximum number of images per patient. If a patient has more images, the most recent (by timestamp) are kept. If None, all images are kept. Defaults to None.

Raises:

ValueError – If normalize is True but mean or std is missing.
ValueError – If mean/std are provided but normalize is False.
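
The two ValueError conditions above amount to a consistency check between normalize and mean/std. A minimal sketch follows, assuming that supplying either mean or std alone already counts as "provided"; check_normalize_args is a hypothetical name, not part of the PyHealth API.

```python
from typing import List, Optional


def check_normalize_args(
    normalize: bool,
    mean: Optional[List[float]],
    std: Optional[List[float]],
) -> None:
    """Hypothetical helper mirroring the documented ValueError conditions."""
    # normalize=True needs both statistics.
    if normalize and (mean is None or std is None):
        raise ValueError("normalize=True requires both mean and std.")
    # Assumption: passing either statistic without normalize=True is an error.
    if not normalize and (mean is not None or std is not None):
        raise ValueError("mean/std were provided but normalize=False.")


# Valid combinations pass silently.
check_normalize_args(True, [0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
check_normalize_args(False, None, None)
```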

Example

>>> proc = TimeImageProcessor(
...     image_size=224,
...     normalize=True,
...     mean=[0.485, 0.456, 0.406],
...     std=[0.229, 0.224, 0.225],
... )
>>> paths = ["/data/xray1.png", "/data/xray2.png"]
>>> times = [0.0, 2.5]
>>> images, timestamps, tag = proc.process((paths, times))
>>> images.shape
torch.Size([2, 3, 224, 224])
>>> timestamps
tensor([0.0000, 2.5000])
>>> tag
'image'

fit(samples, field)

Fit the processor by inferring n_channels from data.

Scans samples to find the first valid entry for the given field and infers the number of image channels from mode.

Parameters:

samples (Iterable[Dict[str, Any]]) – Iterable of sample dictionaries.
field (str) – The field name to extract from samples.

Return type:

None
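
One way to picture "infers the number of image channels from mode" is a lookup from PIL mode strings to band counts. The sketch below is an assumption about the general idea, not the library's code; PIL_MODE_CHANNELS and channels_for_mode are hypothetical names, and the actual fit() may handle unknown or missing modes differently.

```python
from typing import Optional

# Band counts for common PIL modes (see the Pillow "Modes" documentation).
PIL_MODE_CHANNELS = {
    "1": 1, "L": 1, "P": 1, "I": 1, "F": 1,
    "LA": 2,
    "RGB": 3, "YCbCr": 3, "LAB": 3, "HSV": 3,
    "RGBA": 4, "CMYK": 4,
}


def channels_for_mode(mode: Optional[str]) -> Optional[int]:
    """Hypothetical helper: map a PIL mode string to a channel count."""
    if mode is None:
        # No conversion requested, so the count is unknown until an image is read.
        return None
    # Unknown mode strings also yield None rather than raising.
    return PIL_MODE_CHANNELS.get(mode)


print(channels_for_mode("RGB"), channels_for_mode("L"))  # 3 1
```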

process(value)

Process paired image paths and timestamps.

Takes a tuple of (image_paths, time_differences) where each image path corresponds to the time difference at the same index. Images are sorted chronologically. If max_images is set, only the most recent images are kept.

This method is called by SampleBuilder.transform during dataset processing.

Parameters:

value (Tuple[List[Union[str, Path]], List[float]]) – A tuple of two lists:

- image_paths: List of file paths to images.
- time_diffs: List of float time differences from the patient’s first admission (e.g., in days).

Both lists must have the same length.

Returns:

A tuple of (images, timestamps, tag) where:
- images: Stacked image tensor of shape (N, C, H, W), where N is the number of images.
- timestamps: Float tensor of shape (N,) containing the time differences.
- tag: The literal string "image" for modality routing in the multimodal embedding model.

Return type:

Tuple[torch.Tensor, torch.Tensor, str]

Raises:

ValueError – If image_paths and time_diffs have different lengths.
ValueError – If image_paths is empty.
FileNotFoundError – If any image file does not exist.
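
The documented error conditions can be illustrated with a small stand-alone check. This is a sketch under assumptions: validate_inputs is a hypothetical helper, and the order in which the real process() performs these checks is not stated in the documentation.

```python
import tempfile
from pathlib import Path
from typing import Sequence, Union


def validate_inputs(
    image_paths: Sequence[Union[str, Path]],
    time_diffs: Sequence[float],
) -> None:
    """Hypothetical helper mirroring the documented error conditions."""
    if len(image_paths) != len(time_diffs):
        raise ValueError("image_paths and time_diffs must have the same length.")
    if len(image_paths) == 0:
        raise ValueError("image_paths is empty.")
    for path in image_paths:
        if not Path(path).is_file():
            raise FileNotFoundError(f"Image file not found: {path}")


with tempfile.TemporaryDirectory() as tmp:
    scan = Path(tmp) / "xray1.png"
    scan.touch()  # empty placeholder file; only existence is checked here
    validate_inputs([scan], [0.0])  # passes silently
    try:
        validate_inputs([scan, Path(tmp) / "missing.png"], [0.0, 2.5])
    except FileNotFoundError as err:
        print("rejected:", err)
```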

size()

Return number of image channels.

Mirrors the TimeseriesProcessor.size() pattern. Returns None if fit() or process() has not been called yet.

Return type: Optional[int]
Returns: Number of channels, or None if unknown.

modality()

Return the modality of this processor: medical images map to the IMAGE modality.

Return type: ModalityType

value_dim()

Flattened image size C*H*W (used with CNN encoder). Returns None if fit() has not been called yet.

Return type: int
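
As a worked instance of the C*H*W formula, the sketch below assumes square images of side image_size (as the constructor's resize implies) and returns None while the channel count is unknown; flattened_size is a hypothetical helper, not the method itself.

```python
from typing import Optional


def flattened_size(n_channels: Optional[int], image_size: int) -> Optional[int]:
    """Hypothetical helper: C * H * W for square images of side image_size."""
    if n_channels is None:
        # Channel count not yet inferred (e.g. before fitting).
        return None
    return n_channels * image_size * image_size


print(flattened_size(3, 224))  # 150528
print(flattened_size(None, 224))  # None
```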

process_temporal(value)

Return dict output for UnifiedMultimodalEmbeddingModel.

Returns: {"value": FloatTensor (N, C, H, W), "time": FloatTensor (N,)}
Return type: dict
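
The relationship between the tuple returned by process() and this dict form can be pictured as a simple repackaging. The sketch below is an assumption about that relationship rather than the actual implementation; to_temporal_dict is a hypothetical name, and nested lists stand in for tensors so the example runs without torch.

```python
from typing import Any, Dict, Tuple


def to_temporal_dict(processed: Tuple[Any, Any, str]) -> Dict[str, Any]:
    """Hypothetical helper: repackage (images, timestamps, tag) as a dict."""
    images, timestamps, _tag = processed
    # The keys follow the documented output: "value" and "time".
    return {"value": images, "time": timestamps}


# Stand-in data: two "images" and their time differences.
fake_output = ([["img0"], ["img1"]], [0.0, 2.5], "image")
result = to_temporal_dict(fake_output)
print(sorted(result))  # ['time', 'value']
```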

dim()

Number of dimensions (Tensor.dim()) for each output tensor, in the same order as the output tuple.

Return type: tuple[int, ...]
Returns: Tuple of integers corresponding to the number of dimensions of each output tensor.

is_token()

Returns whether the output (in particular, the value tensor) of the processor represents discrete token indices (True) or continuous values (False). This is used to determine whether to apply token-based transformations (e.g. nn.Embedding) or value-based augmentations (e.g. nn.Linear).

Return type: bool
Returns: True if the output of the processor represents discrete token indices, False otherwise.

load(path)

Optional: Load processor state from disk.

Parameters: path (str) – File path to load processor state from.
Return type: None

save(path)

Optional: Save processor state to disk.

Parameters: path (str) – File path to save processor state to.
Return type: None

schema()

Standardised schema: at minimum ('value', 'time').

Return type: tuple[str, ...]

spatial()

Whether each dimension (axis) of the value tensor is spatial (i.e. corresponds to a spatial axis like time, height, width, etc.) or not. This is used to determine how to apply augmentations and other transformations that should only be applied to spatial dimensions.

E.g. for CNN or RNN features, this would help determine which dimensions to apply spatial augmentations to, and which dimensions to treat as channels or features.

Return type: tuple[bool, ...]
Returns: Tuple of booleans corresponding to whether each axis of the value tensor is spatial or not.