pyhealth.datasets.SampleDataset#
This class is the base sample dataset. All sample datasets inherit from it.
- class pyhealth.datasets.SampleDataset(path, dataset_name=None, task_name=None, **kwargs)[source]#
Bases: StreamingDataset
A streaming dataset that loads sample metadata and processors from disk.
SampleDataset expects the directory given by path to contain a schema.pkl file created by a SampleBuilder.save(…) call. The schema.pkl file must include the fitted input_schema, output_schema, input_processors, and output_processors, together with the patient_to_index and record_to_index mappings.
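As a rough pre-flight check, the sketch below loads schema.pkl and reports which of the required entries are missing. It assumes that schema.pkl unpickles to a plain dict keyed by the names listed above, which this page does not confirm. The helper name missing_schema_keys is hypothetical and not part of pyhealth. Unpickling a real file also requires the processor classes to be importable, and pickle files should only be loaded from trusted sources.

```python
import pickle
from pathlib import Path

# Entries the documentation says schema.pkl must include.
REQUIRED_KEYS = (
    "input_schema",
    "output_schema",
    "input_processors",
    "output_processors",
    "patient_to_index",
    "record_to_index",
)


def missing_schema_keys(path):
    """Return the required entries absent from <path>/schema.pkl.

    ASSUMPTION: schema.pkl unpickles to a dict keyed by the names above.
    This helper is hypothetical; it is not a pyhealth function.
    """
    with open(Path(path) / "schema.pkl", "rb") as f:
        schema = pickle.load(f)
    return [key for key in REQUIRED_KEYS if key not in schema]
```

An empty list means every documented entry is present under the dict-layout assumption.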
- input_schema#
The configuration used to instantiate processors for input features (string aliases or processor specs).
- output_schema#
The configuration used to instantiate processors for output features.
- input_processors#
A mapping of input feature names to fitted FeatureProcessor instances.
- output_processors#
A mapping of output feature names to fitted FeatureProcessor instances.
- patient_to_index#
Dictionary mapping patient IDs to the list of sample indices associated with that patient.
- record_to_index#
Dictionary mapping record/visit IDs to the list of sample indices associated with that record.
- dataset_name#
Optional human-friendly dataset name.
- task_name#
Optional human-friendly task name.
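One common use of a mapping shaped like patient_to_index is a patient-level split, where all samples from a given patient land in the same partition. The sketch below works on a toy dict with the documented shape (patient ID to list of sample indices). The function split_indices_by_patient and the toy data are illustrative assumptions, not part of pyhealth.

```python
import random


def split_indices_by_patient(patient_to_index, train_ratio=0.8, seed=0):
    """Split sample indices so that no patient appears in both partitions.

    Hypothetical helper for illustration; not a pyhealth function.
    """
    patient_ids = sorted(patient_to_index)  # fixed order before shuffling
    random.Random(seed).shuffle(patient_ids)
    n_train = int(len(patient_ids) * train_ratio)
    train = [i for pid in patient_ids[:n_train] for i in patient_to_index[pid]]
    held_out = [i for pid in patient_ids[n_train:] for i in patient_to_index[pid]]
    return sorted(train), sorted(held_out)


# Toy mapping in the documented shape: patient ID -> list of sample indices.
patient_to_index = {
    "p1": [0, 1],
    "p2": [2],
    "p3": [3, 4, 5],
    "p4": [6],
    "p5": [7, 8],
}
train_idx, held_out_idx = split_indices_by_patient(patient_to_index)
```

Splitting on patient IDs rather than on sample indices avoids leaking one patient's samples across partitions.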
- set_drop_last(drop_last)#
Set the drop_last parameter.
Invalidates the shuffler cache when the parameter changes to ensure subsequent length calculations reflect the new drop_last setting.
- set_epoch(current_epoch)#
Set the current epoch on the dataset at the start of each epoch.
When using the StreamingDataLoader, this is done automatically.
- Return type:
- set_shuffle(shuffle)#
Set the shuffle parameter.
Invalidates the shuffler cache when the parameter changes to ensure subsequent length calculations reflect the new shuffle setting.
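The cache invalidation described for set_drop_last and set_shuffle, and the role of set_epoch, can be illustrated with a toy stand-in. The class below is a hypothetical sketch, not the pyhealth or StreamingDataset implementation: the per-worker length rule, the seed handling, and every attribute name are assumptions chosen only to show the pattern.

```python
import random


class ToyShuffledDataset:
    """Hypothetical stand-in; NOT the pyhealth/StreamingDataset implementation."""

    def __init__(self, num_items, num_workers=2, seed=42):
        self.num_items = num_items
        self.num_workers = num_workers
        self.seed = seed
        self.shuffle = False
        self.drop_last = False
        self.current_epoch = 0
        self._cached_len = None  # plays the role of the "shuffler cache"

    def set_shuffle(self, shuffle):
        # Drop the cached length only when the setting actually changes.
        if shuffle != self.shuffle:
            self._cached_len = None
        self.shuffle = shuffle

    def set_drop_last(self, drop_last):
        if drop_last != self.drop_last:
            self._cached_len = None
        self.drop_last = drop_last

    def set_epoch(self, current_epoch):
        self.current_epoch = current_epoch

    def __len__(self):
        # Assumed rule: per-worker length rounds down with drop_last,
        # and rounds up otherwise.
        if self._cached_len is None:
            if self.drop_last:
                self._cached_len = self.num_items // self.num_workers
            else:
                self._cached_len = -(-self.num_items // self.num_workers)
        return self._cached_len

    def order(self):
        # Seeding with seed + epoch gives a repeatable order per epoch.
        indices = list(range(self.num_items))
        if self.shuffle:
            random.Random(self.seed + self.current_epoch).shuffle(indices)
        return indices


ds = ToyShuffledDataset(num_items=7)
len_before = len(ds)       # computed and cached with drop_last=False
ds.set_drop_last(True)     # setting changed, so the cache is cleared
len_after = len(ds)        # recomputed under the new setting
```

Without the invalidation step, len(ds) would keep returning the value cached under the old setting. In a manual training loop, set_epoch would be called once at the start of each epoch; the source notes that StreamingDataLoader does this automatically.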