pyhealth.datasets.SampleBaseDataset

This is the base class for sample datasets. The basic signal sample dataset and the basic EHR sample dataset both inherit from it.

class pyhealth.datasets.SampleBaseDataset(samples, dataset_name='', task_name='')

Bases: Dataset

Sample base dataset class.

This class takes a list of samples as input (either from BaseDataset.set_task() or provided directly by the user) and provides a uniform interface for accessing the samples.

Parameters:
  • samples (List[Dict]) – a list of samples. Each sample is a dict with patient_id, visit_id, and other task-specific attributes as keys.

  • dataset_name – the name of the dataset. Default is '' (an empty string).

  • task_name – the name of the task. Default is '' (an empty string).
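As an illustration of the expected input, the sketch below builds a samples list in plain Python. Only patient_id and visit_id come from the description above. The keys conditions and label, the values, and the helper check_samples are hypothetical and are not part of PyHealth. The commented-out constructor call assumes pyhealth is installed and uses the signature shown above.

```python
from typing import Dict, List

# Hypothetical samples: "conditions" and "label" stand in for
# task-specific attributes and are chosen purely for illustration.
samples: List[Dict] = [
    {"patient_id": "p0", "visit_id": "v0", "conditions": ["c1", "c2"], "label": 0},
    {"patient_id": "p0", "visit_id": "v1", "conditions": ["c2", "c3"], "label": 1},
]


def check_samples(samples: List[Dict]) -> bool:
    """Hypothetical helper: True if every sample has both identifying keys."""
    return all("patient_id" in s and "visit_id" in s for s in samples)


print(check_samples(samples))

# With pyhealth installed, the list could then be passed to the constructor:
# from pyhealth.datasets import SampleBaseDataset
# dataset = SampleBaseDataset(samples, dataset_name="demo", task_name="demo_task")
```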

get_all_tokens(key, remove_duplicates=True, sort=True)

Gets all tokens with a specific key in the samples.

Parameters:
  • key (str) – the key of the tokens in the samples.

  • remove_duplicates (bool) – whether to remove duplicates. Default is True.

  • sort (bool) – whether to sort the tokens in alphabetical order. Default is True.

Returns:

tokens – a list of tokens.
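The behaviour described above can be sketched in plain Python. This is not the library's implementation: collect_tokens is a hypothetical stand-alone function, and it assumes that the value stored under key is either a single token or a (possibly nested) list of tokens.

```python
from typing import Any, Dict, List


def collect_tokens(
    samples: List[Dict],
    key: str,
    remove_duplicates: bool = True,
    sort: bool = True,
) -> List[Any]:
    """Hypothetical sketch: gather every token stored under `key`."""
    tokens: List[Any] = []

    def _flatten(value: Any) -> None:
        # Assumption: nested lists/tuples are flattened; anything else is a token.
        if isinstance(value, (list, tuple)):
            for item in value:
                _flatten(item)
        else:
            tokens.append(value)

    for sample in samples:
        _flatten(sample[key])

    if remove_duplicates:
        # dict.fromkeys drops duplicates while keeping first-seen order.
        tokens = list(dict.fromkeys(tokens))
    if sort:
        tokens.sort()
    return tokens


# Hypothetical samples, for illustration only.
demo_samples = [
    {"patient_id": "p0", "visit_id": "v0", "conditions": ["c2", "c1"]},
    {"patient_id": "p0", "visit_id": "v1", "conditions": ["c3", "c2"]},
]

print(collect_tokens(demo_samples, "conditions"))
# → ['c1', 'c2', 'c3']
print(collect_tokens(demo_samples, "conditions", remove_duplicates=False, sort=False))
# → ['c2', 'c1', 'c3', 'c2']
```

With both flags left at their defaults, the result is a sorted list of unique tokens, which is a convenient form for building a vocabulary.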