pyhealth.datasets.utils

Utility functions for hashing strings, parsing dates, working with nested lists, and building data loaders.

pyhealth.datasets.utils.hash_str(s)

pyhealth.datasets.utils.strptime(s)

Helper function that parses a string into a datetime object.

Parameters:

s (str) – the string to be parsed.

Return type:

Optional[datetime]

Returns:

Optional[datetime], the parsed datetime object, or None if s is NaN.
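
The "return None for NaN" behaviour described above can be sketched as follows. This is a minimal stand-in, not PyHealth's implementation: the name parse_datetime_or_none is hypothetical, and the use of datetime.fromisoformat (which only accepts ISO 8601-style strings) is an assumption made to keep the example free of third-party dependencies.

```python
from datetime import datetime
from typing import Optional


def parse_datetime_or_none(s) -> Optional[datetime]:
    """Hypothetical stand-in for strptime; not PyHealth's implementation."""
    # NaN-like values compare unequal to themselves, so this catches
    # float("nan") without importing math or pandas.
    if s != s:
        return None
    # Assumption: the input is ISO 8601-like, e.g. "2020-01-31 08:30:00".
    return datetime.fromisoformat(s)
```

Missing timestamps in tabular data are often represented as NaN, so checking for NaN before parsing lets callers handle a missing value as None instead of catching a parsing error.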

pyhealth.datasets.utils.padyear(year, month='1', day='1')

Pads a year string of format ‘YYYY’ to a date string of format ‘YYYY-MM-DD’.

Parameters:
  • year (str) – the year to be padded. Must be a non-zero value.

  • month (str) – the month string used as padding. Must be in [1, 12]. Defaults to '1'.

  • day (str) – the day string used as padding. Must be in [1, 31]. Defaults to '1'.

Return type:

str

Returns:

str, the padded date (padded_date) in ‘YYYY-MM-DD’ format.
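
To make the padding concrete, here is a minimal sketch of a function with the same signature. The name pad_year_sketch is hypothetical and this is not PyHealth's implementation. In particular, the zero-padding of month and day and the range checks are assumptions based on the documented ‘YYYY-MM-DD’ format and parameter constraints; the library itself may behave differently.

```python
def pad_year_sketch(year: str, month: str = "1", day: str = "1") -> str:
    """Hypothetical stand-in for padyear; not PyHealth's implementation."""
    # Assumption: enforce the documented ranges explicitly.
    if not 1 <= int(month) <= 12:
        raise ValueError(f"month must be in [1, 12], got {month!r}")
    if not 1 <= int(day) <= 31:
        raise ValueError(f"day must be in [1, 31], got {day!r}")
    # Assumption: zero-pad month and day so the result is 'YYYY-MM-DD'.
    return f"{year}-{month.zfill(2)}-{day.zfill(2)}"
```

With the defaults, a bare year is mapped to the first day of that year, for example "2019" becomes "2019-01-01".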

pyhealth.datasets.utils.flatten_list(l)

Flattens a list of lists by one level; deeper nesting is preserved.

Parameters:

l (List) – List, the list of lists to be flattened.

Return type:

List

Returns:

List, the flattened list.

Examples

>>> flatten_list([[1], [2, 3], [4]])
[1, 2, 3, 4]
>>> flatten_list([[1], [[2], 3], [4]])
[1, [2], 3, 4]
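
The one-level behaviour in the examples above can be reproduced with the standard library. This is a minimal sketch rather than PyHealth's implementation, and the name flatten_one_level is hypothetical.

```python
from itertools import chain
from typing import Any, List


def flatten_one_level(nested: List[List[Any]]) -> List[Any]:
    """Hypothetical stand-in for flatten_list: removes one level of nesting."""
    # chain.from_iterable concatenates the inner lists in order; elements
    # that are themselves lists are kept as-is.
    return list(chain.from_iterable(nested))
```

Because only the outermost level is removed, an inner list such as [2] in the second example survives as an element of the result.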

pyhealth.datasets.utils.list_nested_levels(l)

Gets all the distinct nesting levels that occur in a list.

Parameters:

l (List) – the list to be checked.

Return type:

Tuple[int]

Returns:

All the different nested levels of the list.

Examples

>>> list_nested_levels([])
(1,)
>>> list_nested_levels([1, 2, 3])
(1,)
>>> list_nested_levels([[]])
(2,)
>>> list_nested_levels([[1, 2, 3], [4, 5, 6]])
(2,)
>>> list_nested_levels([1, [2, 3], 4])
(1, 2)
>>> list_nested_levels([[1, [2, 3], 4]])
(2, 3)
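
One way to reproduce the outputs in the examples above is a short recursion over the list. This is a sketch under assumptions, not PyHealth's implementation: the name nested_levels_sketch is hypothetical, and the choices to treat only list instances as nested containers and to return the levels in ascending order are inferred from the examples.

```python
from typing import Any, Tuple


def nested_levels_sketch(l: Any) -> Tuple[int, ...]:
    """Hypothetical stand-in for list_nested_levels."""
    # A non-list value adds no nesting of its own.
    if not isinstance(l, list):
        return (0,)
    # An empty list still counts as one level of nesting.
    if not l:
        return (1,)
    levels = set()
    for item in l:
        # Each element sits one level deeper than its own nesting levels.
        for level in nested_levels_sketch(item):
            levels.add(level + 1)
    return tuple(sorted(levels))
```

A result with more than one entry, such as (1, 2), indicates that the list mixes elements at different depths.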

pyhealth.datasets.utils.is_homo_list(l)

Checks if a list is homogeneous, i.e. all of its elements have the same type.

Parameters:

l (List) – the list to be checked.

Return type:

bool

Returns:

bool, True if the list is homogeneous, False otherwise.

Examples

>>> is_homo_list([1, 2, 3])
True
>>> is_homo_list([])
True
>>> is_homo_list([1, 2, "3"])
False
>>> is_homo_list([1, 2, 3, [4, 5, 6]])
False
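
The check illustrated by the examples above can be sketched by comparing every element against the type of the first one. The name is_same_type_list is hypothetical, and the use of isinstance (which also accepts subclasses of the first element's type) is an assumption, not a statement about PyHealth's implementation.

```python
from typing import Any, List


def is_same_type_list(l: List[Any]) -> bool:
    """Hypothetical stand-in for is_homo_list."""
    # An empty list has no conflicting element types.
    if not l:
        return True
    # Use the first element's type as the reference for all elements.
    first_type = type(l[0])
    return all(isinstance(item, first_type) for item in l)
```

Note that a list mixing scalars with a nested list, as in the last example, fails the check because the nested list is not an int.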

pyhealth.datasets.utils.collate_fn_dict(batch)

pyhealth.datasets.utils.get_dataloader(dataset, batch_size, shuffle=False)