pyhealth.tasks.CancerMutationBurden

class pyhealth.tasks.CancerMutationBurden(code_mapping=None)

Bases: BaseTask

Task for predicting high vs low tumor mutation burden.

This task classifies patients based on their tumor mutation burden (TMB), which is associated with immunotherapy response. TMB is approximated by counting the number of mutated genes.

task_name

The name of the task.

Type: str

input_schema

The input schema specifying required inputs.

Type: Dict[str, str]

output_schema

The output schema specifying outputs.

Type: Dict[str, str]

TMB_THRESHOLD

Threshold on the mutated-gene count used to classify a patient as high TMB.

Type: int

Note

This is a simplified TMB calculation based on gene count. Clinical TMB is typically measured as mutations per megabase of sequenced DNA.
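To make the distinction in this note concrete, the sketch below contrasts the two measures. It is illustrative only and is not PyHealth code: the helper names `gene_count_tmb` and `tmb_per_megabase` are hypothetical, and counting each gene once (rather than every mutation event) is an assumption about how "number of mutated genes" is interpreted.

```python
from typing import Iterable


def gene_count_tmb(mutated_genes: Iterable[str]) -> int:
    """Simplified TMB: number of distinct mutated genes.

    Assumption: a gene mutated more than once is counted a single time.
    """
    return len(set(mutated_genes))


def tmb_per_megabase(n_mutations: int, covered_bases: int) -> float:
    """Clinical-style TMB: mutations per megabase of sequenced DNA."""
    if covered_bases <= 0:
        raise ValueError("covered_bases must be positive")
    return n_mutations / (covered_bases / 1_000_000)


# Hypothetical inputs, not taken from any dataset.
genes = ["TP53", "SPOP", "TP53"]
simplified = gene_count_tmb(genes)               # 2 distinct genes
clinical = tmb_per_megabase(60, 30_000_000)      # 60 mutations over 30 Mb = 2.0
```

The two numbers are not interchangeable: the gene count ignores how much DNA was sequenced, so a threshold tuned for one measure should not be reused for the other.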

Examples

>>> from pyhealth.datasets import TCGAPRADDataset
>>> from pyhealth.tasks import CancerMutationBurden
>>> dataset = TCGAPRADDataset(root="/path/to/tcga_prad")
>>> task = CancerMutationBurden()
>>> samples = dataset.set_task(task)
task_name: str = 'CancerMutationBurden'
input_schema: Dict[str, str] = {'age_at_diagnosis': 'tensor', 'mutations': 'sequence'}
output_schema: Dict[str, str] = {'high_tmb': 'binary'}
TMB_THRESHOLD: int = 10
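As a rough illustration of how these attributes fit together, the sketch below builds a sample dictionary whose keys match `input_schema` and `output_schema`. It is not the library's implementation: `make_sample` and the `patient_id` key are hypothetical, and the strict `>` comparison against the threshold is an assumption, since this page does not say whether a count equal to the threshold is labeled high.

```python
from typing import Any, Dict, List

TMB_THRESHOLD = 10  # mirrors the documented class attribute value


def make_sample(
    patient_id: str, age_at_diagnosis: float, mutations: List[str]
) -> Dict[str, Any]:
    """Build one sample shaped like the documented schemas (hypothetical helper)."""
    n_mutated_genes = len(set(mutations))
    # Assumption: "high" means strictly more mutated genes than the threshold.
    high_tmb = int(n_mutated_genes > TMB_THRESHOLD)
    return {
        "patient_id": patient_id,                # hypothetical bookkeeping key
        "age_at_diagnosis": [age_at_diagnosis],  # 'tensor' input
        "mutations": sorted(set(mutations)),     # 'sequence' input
        "high_tmb": high_tmb,                    # 'binary' output
    }


# Hypothetical patients with placeholder gene names.
low = make_sample("P1", 61.0, ["TP53", "SPOP", "FOXA1"])
high = make_sample("P2", 58.0, [f"GENE{i}" for i in range(11)])
```

Here `low["high_tmb"]` is 0 (3 genes) and `high["high_tmb"]` is 1 (11 genes) under the assumed comparison.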
pre_filter(df)

Return type: LazyFrame