pyhealth.tasks.VariantClassificationClinVar

class pyhealth.tasks.VariantClassificationClinVar(code_mapping=None)

Bases: BaseTask

Task for classifying variant clinical significance using ClinVar data.

This task predicts the clinical significance of genetic variants (e.g., Pathogenic, Benign, Uncertain significance) based on variant features from the ClinVar database.

task_name

The name of the task.

Type:

str

input_schema

The input schema, mapping each required input field to its feature type.

Type:

Dict[str, str]

output_schema

The output schema, mapping each output (label) field to its task type.

Type:

Dict[str, str]

CLINICAL_SIGNIFICANCE_CATEGORIES

Mapping of raw values to standardized clinical significance labels.

Type:

Dict[str, str]

Note

Variants with conflicting interpretations or non-standard clinical significance values are excluded from the output samples.
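The standardize-or-exclude behaviour described in the note can be sketched in plain Python. This is a minimal sketch, not the task's actual implementation: it assumes raw values are matched case-insensitively after trimming surrounding whitespace, and the helper name `standardize_significance` is hypothetical. The mapping itself is copied from `CLINICAL_SIGNIFICANCE_CATEGORIES`.

```python
from typing import List, Optional

# Copied from VariantClassificationClinVar.CLINICAL_SIGNIFICANCE_CATEGORIES.
CLINICAL_SIGNIFICANCE_CATEGORIES = {
    "benign": "Benign",
    "likely benign": "Likely benign",
    "likely pathogenic": "Likely pathogenic",
    "pathogenic": "Pathogenic",
    "uncertain significance": "Uncertain significance",
    "vus": "Uncertain significance",
}


def standardize_significance(raw: Optional[str]) -> Optional[str]:
    """Hypothetical helper: map a raw ClinVar value to a standardized label.

    Assumption: matching is case-insensitive and ignores surrounding
    whitespace. Returns None for values that should be excluded.
    """
    if raw is None:
        return None
    key = raw.strip().lower()
    # Conflicting or non-standard values have no entry, so .get() yields None.
    return CLINICAL_SIGNIFICANCE_CATEGORIES.get(key)


raw_values: List[str] = [
    "Pathogenic",
    "VUS",
    "Conflicting interpretations of pathogenicity",
    "Likely benign ",
]
# Keep only the variants whose value maps to a standardized label.
kept = [label for label in map(standardize_significance, raw_values) if label is not None]
print(kept)  # ['Pathogenic', 'Uncertain significance', 'Likely benign']
```

Returning `None` rather than raising keeps the exclusion step a simple filter over the mapped values.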

Examples

>>> from pyhealth.datasets import ClinVarDataset
>>> from pyhealth.tasks import VariantClassificationClinVar
>>> dataset = ClinVarDataset(root="/path/to/clinvar")
>>> task = VariantClassificationClinVar()
>>> samples = dataset.set_task(task)

task_name: str = 'VariantClassificationClinVar'

input_schema: Dict[str, str] = {'chromosome': 'text', 'gene_symbol': 'text', 'variant_type': 'text'}

output_schema: Dict[str, str] = {'clinical_significance': 'multiclass'}

CLINICAL_SIGNIFICANCE_CATEGORIES: Dict[str, str] = {'benign': 'Benign', 'likely benign': 'Likely benign', 'likely pathogenic': 'Likely pathogenic', 'pathogenic': 'Pathogenic', 'uncertain significance': 'Uncertain significance', 'vus': 'Uncertain significance'}
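Because both `'uncertain significance'` and `'vus'` map to the same label, the six raw keys collapse to five standardized classes for the `multiclass` output. The sketch below illustrates this with a hypothetical label-to-index table built by sorting the distinct labels; PyHealth's own label processor may assign indices in a different order, so treat the ordering as an assumption.

```python
# Copied from VariantClassificationClinVar.CLINICAL_SIGNIFICANCE_CATEGORIES.
CLINICAL_SIGNIFICANCE_CATEGORIES = {
    "benign": "Benign",
    "likely benign": "Likely benign",
    "likely pathogenic": "Likely pathogenic",
    "pathogenic": "Pathogenic",
    "uncertain significance": "Uncertain significance",
    "vus": "Uncertain significance",
}

# Distinct standardized labels; duplicates such as "vus" collapse here.
classes = sorted(set(CLINICAL_SIGNIFICANCE_CATEGORIES.values()))

# Hypothetical label-to-index table (alphabetical order is an assumption).
label_to_index = {label: i for i, label in enumerate(classes)}

print(len(CLINICAL_SIGNIFICANCE_CATEGORIES), len(classes))  # 6 5
print(label_to_index)
# {'Benign': 0, 'Likely benign': 1, 'Likely pathogenic': 2, 'Pathogenic': 3, 'Uncertain significance': 4}
```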

pre_filter(df)
Return type:

LazyFrame