A.C.Camargo Next Frontiers

Abstract Details


Title

Development and validation of an artificial intelligence platform with continuous learning for precision oncology.

Introduction

Homologous recombination (HR) is a crucial DNA repair mechanism. Tumors with defects in HR-related genes accumulate somatic mutations, which makes them potential targets for drugs that exploit these deficiencies. Although several drugs target HR deficiencies, the underlying mechanisms are not yet fully understood. In this study, we developed a machine learning model to predict homologous recombination deficiency (HRD) based on the integration of four genome-wide mutational signatures. We also applied model interpretation techniques to understand the molecular features involved and their implications for precision oncology.

Objective

To develop a machine learning model capable of predicting HRD across multiple tumor types, to identify the key variables that contribute to the prediction, and to explore molecular markers relevant to HRD.

Methods

We used the XGBoost algorithm to train a model that predicts an HRD score based on four mutational signatures associated with deficiencies in the homologous recombination pathway. The input data consisted of four layers of molecular information (gene expression, copy number variation, methylation, and somatic mutation) obtained from The Cancer Genome Atlas (TCGA) consortium. To reduce overfitting, we applied Boruta feature selection. We then interpreted the most influential features using SHAP (SHapley Additive exPlanations) and linked these features to pathways involved in DNA repair. The analysis was conducted both on a combined cohort and with tumor-specific models.
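The attribution idea behind SHAP can be illustrated with a minimal sketch that computes exact Shapley values for a single sample. This is not the study's pipeline: the toy model, the feature names (`TP53_mutated`, `expr_geneA`, `meth_geneB`) and the all-zero baseline are hypothetical, and exact enumeration is only feasible for a handful of features. For tree ensembles such as XGBoost, SHAP libraries rely on faster tree-specific algorithms.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample `x` relative to `baseline`.

    Features outside a coalition are replaced by their baseline value.
    The cost grows as 2^n, so this only suits very small feature sets.
    """
    names = list(x)
    n = len(names)
    phi = {}
    for name in names:
        others = [f for f in names if f != name]
        total = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude `name`
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                without = {f: (x[f] if f in subset else baseline[f]) for f in names}
                with_feature = dict(without)
                with_feature[name] = x[name]
                total += weight * (predict(with_feature) - predict(without))
        phi[name] = total
    return phi


# Hypothetical toy model standing in for a trained HRD-score regressor.
def toy_hrd_model(f):
    return 10.0 * f["TP53_mutated"] + 4.0 * f["expr_geneA"] * f["meth_geneB"] + 1.0


sample = {"TP53_mutated": 1.0, "expr_geneA": 2.0, "meth_geneB": 0.5}
baseline = {"TP53_mutated": 0.0, "expr_geneA": 0.0, "meth_geneB": 0.0}

phi = shapley_values(toy_hrd_model, sample, baseline)
for feature, value in phi.items():
    print(f"{feature}: {value:+.2f}")
```

The attributions sum to the difference between the prediction for the sample and the prediction for the baseline, which is the additivity property that makes per-feature contributions comparable across samples.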

Results

Our machine learning model accurately predicted the HRD score in both the overall cohort and the stratified tumor-type models. Among the most important input features identified by SHAP, gene expression and methylation features were prominent. Our findings confirmed known associations, such as somatic mutation of the TP53 gene being a critical predictor. In addition, enrichment analysis using the Hallmark pathways (as defined by the Molecular Signatures Database, MSigDB) showed that several of the identified genes belong to expected pathways such as DNA repair and the G2M checkpoint. Notably, several identified genes had not yet been linked to tumor etiology, which provides opportunities for further research.
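The abstract does not state which enrichment test was used. One common choice for gene-set enrichment against collections such as the MSigDB Hallmark sets is over-representation analysis with a hypergeometric test. The sketch below assumes that approach, and every count in it is hypothetical rather than taken from the study.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_pathway, n_selected, n_overlap):
    """Upper-tail hypergeometric p-value for gene-set over-representation.

    Returns the probability of seeing at least `n_overlap` pathway genes
    when `n_selected` genes are drawn without replacement from a universe
    of `n_universe` genes, of which `n_pathway` belong to the pathway.
    """
    upper = min(n_pathway, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20,000 background genes, a 150-gene pathway,
# 100 model-selected genes, 8 of which fall in the pathway.
p_value = hypergeom_enrichment_p(20000, 150, 100, 8)
print(f"enrichment p-value: {p_value:.3g}")
```

In practice the p-values obtained across all tested pathways would also be corrected for multiple testing, for example with the Benjamini-Hochberg procedure.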

Conclusions

We developed a machine learning model capable of predicting HRD across multiple tumor types. Several of the key genes identified have known roles, which supports the validity of our approach, while others may point to novel HRD-associated drivers beyond the known HR-related genes.

Funding

FAPESP (Fundação de Amparo à Pesquisa do Estado de São Paulo), 2023/06867-8

Keywords

Mutational signatures; homologous recombination deficiency; variable interpretation

Area

1. Data science

Authors

Lucas Alexandre Souza Rosa, Alexandre Defelicibus, Luan Vinicius de Carvalho Martins, Renan Valieris, Israel Tojal da Silva