Dissecting the contribution of germline genetics to pediatric sarcoma pathogenesis
Among pediatric cancers, bone and soft tissue sarcomas carry some of the poorest prognoses and the most limited effective treatment options. Understanding the germline genetics that predispose children to cancer generally, and to sarcoma specifically, is critical to discovering novel mediators of oncogenesis, better screening at-risk children, and enabling earlier diagnoses that could make cure more achievable. Prior research suggests that although 7-10% of all pediatric cancers arise in part from pathogenic germline variants in cancer predisposition genes, this rate may exceed 20% in many sarcomas. Existing work has largely focused on germline single nucleotide variants (SNVs) and small insertions and deletions (indels) in a narrow set of cancer predisposition genes, identified with traditional variant calling methods. Our group and others have advanced the use of new deep learning and structural variant detection methods to discover pathogenic germline variants. Recently, large pan-cancer sequencing efforts have also generated a surge of germline whole-genome sequencing (WGS) data. Drawing on these resources, we have assembled a cohort of ~1,200 patients with sarcoma who have germline WGS data. The overarching hypothesis of this proposal is that pediatric sarcoma predisposition is driven by known and previously unappreciated germline alterations that coordinate with somatic events, which we will reveal using novel computational approaches that leverage deep learning for focal and structural variant discovery. We propose two hypothesis-driven aims. In our first aim, we will use deep learning methods to define germline pathogenic variants that contribute to pediatric sarcoma pathogenesis. We will harmonize the sequencing data, filter out samples with low coverage, call germline SNVs and indels with the high-performing DeepVariant algorithm, and annotate variant pathogenicity using the American College of Medical Genetics and Genomics (ACMG) framework.
We will use a gene-based enrichment approach to conduct multiple ancestry-stratified case-control analyses against a multi-ethnic cohort of 25,593 cancer-free individuals whose germline whole-exome sequencing (WES) data have been processed through the same pipeline. In our second aim, we will determine the impact of germline structural variants on sarcoma pathogenesis. We will systematically detect copy number variants (CNVs) and other large structural variants in germline samples using the gnomAD-SV pipeline, and we will perform a gene-based enrichment analysis against a reference of 14,891 genomes from cancer-free controls. We will validate select enriched genes from both aims in additional cohorts of patients with Ewing sarcoma and osteosarcoma that we have identified. Once completed, this project will further define cancer-susceptibility genes in patients with sarcoma and demonstrate that sarcoma-associated germline variants extend beyond SNVs and indels to large structural variants.
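The proposal does not name the statistical test behind its gene-based enrichment analyses. One common choice is a one-sided Fisher's exact test that compares, gene by gene, how many cases and how many controls carry a qualifying variant (for example, a pathogenic or likely pathogenic call), followed by a correction for the number of genes tested. The sketch below assumes that design. The function names, the Bonferroni correction, and the toy counts are hypothetical illustrations, not the study's actual method or data. In an ancestry-stratified design, the test would be run separately within each ancestry group. In practice, comparing WGS cases with WES controls would also require restricting the analysis to regions that are well covered in both data types.

```python
from math import comb


def fisher_one_sided(case_carriers: int, n_cases: int,
                     ctrl_carriers: int, n_controls: int) -> float:
    """One-sided Fisher's exact p-value for carrier enrichment in cases.

    Computes P(X >= case_carriers), where X is hypergeometric: n_cases
    individuals drawn from n_cases + n_controls, of whom
    case_carriers + ctrl_carriers carry a qualifying variant.
    """
    if not (0 <= case_carriers <= n_cases and 0 <= ctrl_carriers <= n_controls):
        raise ValueError("carrier counts must lie between 0 and group size")
    total = n_cases + n_controls
    carriers = case_carriers + ctrl_carriers
    # Exact integer arithmetic keeps intermediate terms from underflowing
    # even for cohorts of tens of thousands of individuals.
    numer = sum(
        comb(carriers, x) * comb(total - carriers, n_cases - x)
        for x in range(case_carriers, min(carriers, n_cases) + 1)
    )
    return numer / comb(total, n_cases)


def gene_enrichment(counts: dict[str, tuple[int, int]],
                    n_cases: int, n_controls: int) -> dict[str, tuple[float, float]]:
    """Return gene -> (raw p, Bonferroni-adjusted p).

    counts maps each gene to (case carriers, control carriers).
    """
    n_tests = len(counts)
    results = {}
    for gene, (case_c, ctrl_c) in counts.items():
        p = fisher_one_sided(case_c, n_cases, ctrl_c, n_controls)
        results[gene] = (p, min(1.0, p * n_tests))
    return results


if __name__ == "__main__":
    # Hypothetical carrier counts for placeholder genes; not study data.
    toy_counts = {"GENE_A": (12, 40), "GENE_B": (2, 60)}
    for gene, (p, p_adj) in gene_enrichment(toy_counts, 1000, 20000).items():
        print(f"{gene}: p={p:.3g}, Bonferroni p={p_adj:.3g}")
```

Writing out the hypergeometric tail makes the test explicit. A library routine such as SciPy's `scipy.stats.fisher_exact` with `alternative="greater"` on the table `[[case_carriers, case_noncarriers], [ctrl_carriers, ctrl_noncarriers]]` should give the same one-sided p-value. Bonferroni is a conservative choice, and a false discovery rate procedure is a common alternative.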