Score: 28.4, Published: 2024-02-11
DOI: 10.1101/2024.02.08.24302375
The underrepresentation of different ancestry groups in large genomic datasets creates difficulties in interpreting the pathogenicity of monogenic variants. Genetic testing for individuals with non-European ancestry results in higher rates of uncertain variants and a greater risk of misclassification. We report a rare variant in the cardiac troponin T gene, TNNT2; NM_001001430.3: c.571-1G>A (rs483352835) identified via research-based whole exome sequencing in two unrelated probands of Oceanian ancestry with cardiac phenotypes. The variant disrupts the canonical splice acceptor site, activating a cryptic acceptor and resulting in an in-frame deletion (p.Gln191del). The variant is rare in gnomAD v4.0.0 (13/780,762; 0.002%), with the highest frequency in South Asians (5/74,486; 0.007%) and has 16 ClinVar assertions (13 diagnostic clinical laboratories classify as variant of uncertain significance). There are at least 28 reported cases, many with Oceanian ancestry and diverse cardiac phenotypes. Indeed, among Oceanian-ancestry-matched datasets, the allele frequency ranges from 2.9-8.8%, and is present in 2/4 (50%) Indigenous Australian alleles in Genome Asia 100K, with one participant being homozygous. With Oceanians deriving greater than 3% of their DNA from archaic genomes, we found c.571-1G>A in Vindija and Altai Neanderthal, but not the Altai Denisovan, suggesting an origin post Neanderthal divergence from modern humans 130-145 thousand years ago. Based on these data, we classify this variant as benign, and conclude it is not a monogenic cause of disease. Even with ongoing efforts to increase representation in genomics, we highlight the need for caution in assuming rarity of genetic variants in largely European datasets. Efforts to enhance diversity in genomic databases remain crucial.
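The percentages quoted in this abstract follow directly from the allele counts. A minimal sketch of that arithmetic, assuming only the counts given above; the helper name `allele_frequency_percent` is illustrative and does not come from the preprint:

```python
def allele_frequency_percent(allele_count: int, allele_number: int) -> float:
    """Return the allele frequency as a percentage of all observed alleles."""
    if allele_number <= 0:
        raise ValueError("allele_number must be positive")
    return 100.0 * allele_count / allele_number


# Allele counts as quoted in the abstract.
overall = allele_frequency_percent(13, 780_762)      # gnomAD v4.0.0, all groups
south_asian = allele_frequency_percent(5, 74_486)    # gnomAD v4.0.0, South Asian
genome_asia = allele_frequency_percent(2, 4)         # Indigenous Australian alleles

print(f"gnomAD overall:        {overall:.3f}%")
print(f"gnomAD South Asian:    {south_asian:.3f}%")
print(f"Indigenous Australian: {genome_asia:.0f}%")
```

The gap between roughly 0.002% in an aggregate database and 50% in a small ancestry-matched sample is what the authors point to when warning against assuming rarity from largely European datasets.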
Score: 112.7, Published: 2024-02-07
DOI: 10.1101/2024.01.31.24301497
Genes encoding long non-coding RNAs (lncRNAs) comprise a large fraction of the human genome, yet haploinsufficiency of a lncRNA has not been shown to cause a Mendelian disease. CHASERR is a highly conserved human lncRNA adjacent to CHD2, a coding gene in which de novo loss-of-function variants cause developmental and epileptic encephalopathy. Here we report three unrelated individuals each harboring an ultra-rare heterozygous de novo deletion in the CHASERR locus. We report similarities in severe developmental delay, facial dysmorphisms, and cerebral dysmyelination in these individuals, distinguishing them from the phenotypic spectrum of CHD2 haploinsufficiency. We demonstrate reduced CHASERR mRNA expression and corresponding increased CHD2 mRNA and protein in whole blood and patient-derived cell lines, specifically increased expression of the CHD2 allele in cis with the CHASERR deletion, as predicted from a prior mouse model of Chaserr haploinsufficiency. We show for the first time that de novo structural variants facilitated by Alu-mediated non-allelic homologous recombination led to deletion of a non-coding element (the lncRNA CHASERR) to cause a rare syndromic neurodevelopmental disorder. We also demonstrate that CHD2 has bidirectional dosage sensitivity in human disease. This work highlights the need to carefully evaluate other lncRNAs, particularly those upstream of genes associated with Mendelian disorders.
Score: 10.5, Published: 2024-02-15
DOI: 10.1101/2024.02.14.24302836
We leveraged information from more than 1.2 million participants to investigate the genetics of anxiety disorders across five continental ancestral groups. Ancestry-specific and cross-ancestry genome-wide association studies identified 51 anxiety-associated loci, 39 of which are novel. Additionally, polygenic risk scores derived from individuals of European descent were associated with anxiety in African, Admixed-American, and East Asian groups. The heritability of anxiety was enriched for genes expressed in the limbic system, the cerebral cortex, the cerebellum, the metencephalon, the entorhinal cortex, and the brain stem. Transcriptome- and proteome-wide analyses highlighted 115 genes associated with anxiety through brain-specific and cross-tissue regulation. We also observed global and local genetic correlations with depression, schizophrenia, and bipolar disorder and putative causal relationships with several physical health conditions. Overall, this study expands the knowledge regarding the genetic risk and pathogenesis of anxiety disorders, highlighting the importance of investigating diverse populations and integrating multi-omics information.
Score: 10.5, Published: 2024-02-13
DOI: 10.1101/2024.02.12.24302043
The 313-variant polygenic risk score (PRS313) provides a promising tool for breast cancer risk prediction. However, how the PRS313 varies across European populations, which could influence risk estimation, has not been evaluated. Here, we explored the distribution of PRS313 across European populations using genotype data from 94,072 females of European ancestry without breast cancer from 21 countries participating in the Breast Cancer Association Consortium (BCAC) and 225,105 female participants from the UK Biobank. The mean PRS313 differed markedly across European countries, being highest in south-eastern Europe and lowest in north-western Europe. Using the overall European PRS313 distribution to categorise individuals leads to overestimation of risk in some individuals from south-eastern countries and underestimation in some individuals from north-western countries. Adjustment for principal components explained most of the observed heterogeneity in mean PRS. Country-specific PRS distributions may be used to calibrate risk categories in individuals from different countries.
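The calibration issue described here can be illustrated by standardising one raw score against two different reference distributions. The sketch below is only an illustration under stated assumptions: the PRS values are invented toy numbers, the country labels are placeholders, and `standardise` is a hypothetical helper, not the authors' calibration procedure.

```python
from statistics import mean, pstdev


def standardise(score: float, reference_scores: list[float]) -> float:
    """Express a raw score as a z-score relative to a reference distribution."""
    sigma = pstdev(reference_scores)
    if sigma == 0:
        raise ValueError("reference distribution has zero variance")
    return (score - mean(reference_scores)) / sigma


# Hypothetical toy PRS values for unaffected women in two regions (assumption).
reference = {
    "north_west": [-0.4, -0.2, -0.1, 0.0, 0.2],
    "south_east": [0.1, 0.3, 0.4, 0.5, 0.7],
}
pooled = [s for scores in reference.values() for s in scores]

raw_score = 0.5  # a woman from the south-eastern reference population
z_pooled = standardise(raw_score, pooled)
z_country = standardise(raw_score, reference["south_east"])

print(f"z against pooled distribution:  {z_pooled:.2f}")
print(f"z against country distribution: {z_country:.2f}")
```

In this toy setup the pooled reference places the same woman further into the upper tail than her own country's distribution does, which mirrors the direction of the overestimation the abstract reports for south-eastern countries.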
Score: 8.7, Published: 2024-02-09
DOI: 10.1101/2024.02.07.24302442
The development of clonal haematopoiesis (CH), the age-related expansion of mutated haematopoietic stem cell (HSC) clones, is influenced by genetic and non-genetic factors. To date, large-scale studies of CH have focused on individuals of European descent, such that the impact of genetic ancestry on CH development remains incompletely understood. Here, we investigate this by studying CH in 136,401 admixed participants from the Mexico City Prospective Study (MCPS) and 419,228 European participants from the UK Biobank (UKB). We observe that CH was significantly less common in MCPS compared to UKB (adjusted odds ratio (OR) = 0.56 [95% CI = 0.55-0.59], P = 1.60 x 10^-206), a difference that persisted when comparing MCPS participants whose genomes were >50% ancestrally Indigenous American to those whose genomes were >50% ancestrally European (adjusted OR = 0.76 [0.70-0.83], P = 1.78 x 10^-10). Genome- and exome-wide association analyses in MCPS participants identified two novel loci associated with CH (CSGALNACT1 and DIAPH3), and ancestry-specific variants in the TCL1B locus with opposing effects on DNMT3A- versus non-DNMT3A-CH. Meta-analysis of the MCPS and UKB cohorts identified another five novel loci associated with overall or gene-specific CH, including polymorphisms at PARP11/CCND2, MEIS1 and UBE2G1/SPNS3. Our CH study, the largest in a non-European population to date, demonstrates the profound impact of ancestry on CH development and reveals the power of cross-ancestry comparisons to derive novel insights into CH pathogenesis and advance health equity amongst different human populations.
Score: 8.3, Published: 2024-01-20
DOI: 10.1101/2024.01.18.24301478
Understanding the temporal and spatial brain locations etiological for psychiatric disorders is essential for targeted neurobiological research. Integration of genomic insights from genome-wide association studies with single-cell transcriptomics is a powerful approach although past efforts have necessarily relied on mouse atlases. Leveraging a comprehensive atlas of the adult human brain, we prioritized cell types via the enrichment of SNP-heritabilities for brain diseases, disorders, and traits, progressing from individual cell types to brain regions. Our findings highlight specific neuronal clusters significantly enriched for the SNP-heritabilities for schizophrenia, bipolar disorder, and major depressive disorder along with intelligence, education, and neuroticism. Extrapolation of cell-type results to brain regions reveals important patterns for schizophrenia with distinct subregions in the hippocampus and amygdala exhibiting the highest significance. Cerebral cortical regions display similar enrichments despite the known prefrontal dysfunction in those with schizophrenia highlighting the importance of subcortical connectivity. Using functional MRI connectivity from cases with schizophrenia and neurotypical controls, we identified brain networks that distinguished cases from controls that also confirmed involvement of the central and lateral amygdala, hippocampal body, and prefrontal cortex. Our findings underscore the value of single-cell transcriptomics in decoding the polygenicity of psychiatric disorders and offer a promising convergence of genomic, transcriptomic, and brain imaging modalities toward common biological targets.
Score: 7.6, Published: 2024-02-15
DOI: 10.1101/2024.02.14.24302815
Dyslexia is a common condition that impacts reading ability. Identifying affected brain networks has been hampered by limited sample sizes of imaging case-control studies. We focused instead on brain structural correlates of genetic disposition to dyslexia in large-scale population data. In over 30,000 adults (UK Biobank), higher polygenic disposition to dyslexia was associated with lower head and brain size, and especially reduced volume and/or altered fiber density in networks involved in motor control, language and vision. However, individual genetic variants disposing to dyslexia often had quite distinct patterns of association with brain structural features. Independent component analysis applied to brain-wide association maps for thousands of dyslexia-disposing genetic variants revealed multiple impact modes on the brain, that corresponded to anatomically distinct areas with their own genomic profiles of association. Polygenic scores for dyslexia-related cognitive and educational measures, as well as attention-deficit/hyperactivity disorder, showed similarities to dyslexia polygenic disposition in terms of brain-wide associations, with microstructure of the internal capsule consistently implicated. In contrast, lower volume of the primary motor cortex was only associated with higher dyslexia polygenic disposition among all traits. These findings robustly reveal heterogeneous neurobiological aspects of dyslexia genetic disposition, and whether they are shared or unique with respect to other genetically correlated traits.
Score: 6.0, Published: 2024-02-13
DOI: 10.1101/2024.02.11.24302643
Large-scale genomic studies have significantly increased our knowledge of genetic variability across populations. Regional genetic profiling is essential for distinguishing common benign variants from those associated with disease. To this end, we conducted a comprehensive characterization of variants in the population of Navarre (Spain), utilizing whole genome sequencing data from 358 unrelated individuals of Spanish origin. Our analysis revealed 61,410 biallelic single nucleotide variants (SNVs) within the Navarrese cohort, with 35% classified as common using a minor allele frequency (MAF) threshold of > 1%. By comparing allele frequency data from the 1000 Genomes Project Phase 3 (excluding the Iberian cohort of Spain, IBS), the Genome Aggregation Database, and a Spanish cohort including IBS individuals as well as data from the Medical Genome Project, we identified 1,069 SNVs common in Navarre but rare (MAF ≤ 1%) in all other populations. This observation was further corroborated by a second regional cohort of 239 unrelated exomes, which confirmed 676 of the 1,069 SNVs as common in Navarre. In conclusion, this study highlights the importance of population-specific characterization of genetic variation to improve allele frequency filtering in sequencing data analysis to identify disease-causing variants.
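The filter described in this abstract (common in the regional cohort, rare in every comparison population) can be sketched in a few lines. This is a minimal illustration under stated assumptions: the variant identifiers, frequencies and population keys below are invented, and `regionally_enriched` is a hypothetical helper rather than the authors' pipeline.

```python
MAF_THRESHOLD = 0.01  # 1%, the common/rare cut-off stated in the abstract


def regionally_enriched(variants, regional_key, reference_keys,
                        threshold=MAF_THRESHOLD):
    """Return IDs of variants common regionally but rare in all references."""
    selected = []
    for variant in variants:
        common_locally = variant[regional_key] > threshold
        rare_elsewhere = all(variant[key] <= threshold for key in reference_keys)
        if common_locally and rare_elsewhere:
            selected.append(variant["id"])
    return selected


# Hypothetical toy records; frequencies are minor allele frequencies (assumption).
toy_variants = [
    {"id": "var_A", "navarre": 0.040, "kgp": 0.002, "gnomad": 0.001, "spain": 0.008},
    {"id": "var_B", "navarre": 0.050, "kgp": 0.030, "gnomad": 0.020, "spain": 0.040},
    {"id": "var_C", "navarre": 0.005, "kgp": 0.001, "gnomad": 0.001, "spain": 0.002},
]

hits = regionally_enriched(toy_variants, "navarre", ["kgp", "gnomad", "spain"])
print(hits)
```

Only `var_A` passes in this toy example: `var_B` is common everywhere and `var_C` is rare in the regional cohort as well.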
Score: 4.7, Published: 2024-02-15
DOI: 10.1101/2024.02.14.24302694
Over the last decade, a plethora of blood-based DNA methylation biomarkers have been developed to track differences in ageing, lifestyle, health, and biological outcomes. Typically, penalised regression models are used to generate these predictors, with hundreds or thousands of CpGs included as potential features. However, in such ultra high-dimensional settings, the effectiveness of these methods may be reduced. Here, we introduce Related Trait-based Feature Screening (RTFS), a method for performing CpG pre-selection for incident disease prediction models by utilising associations between CpGs and health-related continuous traits. In a comparison with commonly used CpG pre-selection methods, we evaluate resulting downstream Cox proportional-hazards prediction models for 10-year type 2 diabetes (T2D) onset risk in Generation Scotland (n=18,414). The top performing models utilised incident T2D EWAS (AUC=0.881, PRAUC=0.279) and RTFS (AUC=0.877, PRAUC=0.277). The resulting models also improve prediction over a model using standard risk factors only (AUC=0.841, PRAUC=0.194), and replication was observed in the German-based KORA study (n=4,261). RTFS is a flexible and generalisable framework that can help to refine biomarker development for incident disease outcomes.
Score: 1.5, Published: 2024-02-13
DOI: 10.1101/2024.02.11.24302646
Spinal muscular atrophy (SMA) is a genetic disorder that causes progressive degeneration of lower motor neurons and the subsequent loss of muscle function throughout the body. It is the second most common recessive disorder in individuals of European descent and is present in all populations. Accurate tools exist for diagnosing SMA from short read and long read genome sequencing data. However, there are no publicly available tools for GRCh38-aligned data from panel or exome sequencing assays which continue to be used as first line tests for neuromuscular disorders. We therefore developed and extensively validated a new tool - SMA Finder - that can diagnose SMA not only in genome, but also exome and targeted sequencing samples aligned to GRCh37, GRCh38, or T2T-CHM13. It works by evaluating aligned reads that overlap the c.840 position of SMN1 and SMN2 in order to detect the most common molecular causes of SMA. We applied SMA Finder to 16,626 exomes and 3,911 genomes from heterogeneous rare disease cohorts sequenced at the Broad Institute Center for Mendelian Genomics as well as 1,157 exomes and 8,766 targeted sequencing samples from Tartu University Hospital. SMA Finder correctly identified all 16 known SMA cases and reported nine novel diagnoses which have since been confirmed by clinical testing, with another four novel diagnoses undergoing validation. Notably, out of the 29 total SMA positive cases, 21 had an initial clinical diagnosis of muscular dystrophy, congenital myasthenic syndrome, or congenital myopathy. This underscored the frequency with which SMA can be misdiagnosed as other neuromuscular disorders and confirmed the utility of using SMA Finder to reanalyze phenotypically diverse neuromuscular disease cohorts. 
Finally, we evaluated SMA Finder on 198,868 individuals that had both exome and genome sequencing data within the UK Biobank (UKBB) and found that SMA Finder's overall false positive rate was less than 1 / 200,000 exome samples, and its positive predictive value (PPV) was 96%. We also observed 100% concordance between UKBB exome and genome calls. This analysis showed that, even though it is located within a segmental duplication, the most common causal variant for SMA can be detected with comparable accuracy to monogenic disease variants in non-repetitive regions. Additionally, the high PPV demonstrated by SMA Finder, the existence of treatment options for SMA in which early diagnosis is imperative for therapeutic benefit, as well as widespread availability of clinical confirmatory testing for SMA, may warrant the addition of SMN1 to the ACMG list of genes with reportable secondary findings after genome and exome sequencing.
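The idea of evaluating reads that overlap the c.840 position can be illustrated with a deliberately simplified sketch. SMN1 carries a C at c.840 and SMN2 a T, so a sample with adequate coverage and no C-supporting reads is a candidate for loss of both SMN1 copies. Everything else here is an assumption made for illustration: the function name, the depth threshold and the hard cut-off are hypothetical, the input is a plain list of base calls already pooled from both paralogs, and the actual SMA Finder implementation may differ substantially.

```python
from collections import Counter

SMN1_BASE = "C"  # base at c.840 in SMN1
SMN2_BASE = "T"  # base at c.840 in SMN2 (c.840C>T)


def classify_c840_reads(bases, min_depth=14, max_smn1_reads=0):
    """Classify a sample from base calls pooled over c.840 of SMN1 and SMN2.

    min_depth and max_smn1_reads are illustrative thresholds (assumptions),
    not values taken from the preprint.
    """
    counts = Counter(base.upper() for base in bases)
    informative = counts[SMN1_BASE] + counts[SMN2_BASE]
    if informative < min_depth:
        return "insufficient coverage"
    if counts[SMN1_BASE] <= max_smn1_reads:
        return "possible SMA: no SMN1-supporting reads"
    return "SMN1 present"


# Toy pileups (hypothetical base calls at c.840).
print(classify_c840_reads(["T"] * 20))
print(classify_c840_reads(["C"] * 10 + ["T"] * 10))
print(classify_c840_reads(["T"] * 5))
```

A hard zero-read cut-off is the simplest possible rule; a production tool would need to account for sequencing error and mapping ambiguity within the segmental duplication, and any flagged sample would still require clinical confirmatory testing.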