Score: 36.5, Published: 2024-02-12
DOI: 10.1101/2024.02.10.579721
In both statistical genetics and phylogenetics, a major goal is to identify correlations between genetic loci or other aspects of the phenotype or environment and a focal trait. In these two fields, there are sophisticated but disparate statistical traditions aimed at these tasks. The disconnect between their respective approaches is becoming untenable as questions in medicine, conservation biology, and evolutionary biology increasingly rely on integrating data from within and among species, and once-clear conceptual divisions are becoming increasingly blurred. To help bridge this divide, we derive a general model describing the covariance between the genetic contributions to the quantitative phenotypes of different individuals. Taking this approach shows that standard models in both statistical genetics (e.g., Genome-Wide Association Studies; GWAS) and phylogenetic comparative biology (e.g., phylogenetic regression) can be interpreted as special cases of this more general quantitative-genetic model. The fact that these models share the same core architecture means that we can build a unified understanding of the strengths and limitations of different methods for controlling for genetic structure when testing for associations. We develop intuition for why and when spurious correlations may occur using analytical theory and conduct population-genetic and phylogenetic simulations of quantitative traits. The structural similarity of problems in statistical genetics and phylogenetics enables us to take methodological advances from one field and apply them in the other. We demonstrate this by showing how a standard GWAS technique--including both the genetic relatedness matrix (GRM) as well as its leading eigenvectors, corresponding to the principal components (PCs) of the genotype matrix, in a regression model--can mitigate spurious correlations in phylogenetic analyses. As a case study of this, we re-examine an analysis testing for co-evolution of expression levels between genes across a fungal phylogeny, and show that including principal components as covariates decreases the false positive rate while simultaneously increasing the true positive rate. More generally, this work provides a foundation for more integrative approaches for understanding the genetic architecture of phenotypes and how evolutionary processes shape it.
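The GWAS-style correction described in this abstract (a GRM plus its leading eigenvectors in one regression) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the two-population simulation, the helper names (`grm`, `top_pcs`, `slope`), and the fixed variance share `h2 = 0.5` are all hypothetical, and a real analysis would estimate the variance components rather than fix them.

```python
import numpy as np

def grm(genotypes):
    """Genetic relatedness matrix from an (individuals x loci) genotype matrix."""
    # Standardize each locus, then average the cross-products over loci.
    z = (genotypes - genotypes.mean(axis=0)) / genotypes.std(axis=0)
    return z @ z.T / genotypes.shape[1]

def top_pcs(k_matrix, n_pcs):
    """Leading eigenvectors of the GRM (the PCs of the standardized genotypes)."""
    _, vecs = np.linalg.eigh(k_matrix)  # eigh returns eigenvalues in ascending order
    return vecs[:, ::-1][:, :n_pcs]

def slope(x, y, k_matrix=None, covariates=None, h2=0.5):
    """Regression slope of y on x, optionally with PC covariates and a GRM.

    If k_matrix is given, residuals are assumed to have covariance
    h2 * K + (1 - h2) * I (h2 is an assumed value, not an estimate), and the
    model is fitted by generalized least squares via Cholesky whitening.
    """
    n = len(y)
    cols = [np.ones(n), x]
    if covariates is not None:
        cols.extend(covariates.T)
    design = np.column_stack(cols)
    if k_matrix is not None:
        v = h2 * k_matrix + (1.0 - h2) * np.eye(n)
        chol = np.linalg.cholesky(v)
        design = np.linalg.solve(chol, design)
        y = np.linalg.solve(chol, y)
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[1]

# Toy data: two populations whose allele frequencies have drifted apart.
rng = np.random.default_rng(0)
n_per_pop, n_loci = 200, 500
pop = np.repeat([0, 1], n_per_pop)
p0 = rng.uniform(0.2, 0.8, n_loci)
p1 = np.clip(p0 + rng.normal(0.0, 0.2, n_loci), 0.05, 0.95)
genotypes = rng.binomial(2, np.where(pop[:, None] == 0, p0, p1))

# A non-causal test locus that differs in frequency between populations,
# and a trait that differs between populations for unrelated reasons.
test_locus = rng.binomial(2, np.where(pop == 0, 0.2, 0.8))
trait = 2.0 * pop + rng.normal(size=pop.size)

k = grm(genotypes)
pcs = top_pcs(k, 2)
naive = slope(test_locus, trait)
grm_only = slope(test_locus, trait, k_matrix=k)
grm_plus_pcs = slope(test_locus, trait, k_matrix=k, covariates=pcs)
print(f"naive: {naive:.3f}  GRM only: {grm_only:.3f}  GRM + PCs: {grm_plus_pcs:.3f}")
```

In this toy setting the test locus has no effect on the trait, so any non-zero slope is spurious; comparing the three printed estimates shows how much of the confounding each adjustment removes.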
Score: 19.4, Published: 2024-02-12
DOI: 10.1101/2024.02.11.579821
The co-visualization of chromatin conformation with 1D omics data is key to the multi-omics driven data analysis of 3D genome organization. Chromatin contact maps are often shown as 2D heatmaps and visually compared to 1D genomic data by simple juxtaposition. While common, this strategy is imprecise, placing the onus on the reader to align features with each other. To remedy this, we developed HiCrayon, an interactive tool that facilitates the integration of 3D chromatin organization maps and 1D datasets. This visualization method integrates data from genomic assays directly into the chromatin contact map by coloring interactions according to 1D signal. HiCrayon is implemented using R Shiny and Python as a graphical user interface (GUI) application, available in both web and containerized formats to promote accessibility; a slimmed-down web-based version lets users quickly produce publication-ready images. We demonstrate the utility of HiCrayon in visualizing the effectiveness of compartment calling and the relationship between ChIP-seq and various features of chromatin organization. We also demonstrate the improved visualization of other 3D genomic phenomena, such as differences between loops associated with CTCF/cohesin vs. those associated with H3K27ac. We then demonstrate HiCrayon's visualization of organizational changes that occur during differentiation and use HiCrayon to detect compartment patterns that cannot be assigned to either A or B compartments, revealing a distinct third chromatin compartment. Overall, we demonstrate the utility of co-visualizing 2D chromatin conformation with 1D genomic signals within the same matrix to reveal fundamental aspects of genome organization. Local version: https://github.com/JRowleyLab/HiCrayon Web version: https://jrowleylab.com/HiCrayon
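The idea of coloring contact-map pixels by a 1D signal can be illustrated with a small sketch. This is not HiCrayon's code or its actual coloring rule: the function name `colour_contacts`, the choice to score a pixel by the minimum signal at its two anchoring bins, and the white-to-tint blending are all assumptions made for illustration.

```python
import numpy as np

def colour_contacts(contacts, signal, tint=(1.0, 0.0, 0.0)):
    """Return an (n, n, 3) RGB image of a contact matrix tinted by a 1D track.

    Assumed rule: pixel (i, j) takes the tint only to the extent that BOTH
    bins i and j carry signal, i.e. min(signal[i], signal[j]).
    """
    contacts = np.asarray(contacts, dtype=float)
    signal = np.asarray(signal, dtype=float)

    # Contact strength on a log scale, normalized to [0, 1].
    strength = np.log1p(contacts)
    if strength.max() > 0:
        strength = strength / strength.max()

    # 1D signal normalized to [0, 1].
    span = signal.max() - signal.min()
    s = (signal - signal.min()) / span if span > 0 else np.zeros_like(signal)

    # Pairwise signal score for every pixel of the matrix.
    pair = np.minimum.outer(s, s)

    # Base colour per pixel: black with no signal, the tint with full signal.
    colour = pair[..., None] * np.asarray(tint, dtype=float)

    # Blend from white (no contact) to the base colour (strongest contact).
    return 1.0 - strength[..., None] * (1.0 - colour)

# Toy 4-bin example: the last two bins carry signal (e.g. a ChIP-seq peak).
contacts = np.array([
    [10, 5, 1, 0],
    [5, 10, 5, 1],
    [1, 5, 10, 5],
    [0, 1, 5, 10],
])
signal = np.array([0.0, 0.0, 1.0, 1.0])
rgb = colour_contacts(contacts, signal)
```

The resulting array can be passed to any image viewer that accepts RGB values in [0, 1]; contacts between two signal-bearing bins appear in the tint, while other contacts stay on a grey scale.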
Score: 16.2, Published: 2024-02-15
DOI: 10.1101/2024.02.13.579970
Heterochromatin plays a critical role in regulating gene expression and maintaining genome integrity. While structural and enzymatic components have been linked to heterochromatin establishment, a comprehensive view of the underlying pathways at diverse heterochromatin domains remains elusive. Here, we developed a systematic approach to identify factors involved in heterochromatin silencing at pericentromeres, subtelomeres, and the silent mating type locus in Schizosaccharomyces pombe. Using quantitative measures, iterative genetic screening, and domain-specific heterochromatin reporters, we identified 369 mutants with different degrees of reduced or enhanced silencing. As expected, mutations in the core heterochromatin machinery globally decreased silencing. However, most other mutants exhibited distinct qualitative and quantitative profiles that indicate domain-specific functions. For example, decreased mating type silencing was linked to mutations in heterochromatin maintenance genes, while compromised subtelomere silencing was associated with metabolic pathways. Furthermore, similar phenotypic profiles revealed shared functions for subunits within complexes. We also discovered that the uncharacterized protein Dhm2 plays a crucial role in maintaining constitutive and facultative heterochromatin, while its absence caused phenotypes akin to DNA replication-deficient mutants. Collectively, our systematic approach unveiled a landscape of domain-specific heterochromatin regulators controlling distinct states and identified Dhm2 as a previously unknown factor linked to heterochromatin inheritance and replication fidelity.
Score: 31.8, Published: 2024-01-22
DOI: 10.1101/2024.01.20.576496
The population structure of the malaria parasite Plasmodium falciparum can reveal underlying demographic and adaptive evolutionary processes. Here, we analyse population structure in 4,376 P. falciparum genomes from 21 countries across Africa. We identified a strongly differentiated cluster of parasites, comprising ~1.2% of samples analysed, geographically distributed over 13 countries across the continent. Members of this cluster, named AF1, carry a genetic background consisting of a large number of highly differentiated variants, rarely observed outside this cluster, at a multitude of genomic loci distributed across most chromosomes. At these loci, the AF1 haplotypes appear to have common ancestry, irrespective of the sampling location; outside the shared loci, however, AF1 members are genetically similar to their sympatric parasites. AF1 parasites sharing up to 23 genomic co-inherited regions were found in all major regions of Africa, at locations over 7,000 km apart. We coined the term cryptotype to describe a complex common background which is geographically widespread, but concealed by genomic regions of local origin. Most AF1 differentiated variants are functionally related, comprising structural variations and single nucleotide polymorphisms in components of the MSP1 complex and several other genes involved in interactions with red blood cells, including invasion and erythrocyte antigen export. We propose that AF1 parasites have adapted to some as yet unidentified evolutionary niche, by acquiring a complex compendium of interacting variants that rarely circulate separately in Africa. As the cryptotype spread across the continent, it appears to have been maintained mostly intact in spite of recombination events, suggesting a selective advantage. It is possible that other cryptotypes circulate in Africa, and new analysis methods may be needed to identify them.
Score: 13.4, Published: 2024-02-12
DOI: 10.1101/2024.02.09.579686
Transposable elements (TEs) are repetitive DNA sequences which create mutations and generate genetic diversity across the tree of life. In amniotic vertebrates, TEs have been mainly studied in mammals and birds, whose genomes generally display low TE diversity. Squamates (Order Squamata; ~11,000 extant species of lizards and snakes) show as much variation in TE abundance and activity as they do in species and phenotypes. Despite this high TE activity, squamate genomes are remarkably uniform in size. We hypothesize that novel, lineage-specific dynamics have evolved over the course of squamate evolution to constrain genome size across the order. Thus, squamates may represent a prime model for investigations into TE diversity and evolution. To understand the interplay between TEs and host genomes, we analyzed the evolutionary history of the CR1 retrotransposon, a TE family found in most tetrapod genomes. We compared 113 squamate genomes to the genomes of turtles, crocodilians, and birds, and used ancestral state reconstruction to identify shifts in the rate of CR1 copy number evolution across reptiles. We analyzed the repeat landscapes of CR1 in squamate genomes and determined that shifts in the rate of CR1 copy number evolution are associated with lineage-specific variation in CR1 activity. We then used phylogenetic reconstruction of CR1 subfamilies across amniotes to reveal both recent and ancient CR1 subclades across the squamate tree of life. The patterns of CR1 evolution in squamates contrast with those in other amniotes, suggesting key differences in how TEs interact with different host genomes and at different points across evolutionary history.
Score: 9.3, Published: 2024-02-14
DOI: 10.1101/2024.02.13.579671
White oak (Quercus alba) is an abundant forest tree species across eastern North America that is ecologically, culturally, and economically important. We report the first haplotype-resolved chromosome-scale genome assembly of Q. alba and conduct comparative analyses of genome structure and gene content against other published Fagaceae genomes. In addition, we probe the genetic diversity of this widespread species and investigate its phylogenetic relationships with other oaks using whole-genome data. Our genome assembly comprises two haplotypes each consisting of 12 chromosomes. We found that the species has high genetic diversity, much of which predates the divergence of Q. alba from other oak species and likely impacts divergence time estimation in Quercus. Our phylogenetic results highlight phylogenetic discordance across the genus and suggest different relationships among North American oaks than have been reported previously. Despite a high preservation of chromosome synteny and genome size across the Quercus phylogeny, certain gene families have undergone rapid changes in size including resistance genes (R genes). The white oak genome represents a major new resource for studying genome diversity and evolution in Quercus and forest trees more generally. Future research will continue to reveal the full scope of genomic diversity across the white oak clade.
Score: 9.3, Published: 2024-02-12
DOI: 10.1101/2024.02.11.579857
Serotype surveillance of Streptococcus pneumoniae (the pneumococcus) is critical for understanding the effectiveness of current vaccination strategies. However, existing methods for serotyping are limited in their ability to identify co-carriage of multiple pneumococci and detect novel serotypes. To develop a scalable and portable serotyping method that overcomes these challenges, we employed Nanopore Adaptive Sampling (NAS), an on-sequencer enrichment method which selects for target DNA in real time, for direct detection of S. pneumoniae in complex samples. Whereas NAS targeting the whole S. pneumoniae genome was ineffective in the presence of non-pathogenic streptococci, the method was both specific and sensitive when targeting the capsular biosynthetic locus (CBL), the operon that determines S. pneumoniae serotype. NAS significantly improved coverage and yield of the CBL relative to sequencing without NAS, and accurately quantified the relative prevalence of serotypes in samples representing co-carriage. To maximise the sensitivity of NAS to detect novel serotypes, we developed and benchmarked a new pangenome-graph algorithm, named GNASTy. We show that GNASTy outperforms the current NAS implementation, which is based on linear genome alignment, when a sample contains a serotype absent from the database of targeted sequences. The methods developed in this work provide an improved approach for novel serotype discovery and routine S. pneumoniae surveillance that is fast, accurate and feasible in low resource settings. GNASTy therefore has the potential to increase the density and coverage of global pneumococcal surveillance. One sentence summary: Pangenome graph-based Nanopore Adaptive Sampling, presented in our tool GNASTy, is a sensitive, portable and cost-effective method for Streptococcus pneumoniae surveillance.
Score: 8.6, Published: 2024-02-10
DOI: 10.1101/2024.02.09.579496
Antimicrobial resistance (AMR) gene cassettes comprise an AMR gene flanked by short recombination sites (attI x attC or attC x attC). Integrons are genetic elements able to capture, excise, and shuffle these cassettes, providing adaptation on demand, and can be found on both chromosomes and plasmids. Understanding the patterns of integron diversity may help to understand the epidemiology of AMR genes. As a case study, we examined the clinical resistance gene blaGES-5, an integron-associated class A carbapenemase first reported in Greece in 2004 and since observed worldwide, which to our knowledge has not been the subject of a previous global analysis. Using a dataset comprising all NCBI contigs containing blaGES-5 (n = 431), we developed a pangenome graph-based workflow to characterise and cluster the diversity of blaGES-5-associated integrons. We demonstrate that blaGES-5-associated integrons on plasmids are different to those on chromosomes. Chromosomal integrons were almost all identified in P. aeruginosa ST235, with a consistent gene cassette content and order. We observed instances where insertion sequence IS110 disrupted attC sites, which might immobilise the gene cassettes and explain the conserved integron structure despite the presence of intI1 integrase promoters, which would typically facilitate capture or excision and rearrangement. The plasmid-associated integrons were more diverse in their gene cassette content and order, which could be an indication of greater integrase activity and shuffling of integrons on plasmids.
Score: 7.1, Published: 2024-02-16
DOI: 10.1101/2024.02.13.580169
The RING E3 ubiquitin ligase UHRF1 is an established cofactor for DNA methylation inheritance. Nucleosomal engagement through histone and DNA interactions directs UHRF1 ubiquitin ligase activity toward lysines on histone H3 tails, creating binding sites for DNMT1 through ubiquitin interacting motifs (UIM1 and UIM2). Here, we profile contributions of UHRF1 and DNMT1 to genome-wide DNA methylation inheritance and dissect specific roles for ubiquitin signaling in this process. We reveal that DNA methylation maintenance at low-density CpGs is vulnerable to disruption of UHRF1 ubiquitin ligase activity and DNMT1 ubiquitin reading activity through UIM1. Hypomethylation of low-density CpGs in this manner induces formation of partially methylated domains (PMDs), a methylation signature observed across human cancers. Furthermore, disrupting DNMT1 UIM2 function abolishes DNA methylation maintenance. Collectively, we show that DNMT1-dependent DNA methylation inheritance is a ubiquitin-regulated process and suggest that a disrupted UHRF1-DNMT1 ubiquitin signaling axis contributes to the development of PMDs in human cancers.
Score: 6.4, Published: 2024-02-13
DOI: 10.1101/2024.02.09.579678
Inflammatory Bowel Disease (IBD) is a chronic and often debilitating autoinflammatory condition, with an increasing incidence in children. Standard-of-care therapies lead to sustained transmural healing and clinical remission in fewer than one-third of patients. For children, TNF inhibition remains the only FDA-approved biologic therapy, providing an even greater urgency to understanding mechanisms of response. Genome-wide association studies (GWAS) have identified 418 independent genetic risk loci contributing to IBD, yet the majority are noncoding and their mechanisms of action are difficult to decipher. If causal, they likely alter transcription factor (TF) binding and downstream gene expression in particular cell types and contexts. To bridge this knowledge gap, we built a novel resource: multiome-seq (tandem single-nuclei (sn)RNA-seq and chromatin accessibility (snATAC)-seq) of intestinal tissue from pediatric IBD patients, where anti-TNF response was defined by endoscopic healing. From the snATAC-seq data, we generated a first-time atlas of chromatin accessibility (putative regulatory elements) for diverse intestinal cell types in the context of IBD. For cell types/contexts mediating genetic risk, we reasoned that accessible chromatin will co-localize with genetic disease risk loci. We systematically tested for significant co-localization of our chromatin accessibility maps and risk variants for 758 GWAS traits. Globally, genetic risk variants for IBD, autoimmune and inflammatory diseases are enriched in accessible chromatin of immune populations, while other traits (e.g., colorectal cancer, metabolic) are enriched in epithelial and stromal populations. This resource opens new avenues to uncover the complex molecular and cellular mechanisms mediating genetic disease risk.
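The kind of co-localization test mentioned in this abstract can be illustrated with a generic enrichment calculation. The abstract does not specify the statistical method, so this sketch substitutes a plain one-sided hypergeometric test; the helper names (`in_peak`, `enrichment_p`), the single-chromosome coordinates, and the toy peaks and variants are all hypothetical.

```python
from bisect import bisect_right
from math import comb

def in_peak(pos, starts, ends):
    """True if pos lies in any half-open peak [start, end).

    Assumes peaks are sorted by start and do not overlap.
    """
    i = bisect_right(starts, pos) - 1
    return i >= 0 and pos < ends[i]

def enrichment_p(n_background, n_background_in_peaks, n_risk, n_risk_in_peaks):
    """One-sided hypergeometric P(X >= n_risk_in_peaks).

    Treats the risk variants as a random draw of size n_risk from the
    background variant set, of which n_background_in_peaks fall in peaks.
    """
    total = comb(n_background, n_risk)
    upper = min(n_risk, n_background_in_peaks)
    tail = sum(
        comb(n_background_in_peaks, k)
        * comb(n_background - n_background_in_peaks, n_risk - k)
        for k in range(n_risk_in_peaks, upper + 1)
    )
    return tail / total

# Toy accessible-chromatin peaks on one chromosome (hypothetical coordinates).
peak_starts = [100, 500]
peak_ends = [200, 600]

# Background: one variant every 10 bp; risk variants are a subset of these.
background = list(range(0, 1000, 10))
risk = [110, 150, 520, 560, 800]

bg_hits = sum(in_peak(p, peak_starts, peak_ends) for p in background)
risk_hits = sum(in_peak(p, peak_starts, peak_ends) for p in risk)
p_value = enrichment_p(len(background), bg_hits, len(risk), risk_hits)
print(f"{risk_hits}/{len(risk)} risk variants in peaks, p = {p_value:.4g}")
```

A real analysis would need to account for linkage disequilibrium, variant density, and multiple testing across cell types and traits, none of which this sketch attempts.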