Authors: Zhao, Z.; Yang, X.; Miao, J.; Dorn, S.; Barcellos, S.; Fletcher, J.; Lu, Q.

Score: 12.4, Published: 2024-02-14

DOI: 10.1101/2024.02.12.579913

Epidemiologic associations estimated from observational data are often confounded by genetics due to pervasive pleiotropy among complex traits. Many studies either neglect genetic confounding altogether or rely on adjusting for polygenic scores (PGS) in regression analysis. In this study, we show that the commonly employed PGS approach fails to remove genetic confounding because of measurement error and model misspecification. To tackle this challenge, we introduce PENGUIN, a principled framework for polygenic genetic confounding control based on variance component estimation. In addition, we present extensions of this approach that can estimate genetically unconfounded associations using GWAS summary statistics alone as input, and between multiple generations of study samples. Through simulations, we demonstrate superior statistical properties of PENGUIN compared to existing approaches. Applying our method to multiple population cohorts, we reveal and remove substantial genetic confounding in the associations of educational attainment with various complex traits and between parental and offspring education. Our results show that PENGUIN is an effective solution for genetic confounding control in observational data analysis, with broad applications in future epidemiologic association studies.
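
The measurement-error point in the abstract above can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the PENGUIN method: a single latent genetic factor `g` drives both exposure `x` and outcome `y`, `x` has no causal effect on `y`, and the PGS is modeled as `g` plus independent noise. All variable names and variances are hypothetical.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)

def slope_adjusted(y, x, z):
    """OLS coefficient on x in the regression y ~ x + z (with intercept)."""
    num = cov(x, y) * cov(z, z) - cov(x, z) * cov(z, y)
    den = cov(x, x) * cov(z, z) - cov(x, z) ** 2
    return num / den

def simulate(n=20000, seed=1):
    rng = random.Random(seed)
    g = [rng.gauss(0, 1) for _ in range(n)]        # latent genetic factor (assumed)
    x = [gi + rng.gauss(0, 1) for gi in g]         # exposure, driven by g
    y = [gi + rng.gauss(0, 1) for gi in g]         # outcome, driven by g only
    pgs = [gi + rng.gauss(0, 1) for gi in g]       # noisy measurement of g
    unadjusted = cov(x, y) / cov(x, x)             # y ~ x
    adj_pgs = slope_adjusted(y, x, pgs)            # y ~ x + PGS
    adj_true = slope_adjusted(y, x, g)             # y ~ x + g (not observable in practice)
    return unadjusted, adj_pgs, adj_true

unadjusted, adj_pgs, adj_true = simulate()
print(f"unadjusted: {unadjusted:.3f}")
print(f"adjusted for noisy PGS: {adj_pgs:.3f}")
print(f"adjusted for true g: {adj_true:.3f}")
```

Under these assumed variances the population values of the three slopes are 1/2, 1/3, and 0, so adjusting for the noisy PGS removes only part of the spurious association, while adjusting for the true genetic factor removes all of it.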

Authors: Fandino, R. A.; Brady, N. K.; Chatterjee, M.; McDonald, J. M. C.; Livraghi, L.; van der Burg, K.; Mazo-Vargas, A.; Reed, R. D.

Score: 11.4, Published: 2024-02-12

DOI: 10.1101/2024.02.09.579733

Long non-coding RNAs (lncRNAs) are transcribed elements increasingly recognized for their roles in regulating gene expression. Thus far, however, we have little understanding of how lncRNAs contribute to evolution and adaptation. Here we show that a conserved lncRNA, ivory, is an important color patterning gene in the buckeye butterfly Junonia coenia. ivory overlaps with cortex, a locus linked to multiple cases of crypsis and mimicry in Lepidoptera. Along with the Livraghi et al. companion paper, we argue that ivory, not cortex, is the color pattern gene of interest at this locus. In J. coenia, a cluster of cis-regulatory elements (CREs) in the first intron of ivory is genetically associated with natural variation in seasonal color pattern plasticity, and targeted deletions of these CREs phenocopy seasonal phenotypes. Deletions of different ivory CREs produce other distinct phenotypes as well, including loss of melanic eyespot rings, and positive and negative changes in overall wing pigmentation. We show that the color pattern transcription factors Spineless, Bric-a-brac, and Ftz-f1 bind to the ivory promoter during wing pattern development, suggesting that they directly regulate ivory. This case study demonstrates how cis-regulation of a single non-coding RNA can exert diverse and nuanced effects on the evolution and development of color patterns, including modulating seasonally plastic color patterns.

Significance: The genomic locus hosting the cortex gene has been linked to numerous cases of color pattern adaptation in moths and butterflies, including crypsis, mimicry, and seasonal polyphenism. Here we show in buckeye butterflies that the actual color pattern gene at the cortex locus is an evolutionarily conserved long non-coding RNA (lncRNA), dubbed ivory, that overlaps with cortex. Compared with other wing pattern genes, ivory stands out because of the highly nuanced, quantitative changes in pigmentation that can be achieved by manipulating adjacent cis-regulatory sequences. This study highlights how lncRNAs can be important factors underlying morphological evolution, and emphasizes the importance of considering non-coding transcripts in comparative genomics.

Authors: Tian, S.; Banerjee, T. D.; Wee, J. L. Q.; Wang, Y.; Murugesan, S. N.; Monteiro, A.

Score: 9.4, Published: 2024-02-12

DOI: 10.1101/2024.02.09.579741

A long-standing question in biology is how phenotypic variation is generated from genetic variation. In Lepidoptera (butterflies and moths), an enigmatic genomic region containing the protein-coding gene cortex is repeatedly used to generate intraspecific wing color pattern polymorphisms in multiple species. While emerging evidence suggests that cortex itself is not the effector gene of this locus, the identity of the effector remains unknown. Here, we investigated two deeply conserved miRNAs embedded in the cortex locus, and discovered that one of them, mir-193, promotes melanic wing color. This miRNA is likely derived from an intron of a gigantic long non-coding RNA spanning the entire locus, and it exerts its function by directly repressing multiple pigmentation genes, including a well-known melanin pathway gene, ebony. The function of mir-193 is conserved in a nymphalid and a pierid butterfly, belonging to two families that diverged 90 Mya. We propose that mir-193 is the effector gene of this hotspot locus underlying a 100-million-year evolution of intraspecific wing color pattern polymorphisms in Lepidoptera. Our results suggest that small non-coding RNAs can drive adaptive evolution in animals.

Authors: Traa, A.; Van Raamsdonk, J.

Score: 23.7, Published: 2024-02-08

DOI: 10.1101/2024.02.07.579360

The dynamic nature of the mitochondrial network is regulated by mitochondrial fission and fusion, allowing for re-organization of mitochondria to adapt to the cell's ever-changing needs. As organisms age, mitochondrial fission and fusion become dysregulated and mitochondrial networks become increasingly fragmented. Modulation of mitochondrial dynamics has been shown to affect longevity in fungi, yeast, Drosophila and C. elegans. While disruption of the mitochondrial fission gene drp-1 only mildly increases wild-type lifespan, it drastically increases the already long lifespan of daf-2 insulin/IGF-1 signaling (IIS) mutants. In this work, we determined the conditions required for drp-1 disruption to extend daf-2 longevity and explored the molecular mechanisms involved. We found that knockdown of drp-1 during development is sufficient to extend daf-2 lifespan, while tissue-specific knockdown of drp-1 in neurons, intestine or muscle failed to increase daf-2 longevity. Disruption of other genes involved in mitochondrial fission also increased daf-2 lifespan, as did treatment with a number of different RNAi clones that decrease mitochondrial fragmentation. In exploring potential mechanisms involved, we found that deletion of drp-1 increases resistance to chronic stresses and slows physiologic rates in daf-2 worms. In addition, we found that disruption of drp-1 increased mitochondrial and peroxisomal connectedness, oxidative phosphorylation, ATP levels, and mitophagy in daf-2 worms, but did not affect their ROS levels or mitochondrial membrane potential. Overall, this work defines the conditions under which drp-1 disruption increases daf-2 lifespan and identifies multiple changes in daf-2;drp-1 mutants that may contribute to their lifespan extension.

Authors: Shukla, N.; Roelle, S. M.; Snell, J. C.; DelSignore, O.; Bruchez, A. M.; Matreyek, K. A.

Score: 7.2, Published: 2024-02-14

DOI: 10.1101/2024.02.13.580056

Pairwise compatibility between virus and host proteins can dictate the outcome of infection. During transmission, both inter- and intraspecies variabilities in receptor protein sequences can impact cell susceptibility. Many viruses possess mutable viral entry proteins, and the patterns of host compatibility can shift as the viral protein sequence changes. This combinatorial sequence space between virus and host is poorly understood, as traditional experimental approaches lack the throughput to simultaneously test all possible combinations of protein sequences. Here, we created a pseudotyped virus infection assay where a multiplexed target-cell library of host receptor variants can be assayed simultaneously using a DNA barcode sequencing readout. We applied this assay to test a panel of 30 ACE2 orthologs or human sequence mutants for infectability by the original SARS-CoV-2 spike protein or the Alpha, Beta, Gamma, Delta, and Omicron BA.1 variant spikes. We compared these results to an analysis of the structural shifts that occurred for each variant spike's interface with human ACE2. Mutated residues were directly involved in the largest shifts, although there were also widespread indirect effects altering interface structure. The N501Y substitution in spike conferred a large structural shift for interaction with ACE2, which was partially recreated by indirect distal substitutions in Delta, which does not harbor N501Y. The structural shifts from N501Y greatly influenced the set of animal orthologs the variant spike was capable of interacting with. Out of the thirteen non-human orthologs, ten exhibited unique patterns of variant-specific compatibility, demonstrating that spike sequence changes during human transmission can toggle ACE2 compatibility and potential susceptibility of other animal species, and cumulatively increase overall compatibilities as new variants emerge. These experiments provide a blueprint for similar large-scale assessments of protein compatibility during entry by diverse viruses. This dataset demonstrates the complex compatibility relationships that occur between variable interacting host and virus proteins.

Authors: Ewen-Campen, B.; Perrimon, N.

Score: 5.9, Published: 2024-02-13

DOI: 10.1101/2024.02.13.580041

Despite the deep conservation of the DNA damage response pathway (DDR), cells in different contexts vary widely in their susceptibility to DNA damage and their propensity to undergo apoptosis as a result of genomic lesions. One of the cell signaling pathways implicated in modulating the DDR is the highly conserved Wnt pathway, which is known to promote resistance to DNA damage caused by ionizing radiation in a variety of human cancers. However, the mechanisms linking Wnt signal transduction to the DDR remain unclear. Here, we use a genetically encoded system in Drosophila to reliably induce consistent levels of DNA damage in vivo, and demonstrate that canonical Wnt signaling in the wing imaginal disc buffers cells against apoptosis in the face of DNA double-strand breaks. We show that Wg, the primary Wnt ligand in Drosophila, activates Epidermal Growth Factor Receptor (EGFR) signaling via the ligand-processing protease Rhomboid, which in turn modulates the DDR in a Chk2-, p53-, and E2F1-dependent manner. These studies provide mechanistic insight into the modulation of the DDR by the Wnt and EGFR pathways in vivo in a highly proliferative tissue. Furthermore, they reveal how the growth and patterning functions of Wnt signaling are coupled with pro-survival, anti-apoptotic activities, thereby facilitating developmental robustness in the face of genomic damage.

Author Summary: Ectopic activation of the highly conserved Wnt signaling pathway has been previously demonstrated to promote resistance to radiation and chemoradiation therapy in a variety of human cancers, yet the mechanisms by which Wnt modulates the DDR pathway are not clearly established. Furthermore, putative interactions between Wnt signaling and the DDR outside the context of pathological Wnt over-expressing tumors have not been clearly elucidated. Here, we show that, in Drosophila, loss of canonical Wnt signaling during development of the highly proliferative wing imaginal disc sensitizes cells to DNA damage, biasing them towards apoptosis and ultimately disrupting normal wing development. In contrast, ectopic Wnt signaling reduces the level of apoptosis for a given level of DNA damage. Mechanistically, we demonstrate that Wnt signaling acts via Epidermal Growth Factor Receptor (EGFR) signaling, a well-characterized pro-survival pathway, by activating the ligand-processing protease Rhomboid, and that this effect requires core DDR components Chk2, p53, and E2F1. Altogether, we show that Wnt signaling can promote developmental robustness by opposing apoptosis in the face of DNA damage, and reveal a mechanism by which Wnt signaling modulates the DDR via EGFR signaling.

Authors: Fernandes, I. K.; Vieira, C. C.; Dias, K. O. d. G.; Fernandes, S. B.

Score: 4.7, Published: 2024-02-12

DOI: 10.1101/2024.02.08.579534

Complementing phenotypic traits and molecular markers with high-dimensional data such as climate and soil information is becoming a common practice in breeding programs. This study explored new ways to integrate non-genetic information in genomic prediction models using machine learning (ML). Using the multi-environment trial data from the Genomes To Fields initiative, different models to predict maize grain yield were fitted using various inputs: genetic, environmental, or a combination of both, either in an additive (genetic-and-environmental; G+E) or a multiplicative (genotype-by-environment interaction; GEI) manner. When including environmental data, the mean predictive ability of machine learning genomic prediction models increased by 7-9% over the well-established Factor Analytic Multiplicative Mixed Model (FA) among the three cross-validation scenarios evaluated. Moreover, using the G+E model was more advantageous than the GEI model given the superior, or at least comparable, predictive ability, the lower usage of computational memory and time, and the flexibility of accounting for interactions by construction. Our results illustrate the flexibility provided by the ML framework, particularly with feature engineering. We show that the feature engineering stage offers a viable option for envirotyping and generates valuable information for machine learning-based genomic prediction models. Furthermore, we verified that genotype-by-environment interactions may be considered using tree-based approaches without explicitly including interactions in the model. These findings support the growing interest in merging high-dimensional genotypic and environmental data into predictive modeling.

Key message: Incorporating feature-engineered environmental data into machine learning-based genomic prediction models is an efficient approach to model genotype-by-environment interactions.
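
The difference in memory cost between the G+E and GEI inputs can be seen in a small sketch. The abstract does not say how the features were built, so this assumes that the additive input concatenates genetic and environmental covariates and that the multiplicative input takes every marker-by-covariate product. The function names and example values are hypothetical.

```python
def additive_features(markers, env):
    """G+E (assumed form): concatenate genetic and environmental covariates."""
    return list(markers) + list(env)

def multiplicative_features(markers, env):
    """GEI (assumed form): every marker-by-environment product."""
    return [m * e for m in markers for e in env]

markers = [0, 1, 2]   # hypothetical allele dosages at three markers
env = [21.5, 0.8]     # hypothetical environmental covariates

g_plus_e = additive_features(markers, env)
gei = multiplicative_features(markers, env)

# With p markers and q covariates, G+E has p + q columns and GEI has p * q.
print(len(g_plus_e), len(gei))
```

With thousands of markers and dozens of environmental covariates, the p * q feature count grows far faster than p + q, which is one plausible reading of the lower memory and time use reported for G+E. A tree-based learner can still split on a marker and then on an environmental covariate, so interactions remain reachable from the additive input.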

Authors: Saadat, N.; Colson, R. N.; Grimme, A. L.; Seroussi, U.; Anderson, J. W. T.; Claycomb, J. M.; Wilce, M. C.; McJunkin, K.; Wilce, J. A.; Boag, P. R.

Score: 3.8, Published: 2024-02-13

DOI: 10.1101/2024.02.13.580109

The conserved TRIM-NHL protein, NHL-2, plays a key role in small RNA pathways in Caenorhabditis elegans. NHL-2 has been shown to interact with U-rich RNA through its NHL domain, but the importance of this interaction to its biological function is unknown. We determined the crystal structure of the NHL domain to 1.4 Å resolution and identified residues that affect affinity for U-rich RNA. Functional analysis of an NHL-2 RNA-binding loss-of-function mutant demonstrated defects in the heterochronic pathway, suggesting that RNA binding is essential for its role in this miRNA pathway. Processing bodies were enlarged in the NHL-2 RNA-binding mutant, suggesting a defect in mRNA decay. We also identified the eIF4E binding protein IFET-1 as a strong synthetic interactor with NHL-2 and the DEAD box RNA helicase CGH-1 (DDX6), linking NHL-2 function to translation repression. We demonstrated that in the absence of NHL-2, there was an enrichment of miRNA transcripts associated with the miRNA pathway Argonaute proteins ALG-1 and ALG-2. We demonstrate that NHL-2 RNA-binding activity is essential for let-7 family miRNA-mediated translational repression. We conclude that the NHL-2, CGH-1, and IFET-1 regulatory axes work with the core miRISC components to form an effector complex that is required for some, but not all, miRNAs.

Authors: Fanourgakis, G.; Gaspa-Toneu, L.; Komarov, P. A.; Ozonov, E. A.; Smallwood, S. A.; Peters, A. H. F. M.

Score: 21.0, Published: 2024-02-08

DOI: 10.1101/2024.02.06.579069

DNA methylation (DNAme) serves a stable gene regulatory function in somatic cells (1). In the germ line and during early embryogenesis, however, DNAme undergoes global erasure and re-establishment to support germ cell and embryonic development (2). While de novo DNAme acquisition during male germ cell development is essential for setting genomic DNA methylation imprints, other intergenerational roles for paternal DNAme in defining embryonic chromatin after fertilization are unknown. To approach this question, we reduced levels of DNAme in developing male germ cells through conditional gene deletion of the de novo DNA methyltransferases DNMT3A and DNMT3B in undifferentiated spermatogonia. We observed that DNMT3A serves a DNAme maintenance function in undifferentiated spermatogonia, while DNMT3B catalyzes de novo DNAme during spermatogonial differentiation. Mutant male germ cells nevertheless completed their differentiation to sperm. Failure of de novo DNAme in Dnmt3a/Dnmt3b double-deficient spermatogonia is associated with increased nucleosome occupancy in mature sperm, preferentially at sites with higher CpG content, supporting the model that DNAme modulates nucleosome retention in sperm (3). To assess the impact of altered sperm chromatin on the formation of embryonic chromatin, we measured H3K4me3 occupancy at paternal and maternal alleles in 2-cell embryos using a newly developed transposon-based tagging assay for modified chromatin. Our data show that reduced DNAme in sperm renders paternal alleles permissive for H3K4me3 establishment in early embryos, independently of possible paternal inheritance of sperm-borne H3K4me3. Together, this study provides the first evidence that paternally inherited DNAme directs chromatin formation during early embryonic development.

Authors: Kobren, S. N.; Moldovan, M. A.; Reimers, R.; Traviglia, D.; Li, X.; Barnum, D.; Veit, A.; Willett, J.; Berselli, M.; Ronchetti, W.; Sherwood, R.; Krier, J.; Kohane, I. S.; Undiagnosed Diseases Network; Sunyaev, S. R.

Score: 2.5, Published: 2024-02-16

DOI: 10.1101/2024.02.13.580158

Genomics for rare disease diagnosis has advanced at a rapid pace due to our ability to perform "N-of-1" analyses on individual patients. The increasing size of ultra-rare, "N-of-1" disease cohorts internationally now enables cohort-wide analyses for new discoveries, but well-calibrated statistical genetics approaches for jointly analyzing these patients are still under development. The Undiagnosed Diseases Network (UDN) brings multiple clinical, research and experimental centers across the United States under the same umbrella to facilitate and scale N-of-1 analyses. Here, we present the first joint analysis of whole genome sequencing data of UDN patients across the network. We apply existing methods and introduce new, well-calibrated statistical methods for prioritizing disease genes with de novo recurrence and compound heterozygosity. We also detect pathways enriched with candidate and known diagnostic genes. Our computational analysis, coupled with a systematic clinical review, recapitulated known diagnoses and revealed new disease associations. We make our gene-level findings and variant-level information across the cohort available in a public-facing browser (https://dbmi-bgm.github.io/udn-browser/). These results show that N-of-1 efforts should be supplemented by a joint genomic analysis across cohorts.