Origin of Life studies primarily consist of two sets of inferences: bottom-up approaches, which infer plausible scenarios of abiogenesis from our understanding of planetary, geological, and chemical processes, and top-down approaches, which reconstruct the evolutionary history of life for clues to its earliest states. A discontinuity exists between these narratives, however, as deterministic physicochemical processes must give way to historical evolutionary processes before the common ancestry of any of the genetic lineages upon which top-down approaches rely. This gap in our knowledge represents a kind of “dark age” in which evolutionary inference is especially challenging.
We observe that additional discontinuities between the kind of early life predicted by bottom-up and top-down studies can be explained by ecological diversification between these two states. In particular, we note that the UV-protected cold-start origin most consistent with RNA world studies is at odds with the mesophilic, UV-exposed predicted state of LUCA. We propose these can be reconciled by inferring that the earliest living systems were separated in space, time, and ecology from LUCA, and that very early evolution of cellularity would facilitate both the dispersal and natural selection required for this ecological diversification.
In our current work, we attempt to further narrow the interval of this “dark age” by applying improved tools of phylogenetic inference to the evolution of ancient protein classes that underwent divergence and diversification before the Last Universal Common Ancestor (LUCA). Aminoacyl-tRNA synthetase proteins (aaRS) are ideal for this study, as they are represented by two unrelated classes, Class I and Class II, each consisting of ~10 protein families, with each family cognate for one of the 20 amino acids. The divergences within each class represent very ancient events in the early history of life, preceding LUCA and in some cases possibly linked to the evolution of the genetic code itself. In our previous work we reconstructed pre-LUCA histories of subsets of these protein classes, including the TyrRS/TrpRS, IleRS/ValRS, and SerRS/ThrRS protein family pairs. However, reliably recovering the deep evolutionary relationships between all families within each class remains challenging, despite several previous studies. Large evolutionary distances between these proteins challenge the use of traditional sequence alignment methods, a problem compounded by abundant non-homologous structural subunits, whose diversity forces misalignment between sets of proteins. We further investigate the deep phylogenetic relationships between aaRS families within each class using (1) decomposition of aligned sequences into homologous blocks, preserving well-aligned regions within subsets of aaRS families while avoiding misalignment; and (2) expert-curated, structure-informed alignment of highly divergent sequences to recover deeper signals of homology. Improved techniques for reconstructing the histories of highly divergent pre-LUCA proteins can bring sequence-based investigation of early life even further back in time, towards the origins of the universal genotypes and phenotypes that persist in all cells today.
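As an illustration of approach (1), the following is a minimal sketch of one way an alignment could be decomposed into blocks shared by subsets of families. It is not the procedure used in this work: the function names, the gap-occupancy criterion, the thresholds `min_occupancy` and `min_length`, and the toy sequences and family labels are all assumptions made for the example. Each column is labeled with the set of families that are mostly ungapped there, and consecutive columns sharing the same label are merged into one block.

```python
from itertools import groupby


def family_occupancy(alignment, families, col):
    """Fraction of non-gap residues per family at one alignment column."""
    counts = {}
    for name, seq in alignment.items():
        fam = families[name]
        present, total = counts.get(fam, (0, 0))
        counts[fam] = (present + (seq[col] != "-"), total + 1)
    return {fam: present / total for fam, (present, total) in counts.items()}


def decompose_into_blocks(alignment, families, min_occupancy=0.5, min_length=2):
    """Split an alignment into runs of columns shared by the same families.

    Returns (start, end, family_names) tuples with a half-open column range.
    Only runs covering at least two families and at least `min_length`
    columns are kept. Both thresholds are illustrative assumptions.
    """
    length = len(next(iter(alignment.values())))
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all aligned sequences must have the same length")

    # Label each column with the set of families well represented in it.
    signatures = []
    for col in range(length):
        occupancy = family_occupancy(alignment, families, col)
        signatures.append(
            frozenset(f for f, frac in occupancy.items() if frac >= min_occupancy)
        )

    # Merge consecutive columns that carry the same label into one block.
    blocks = []
    start = 0
    for signature, run in groupby(signatures):
        run_length = len(list(run))
        if len(signature) >= 2 and run_length >= min_length:
            blocks.append((start, start + run_length, sorted(signature)))
        start += run_length
    return blocks


# Hypothetical toy alignment (not real aaRS data); "-" marks a gap.
toy_alignment = {
    "famA_1": "MKTA--GHW",
    "famA_2": "MRTA--GHW",
    "famB_1": "MKSAPLGH-",
    "famB_2": "MKSAPLGH-",
    "famC_1": "----PL---",
}
toy_families = {name: name.split("_")[0] for name in toy_alignment}

blocks = decompose_into_blocks(toy_alignment, toy_families)
for start, end, members in blocks:
    print(start, end, members)
```

In this sketch, a region present only in some families becomes its own block instead of being forced into alignment with unrelated sequence from the others. A real analysis would likely use alignment-quality or structural criteria in place of simple gap occupancy.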