While humans, E. coli bacteria, and archaea look very different on the outside, they all descended from a common ancestor that lived billions of years ago. One sign of this shared heritage is that almost all life uses the same genetic code. The genetic code is the look-up table that cells use to interpret three-nucleotide codons in the genetic material as amino acids in a protein sequence; for example, every UGG codon is read as the amino acid tryptophan.
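The look-up table described above can be sketched in a few lines of Python. This is only an illustration of the idea: the table is the standard genetic code written in the conventional U, C, A, G base order, and the names `STANDARD_CODE` and `translate` are made up for this example.

```python
from itertools import product

# Standard genetic code: one letter per codon, with the three codon
# positions each cycling through U, C, A, G ("*" marks a stop codon).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Hypothetical name: a dict mapping each of the 64 codons to an amino acid.
STANDARD_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(rna, code=STANDARD_CODE):
    """Read an RNA string codon by codon until a stop codon is reached."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        aa = code[rna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(STANDARD_CODE["UGG"])    # W (tryptophan)
print(translate("AUGUGGUAA"))  # MW
```

An alternative genetic code is then simply a copy of this table in which one or more entries have been changed.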
At one point, scientists thought that the genetic code was universal across all life and impossible to change. However, since the 1970s, several examples of alternative genetic codes, where the meaning of one or more codons has been altered, have been found in disparate places such as green algae, Mycoplasma bacteria, and animal mitochondria. How is it possible to evolve a new amino acid meaning for a codon without disrupting the sequences of important proteins?
In our study, we set out to find more examples of alternative genetic codes. Most of the known exceptions were discovered by chance rather than by systematic search, so our picture of which changes are possible may be incomplete. We built a computer program called Codetta, which can scan tens of thousands of genomes and predict the genetic code look-up table used by each one.
Over the past few decades, scientists have sequenced hundreds of thousands of microbial genomes, including species known only from environmental samples. Analyzing this huge diversity of microbial genomes with Codetta, we found five groups of bacteria that use novel genetic codes. Previously, only changes to the meaning of the stop codon UGA were known in bacteria. The new genetic codes included the first known changes to amino acid codons in bacteria, and intriguingly, all of the new changes affected codons for the amino acid arginine.
We wondered why we saw more changes to arginine codons than to codons for other amino acids. Four of the bacterial groups with changes to arginine codons had genomes containing a low fraction of guanine (G) and cytosine (C) nucleotides. This low GC content may have made some GC-rich arginine codons very rare, and therefore easier to change without affecting many proteins. In another clade of bacteria, an arginine codon was repurposed to become the most frequently used codon for methionine. This change seems to have happened because the tRNA (the molecule that physically matches the codon to an amino acid) mutated to carry methionine instead of arginine.
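To make the link between GC content and codon rarity concrete, here is a small Python sketch that measures the GC fraction of a sequence and counts how often each arginine codon appears. It is not the analysis from the study: the function names and the toy sequence are invented for illustration, and the six arginine codons listed are those of the standard genetic code.

```python
from collections import Counter

# The six arginine codons of the standard genetic code.
ARGININE_CODONS = ("CGU", "CGC", "CGA", "CGG", "AGA", "AGG")


def gc_fraction(seq):
    """Fraction of nucleotides in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def arginine_codon_counts(coding_rna):
    """Count each arginine codon in an in-frame coding sequence."""
    codons = [coding_rna[i:i + 3] for i in range(0, len(coding_rna) - 2, 3)]
    counts = Counter(codons)
    return {codon: counts[codon] for codon in ARGININE_CODONS}


# Made-up, AT-rich toy sequence: AUG AGA AAA UUU AGA UAA
toy_gene = "AUGAGAAAAUUUAGAUAA"
print(round(gc_fraction(toy_gene), 3))  # 0.167
print(arginine_codon_counts(toy_gene))
```

In this toy example only the GC-poor codon AGA is used, while GC-rich codons such as CGG never appear. A codon that is this rare across a whole genome could change meaning while altering very few proteins.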
In addition to finding new alternative genetic codes, Codetta has a practical application in correctly labeling the genetic code of newly sequenced genomes. Most annotations assume that all organisms use the most common genetic code, so checking these assumptions will help ensure the accuracy of protein sequence databases.
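The following Python sketch illustrates why the assumed genetic code matters for annotation. It translates one made-up gene twice: once with the standard code, and once with a code in which the stop codon UGA is read as tryptophan. That reassignment is the well-known Mycoplasma example and is used here as an assumption for illustration; the variable names and the sequence are hypothetical, and this is not how Codetta itself works.

```python
from itertools import product

# Standard genetic code in U, C, A, G order ("*" marks a stop codon).
BASES = "UCAG"
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

standard_code = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), STANDARD_AAS)
}

# Alternative code (assumed for this example): UGA means tryptophan (W).
uga_trp_code = dict(standard_code, UGA="W")


def translate(rna, code):
    """Translate codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        aa = code[rna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


gene = "AUGUGAGCUUAA"  # made-up gene: AUG UGA GCU UAA
print(translate(gene, standard_code))  # M   (cut short at UGA)
print(translate(gene, uga_trp_code))   # MWA (full-length protein)
```

If a database assumed the standard code for an organism that actually reads UGA as tryptophan, every protein containing that codon would be recorded as truncated.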