Many functional RNAs adopt structures specific to their function that have been conserved throughout evolution. Knowing the structure of a conserved RNA is an important step towards elucidating its function and mechanism of action. However, predicting a conserved RNA structure remains unreliable, even when thermodynamic stability is combined with experimental chemical probing of single RNA sequences.
In a conserved structural RNA, positions that are paired tend to show a pattern of double substitutions that preserve the interaction. For instance, an A:U pair could become a G:C pair in a different species. The availability of extensive comparative sequence alignments has made it possible to use statistical tests to distinguish three different patterns of conservation in an RNA alignment: conserved pairs of positions with co-variability that cannot be explained by phylogeny alone (the positive covarying base pairs); conserved pairs with variability but not co-variability (the negative pairs); and pairs that show almost no variability (the “maybe” base pairs).
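The three categories can be illustrated with a toy heuristic over two alignment columns. This is only a sketch: it is not the statistical test used in the article, it ignores phylogeny entirely, and the function name `classify_pair`, the `CANONICAL` set, and the decision rules are assumptions made for illustration.

```python
# Toy illustration only: a real analysis needs a statistical test that
# accounts for phylogeny. The rules below are simplifying assumptions.
CANONICAL = {"AU", "UA", "GC", "CG", "GU", "UG"}

def classify_pair(col_i, col_j):
    """Label two alignment columns (equal-length strings, one residue per
    sequence) as 'positive', 'negative', or 'maybe'."""
    # Keep only sequences with a residue (not a gap) in both columns.
    pairs = [a + b for a, b in zip(col_i, col_j) if a != "-" and b != "-"]
    if not pairs:
        return "maybe"

    varies_i = len({p[0] for p in pairs}) > 1
    varies_j = len({p[1] for p in pairs}) > 1
    if not (varies_i or varies_j):
        # No variability: the alignment gives no evidence for or against.
        return "maybe"

    all_canonical = all(p in CANONICAL for p in pairs)
    if all_canonical and varies_i and varies_j and len(set(pairs)) > 1:
        # Both columns change, and every change preserves the pairing.
        return "positive"

    # Variability without compensatory changes.
    return "negative"

print(classify_pair("AGCU", "UCGA"))  # compensatory substitutions
print(classify_pair("AAAA", "UUUU"))  # invariant columns
print(classify_pair("AGCU", "UUUU"))  # one column varies, pairing is lost
```

In this sketch a pair is called positive only when every observed combination is a canonical pair and both columns vary, which is far stricter and far less informed than a covariation test calibrated against a phylogenetic null.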
In a recent PLOS Computational Biology article, Elena Rivas presents a new method to predict conserved RNA structures that uses both the positive and negative covariation information found in RNA alignments. The method uses a layered cascade of probabilistic folding algorithms to incorporate all positive pairs, while preventing all negative pairs from occurring. The algorithm is named CaCoFold, which stands for Cascade variation/covariation Constrained Folding Algorithm.
The positive and negative base pairs that anchor a CaCoFold structure are identified using the method R-scape (RNA structural covariation above phylogenetic expectation), which was developed by the author and collaborators in prior work (Rivas et al., Nature Methods, 2017; Rivas et al., Bioinformatics, 2020). The positive and negative pairs are selected with high specificity (controlled by R-scape), which makes the CaCoFold structural predictions highly reliable.
Covariation can identify complicated structural RNA elements such as pseudoknots or base-triplet interactions, which in general cannot be accommodated by standard folding algorithms. By incorporating subsets of positive base pairs in recursive layers, CaCoFold has the freedom to include all positive base pairs in a single, arbitrarily complicated structure. The first layer of CaCoFold is intended to describe the main nested secondary structure and uses the RBG (RNA Basic Grammar) probabilistic folding model (Rivas et al., RNA, 2012). The remaining layers (as many as are necessary to incorporate all positive base pairs) use other, simpler models. The layered approach makes the algorithm computationally efficient.
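The idea of distributing positive pairs across layers can be sketched with a simple greedy assignment. This is an illustrative stand-in, not the CaCoFold algorithm itself: it performs no probabilistic folding, and the helper names (`crosses`, `shares_position`, `assign_layers`) and the greedy first-fit rule are assumptions. Each layer is kept nested (no crossing pairs) and uses each position at most once, so pseudoknots and base triplets are pushed to later layers.

```python
def crosses(p, q):
    """True if pairs p=(i, j) and q=(k, l), with i < j and k < l, cross."""
    (i, j), (k, l) = p, q
    return i < k < j < l or k < i < l < j

def shares_position(p, q):
    """True if the two pairs involve a common position (e.g. a base triplet)."""
    return bool(set(p) & set(q))

def assign_layers(pairs):
    """Greedy first-fit: place each pair in the first layer where it neither
    crosses nor shares a position with a pair already there."""
    layers = []
    for pair in sorted(pairs):
        for layer in layers:
            if not any(crosses(pair, q) or shares_position(pair, q) for q in layer):
                layer.append(pair)
                break
        else:
            # No existing layer can hold this pair, so open a new one.
            layers.append([pair])
    return layers

# Hypothetical positive pairs: a nested helix, a pair forming a pseudoknot,
# and a pair that reuses position 2 (a triplet-like interaction).
positive_pairs = [(1, 10), (2, 9), (5, 15), (2, 20)]
for depth, layer in enumerate(assign_layers(positive_pairs), start=1):
    print(f"layer {depth}: {layer}")
```

With these made-up pairs, the nested helix lands in the first layer and the two non-nested interactions land in a second one, so every positive pair is retained even though no single nested structure could contain them all.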
The article presents evidence that CaCoFold predictions are consistent with RNA structures modeled from crystallography. It also proposes improved structures for over one hundred RNAs described in Rfam, a major database that collects conserved structural RNAs.