One main focus of our work is developing computational methods for RNA structure prediction, including the identification of new RNA functions. We are also developing computational screens to identify evolutionarily conserved RNA structures in different organisms, and we test the results of these screens in collaboration with other groups.
RNA computational biology faces many urgent questions following the discovery of large numbers of long noncoding RNAs (lncRNAs) whose functional significance remains unclear. We suspect that lncRNAs are heterogeneous: many are transcriptional noise or artifacts, but a subset is functional and important. We have shown that analysis of conserved RNA structure is an important tool for understanding functional RNAs and for distinguishing them from artifacts and noise.
Another important focus is the integration of phylogenetic methods into commonly used homology search and alignment methods. The high-level objective of both aspects of our work is to design new algorithms for reliably determining the nature of any piece of biological sequence. Our main tools for carrying this program forward are probabilistic models and rigorous statistical inference.
To improve the identification of RNA function, we follow at least three directions: (1) find better models for RNA structure prediction so that we can incorporate prior information on structure, (2) develop better models of evolution so that we can use multiple sequences effectively, and (3) investigate the detection of phylogenetically conserved structures by identifying significantly covarying pairs in RNA alignments.
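As a rough illustration of direction (3), the sketch below scores covariation between two alignment columns using mutual information. This is one commonly used covariation measure, chosen here as an assumption; the text above does not say which statistic we use. The function name `mutual_information`, the gap characters, and the toy alignment are all hypothetical. The sketch computes only a raw score and does not assess significance, which would require a null model that accounts for phylogeny.

```python
import math
from collections import Counter


def mutual_information(alignment, i, j, gap_chars="-."):
    """Mutual information (in bits) between columns i and j of an alignment.

    `alignment` is a list of equal-length aligned sequences (strings).
    Sequences with a gap in either column are skipped (an assumption of
    this sketch; other treatments of gaps are possible).
    """
    pairs = [
        (seq[i], seq[j])
        for seq in alignment
        if seq[i] not in gap_chars and seq[j] not in gap_chars
    ]
    n = len(pairs)
    if n == 0:
        return 0.0

    pair_counts = Counter(pairs)
    left_counts = Counter(a for a, _ in pairs)
    right_counts = Counter(b for _, b in pairs)

    mi = 0.0
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        p_a = left_counts[a] / n
        p_b = right_counts[b] / n
        # Each observed pair contributes p(a,b) * log2(p(a,b) / (p(a) p(b))).
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


# Toy alignment (hypothetical): columns 0 and 3 change together while
# preserving Watson-Crick pairing; columns 1 and 2 are invariant.
toy_alignment = ["GAAC", "CAAG", "AAAU", "UAAA"]
covarying_score = mutual_information(toy_alignment, 0, 3)
invariant_score = mutual_information(toy_alignment, 1, 2)
```

In this toy case the compensatory columns score high and the invariant columns score zero, which is the basic signal a covariation analysis looks for. A raw score like this can also be inflated by shared ancestry among the sequences, which is why significance has to be judged against an appropriate null.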