A team of bioinformaticians at the Université de Montréal (UdeM) report in Nature the discovery of a structural alphabet that can be used to infer the 3D structure of ribonucleic acid (RNA) from sequence data, providing new tools to understand the role of this important class of cellular regulators.
The folding of a single-stranded RNA molecule is determined by the interactions between its constituent nucleotides. The classical approach to RNA modelling suffers from an important limitation: it takes into account only the canonical Watson-Crick interactions A:U and G:C, that is, those in which the nucleotides face each other.
The non-canonical Hoogsteen and sugar interactions, those in which the nucleotides are side by side or on top of each other, are not taken into account by conventional modelling algorithms. The resulting models can be incomplete or erroneous, and can mislead researchers.
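To make the limitation concrete, the short Python sketch below lists the positions in a sequence that a canonical-only model would even consider pairing. It is an illustration, not code from the published tools: the function names and the minimum loop length of three nucleotides are assumptions introduced here, and the pair set is restricted to A:U and G:C as described above.

```python
# Illustrative sketch only: names and the min_loop value are assumptions,
# not part of MC-Fold or MC-Sym.

# Canonical Watson-Crick pairs as described in the text (A:U and G:C).
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def is_canonical(a, b):
    """Return True if nucleotides a and b form a Watson-Crick pair."""
    return (a.upper(), b.upper()) in CANONICAL


def candidate_canonical_pairs(seq, min_loop=3):
    """List index pairs (i, j) that a canonical-only model could pair.

    min_loop is an assumed minimum number of unpaired nucleotides
    between the two partners.
    """
    pairs = []
    for i in range(len(seq)):
        for j in range(i + min_loop + 1, len(seq)):
            if is_canonical(seq[i], seq[j]):
                pairs.append((i, j))
    return pairs


if __name__ == "__main__":
    print(candidate_canonical_pairs("GAAAC"))
```

Any pairing outside the `CANONICAL` set, such as G with A, is invisible to a model built this way, which is the gap the non-canonical interactions fall into.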
The attempt to remedy this problem led François Major, principal investigator at the Institute for Research in Immunology and Cancer of the UdeM and professor in the Department of Computer Science and Operations Research, and Marc Parisien, a graduate student in his laboratory, to propose a radically different approach to modelling RNA structure. Their idea: assemble the structure in silico starting from motifs that combine all the possible interactions between a nucleotide and its neighbors.
The researchers implemented a first algorithm, MC-Fold, that systematically assigns the different motifs to each segment of the sequence and selects the most probable pair based on its frequency in known structures. A second algorithm, MC-Sym, then assembles the set of selected motifs, taking into account the constraints that are found in known structures.
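The idea of choosing a motif by its frequency in known structures can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the MC-Fold algorithm itself: the motif labels, the two-nucleotide segment width, and the counts in the table are all invented for the example and do not come from any structure database.

```python
from collections import Counter

# HYPOTHETICAL toy data: motif labels and counts are made up for illustration.
# Each sequence segment maps to how often each motif was "observed" for it.
observed = {
    "GC": Counter({"motif_a": 8, "motif_b": 2}),
    "AU": Counter({"motif_b": 3, "motif_c": 1}),
}


def most_probable_motif(segment, table):
    """Return (motif, relative frequency) for a segment, or None if unseen."""
    counts = table.get(segment)
    if not counts:
        return None
    total = sum(counts.values())
    motif, n = counts.most_common(1)[0]
    return motif, n / total


def assign_motifs(seq, table, width=2):
    """Slide a window over seq and pick the most frequent motif per segment.

    width is an assumed segment length chosen only to keep the example small.
    """
    result = []
    for i in range(len(seq) - width + 1):
        segment = seq[i:i + width]
        result.append((segment, most_probable_motif(segment, table)))
    return result


if __name__ == "__main__":
    for segment, choice in assign_motifs("GCAU", observed):
        print(segment, choice)
```

The sketch covers only the selection step. Assembling the chosen motifs into a 3D structure, the task the article attributes to MC-Sym, is not represented here.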
"We introduced a new first-order object to represent nucleotide relationships, the nucleotide cyclic motif (NCM). We reasoned that using NCMs could allow us to arrive at better models of the 3D structure of RNA molecules, " explains François Major. "Compared to the thermodynamic approach, our algorithms make less false positives and negatives and predict structures that are closer to the empirical data in the case of sequences for which it is available. The improvement is due to the fact that NCMs incorporate more base-pairing context-dependent information."
The biological importance of RNA and the growing recognition of its therapeutic potential mean that the new modelling algorithms have many applications in biomedical research. For instance, Major and Parisien have shown that these tools can be used to study the biology of RNA viruses such as HIV. They have also used the MC-Fold:MC-Sym pipeline to identify microRNAs, an important class of regulatory molecules which is currently the focus of intense investigation. microRNAs inhibit target genes both efficiently and specifically and are often considered to be the next generation of therapeutic agents. Since microRNAs are notoriously difficult to identify based on sequence alone, the use of RNA modelling algorithms and structural features to do so represents an important breakthrough.
Work in the laboratory of François Major is supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) and the Canadian Institutes of Health Research (CIHR). Marc Parisien holds Ph.D. scholarships from the NSERC, the Fonds québécois de la recherche sur la nature et les technologies (FQRNT) and the UdeM Faculty for Graduate and Postdoctoral Studies. François Major is a member of the Robert-Cedergren Centre at the Université de Montréal.
The MC-Fold and MC-Sym RNA modelling tools are available on the Internet at www.major.iric.ca.
Article: Marc Parisien, François Major, The MC-Fold and MC-Sym pipeline infers RNA structure from sequence data, Nature 452, 51-55 (6 March 2008) | doi:10.1038/nature06684