- Martin Mann, Mostafa M Mohamed, Syed M Ali, and Rolf Backofen. Interactive implementations of thermodynamics-based RNA structure and RNA-RNA interaction prediction approaches for example-driven teaching. PLOS Computational Biology, 14(8), e1006341, 2018.
- Martin Raden, Syed M Ali, Omer S Alkhnbashi, Anke Busch, Fabrizio Costa, Jason A Davis, Florian Eggenhofer, Rick Gelhausen, Jens Georg, Steffen Heyne, Michael Hiller, Kousik Kundu, Robert Kleinkauf, Steffen C Lott, Mostafa M Mohamed, Alexander Mattheis, Milad Miladi, Andreas S Richter, Sebastian Will, Joachim Wolff, Patrick R Wright, and Rolf Backofen. Freiburg RNA tools: a central online resource for RNA-focused research and teaching. Nucleic Acids Research, 46(W1), W25-W29, 2018.
Teaching - McCaskill: structure probabilities
Source at github@BackofenLab/RNA-Playground
John S. McCaskill (1990) introduced an efficient dynamic programming algorithm to compute the partition function $Z=\sum_{P} \exp(-E(P)/RT)$ over all nested structures $P$ that a given RNA sequence $S$ can form, where $E(P)$ is the energy of structure $P$, $R$ is the gas constant, and $T$ is the temperature.
Here, we provide a simplified version of the approach using a Nussinov-like energy scoring scheme, i.e. each base pair of a structure contributes a fixed energy term $E_{bp}$ independent of its context. Given this, we populate two dynamic programming tables $Q$ and $Q^{bp}$. $Q_{i,j}$ provides the partition function for the subsequence from position $i$ to $j$, while $Q^{bp}_{i,j}$ holds the partition function of the subsequence given the constraint that positions $i$ and $j$ form a base pair (or 0 if no base pair is possible). Watson-Crick as well as GU base pairs are considered complementary. The overall partition function is given by $Z = Q_{1,n}$ for a sequence of length $n$.
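For short sequences, the definition of $Z$ can be checked directly by enumerating every nested structure and summing the Boltzmann weights. The following is a minimal sketch under the simplified energy model, where $E(P)$ is the number of base pairs in $P$ times $E_{bp}$. The function names, the `min_loop` parameter (standing in for the minimal loop length $l$), and the default values are illustrative assumptions and are not taken from the RNA-Playground source.

```python
import math

# Watson-Crick and GU pairs are treated as complementary.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def enumerate_structures(seq, min_loop=0):
    """Yield every nested structure of seq as a list of 1-based pairs (k, j)."""

    def rec(i, j):
        if j <= i:  # zero or one position: only the empty structure
            yield []
            return
        # Case 1: position j is unpaired.
        yield from rec(i, j - 1)
        # Case 2: position j pairs with k, enclosing at least min_loop positions.
        for k in range(i, j - min_loop):
            if (seq[k - 1], seq[j - 1]) in PAIRS:
                for left in rec(i, k - 1):
                    for inner in rec(k + 1, j - 1):
                        yield left + [(k, j)] + inner

    yield from rec(1, len(seq))


def partition_function(seq, min_loop=0, e_bp=-1.0, rt=1.0):
    """Sum exp(-E(P)/RT) over all structures, with E(P) = (number of pairs) * e_bp."""
    return sum(math.exp(-len(p) * e_bp / rt)
               for p in enumerate_structures(seq, min_loop))


structures = list(enumerate_structures("GCGC"))
print(len(structures))             # 7 nested structures for min_loop=0
print(partition_function("GCGC"))  # equals 1 + 4e + 2e^2 for e_bp=-1, rt=1
```

The number of structures grows exponentially with sequence length, so this enumeration is only useful as a cross-check on small inputs; the dynamic programming tables avoid it.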
Given these partition function terms, we can compute base pair probabilities $P^{bp}$ as well as the probabilities $P^{u}$ that a certain subsequence is unpaired, i.e. that none of its positions is involved in a base pair. Such probabilities can be visualized using a dotplot. In this matrix-like representation, the value of each cell $(i,j)$ is shown as a dot whose size scales with the value.
RNA sequence $S$:
Minimal loop length $l$:
Energy weight of base pair $E_{bp}$:
'Normalized' temperature $RT$:
Partition functions
The following recursions are used to compute the partition functions $Q$ and $Q^{bp}$.
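One way to write such recursions, consistent with the definitions above, is sketched here. It assumes that a base pair $(k,j)$ must enclose at least $l$ positions and that an empty subsequence contributes $Q_{i,i-1} = 1$:

$$Q_{i,j} = Q_{i,j-1} + \sum_{i \leq k < j-l} Q_{i,k-1} \cdot Q^{bp}_{k,j}$$

$$Q^{bp}_{i,j} = \begin{cases} Q_{i+1,j-1} \cdot \exp(-E_{bp}/RT) & \text{if } S_{i}, S_{j} \text{ can form a base pair} \\ 0 & \text{otherwise} \end{cases}$$

The first term of $Q_{i,j}$ covers the case that position $j$ is unpaired, and the sum covers the case that $j$ pairs with some position $k$. The Python sketch below fills both tables under these assumptions; the function name `partition_tables` and its parameters are hypothetical.

```python
import math

# Watson-Crick and GU pairs are treated as complementary.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def partition_tables(seq, min_loop=0, e_bp=-1.0, rt=1.0):
    """Fill Q and Qbp (1-based indices) for the simplified energy model."""
    n = len(seq)
    w = math.exp(-e_bp / rt)  # Boltzmann weight of a single base pair
    # Padding rows and columns keep Q[i][i-1] = 1 (empty subsequence) addressable.
    Q = [[1.0] * (n + 2) for _ in range(n + 2)]
    Qbp = [[0.0] * (n + 2) for _ in range(n + 2)]
    for span in range(1, n):  # span = j - i, shortest subsequences first
        for i in range(1, n - span + 1):
            j = i + span
            if span > min_loop and (seq[i - 1], seq[j - 1]) in PAIRS:
                Qbp[i][j] = Q[i + 1][j - 1] * w
            total = Q[i][j - 1]               # position j unpaired
            for k in range(i, j - min_loop):  # position j paired with k
                total += Q[i][k - 1] * Qbp[k][j]
            Q[i][j] = total
    return Q, Qbp


Q, Qbp = partition_tables("GCGC")
print(Q[1][4])  # overall partition function Z = Q_{1,n}
```

Filling the tables by increasing subsequence length guarantees that every entry on the right-hand side of the recursions is already available when it is needed.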
Base pair probabilities
Given the partition functions $Q$ and $Q^{bp}$, we can now compute the probabilities of individual base pairs $(i,j)$ within the structure ensemble, i.e. $P^{bp}_{i,j} = \sum_{P \ni (i,j)} \exp(-E(P)/RT) / Z$, the sum of the Boltzmann probabilities of all structures that contain the base pair. For its computation, the following recursion is used, which covers both the case that $(i,j)$ is an external base pair and the case that $(i,j)$ is directly enclosed by an outer base pair $(p,q)$.
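A recursion matching this description can be sketched as follows, assuming the simplified model in which $Q^{bp}_{p,q} = Q_{p+1,q-1} \cdot \exp(-E_{bp}/RT)$ for complementary positions:

$$P^{bp}_{i,j} = \frac{Q_{1,i-1} \cdot Q^{bp}_{i,j} \cdot Q_{j+1,n}}{Q_{1,n}} + \sum_{p<i,\, j<q} P^{bp}_{p,q} \cdot \frac{\exp(-E_{bp}/RT) \cdot Q_{p+1,i-1} \cdot Q^{bp}_{i,j} \cdot Q_{j+1,q-1}}{Q^{bp}_{p,q}}$$

The first term is the external case; each summand is the probability of the outer pair $(p,q)$ times the conditional probability that $(i,j)$ is formed directly inside it. The Python sketch below implements this under the stated assumptions. All function and parameter names are hypothetical, and the partition function recursions in `partition_tables` are themselves an assumed form.

```python
import math

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def partition_tables(seq, min_loop=0, e_bp=-1.0, rt=1.0):
    """Assumed recursions: Q[i][j] = Q[i][j-1] + sum_k Q[i][k-1] * Qbp[k][j],
    Qbp[i][j] = Q[i+1][j-1] * exp(-e_bp/rt) for complementary S_i, S_j."""
    n = len(seq)
    w = math.exp(-e_bp / rt)
    Q = [[1.0] * (n + 2) for _ in range(n + 2)]  # 1-based, empty ranges = 1
    Qbp = [[0.0] * (n + 2) for _ in range(n + 2)]
    for span in range(1, n):
        for i in range(1, n - span + 1):
            j = i + span
            if span > min_loop and (seq[i - 1], seq[j - 1]) in PAIRS:
                Qbp[i][j] = Q[i + 1][j - 1] * w
            Q[i][j] = Q[i][j - 1] + sum(
                Q[i][k - 1] * Qbp[k][j] for k in range(i, j - min_loop))
    return Q, Qbp


def base_pair_probs(seq, Q, Qbp, e_bp=-1.0, rt=1.0):
    """Compute Pbp[i][j] from outer to inner pairs (decreasing j - i)."""
    n = len(seq)
    w = math.exp(-e_bp / rt)
    Z = Q[1][n]
    Pbp = [[0.0] * (n + 2) for _ in range(n + 2)]
    for span in range(n - 1, 0, -1):
        for i in range(1, n - span + 1):
            j = i + span
            if Qbp[i][j] == 0.0:
                continue  # (i, j) cannot pair
            # (i, j) is an external base pair.
            prob = Q[1][i - 1] * Qbp[i][j] * Q[j + 1][n] / Z
            # (i, j) is directly enclosed by an outer pair (p, q).
            for p in range(1, i):
                for q in range(j + 1, n + 1):
                    if Qbp[p][q] > 0.0:
                        prob += (Pbp[p][q] * w * Q[p + 1][i - 1] * Qbp[i][j]
                                 * Q[j + 1][q - 1] / Qbp[p][q])
            Pbp[i][j] = prob
    return Pbp


seq = "GCGC"
Q, Qbp = partition_tables(seq)
Pbp = base_pair_probs(seq, Q, Qbp)
for i in range(1, len(seq) + 1):
    for j in range(i + 1, len(seq) + 1):
        if Pbp[i][j] > 0:
            print(f"P^bp({i},{j}) = {Pbp[i][j]:.4f}")
```

Outer pairs are processed before inner ones because $P^{bp}_{i,j}$ depends on the probabilities of all pairs $(p,q)$ that enclose it. This direct version takes $O(n^4)$ time, which is acceptable for teaching-size sequences.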
Base pair probabilities can be used for structure prediction using e.g. a maximum expected accuracy (MEA) approach.
The base pair probabilities $P^{bp}$ can be visualized by a dotplot, where larger dots represent higher probabilities.
Unpaired probabilities / accessibility
Analogously to base pair probabilities, we can also compute the probability that a given subsequence $S_{i}..S_{j}$ of an RNA sequence is not involved in any intramolecular base pair.
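Under the same assumptions as for the base pair probabilities, a recursion for the unpaired probabilities can be sketched by replacing the term $Q^{bp}_{i,j}$ with 1, the weight of an unpaired stretch in the simplified model:

$$P^{u}_{i,j} = \frac{Q_{1,i-1} \cdot Q_{j+1,n}}{Q_{1,n}} + \sum_{p<i,\, j<q} P^{bp}_{p,q} \cdot \frac{\exp(-E_{bp}/RT) \cdot Q_{p+1,i-1} \cdot Q_{j+1,q-1}}{Q^{bp}_{p,q}}$$

The Python sketch below computes $Q$, $Q^{bp}$, $P^{bp}$ and $P^{u}$ together. All names are hypothetical, and the recursions used for $Q$, $Q^{bp}$ and $P^{bp}$ are assumed forms consistent with the definitions in this text.

```python
import math

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def mccaskill_simplified(seq, min_loop=0, e_bp=-1.0, rt=1.0):
    """Return (Q, Qbp, Pbp, Pu) as 1-based tables for the simplified model."""
    n = len(seq)
    w = math.exp(-e_bp / rt)
    Q = [[1.0] * (n + 2) for _ in range(n + 2)]  # empty subsequences = 1
    Qbp = [[0.0] * (n + 2) for _ in range(n + 2)]
    for span in range(1, n):
        for i in range(1, n - span + 1):
            j = i + span
            if span > min_loop and (seq[i - 1], seq[j - 1]) in PAIRS:
                Qbp[i][j] = Q[i + 1][j - 1] * w
            Q[i][j] = Q[i][j - 1] + sum(
                Q[i][k - 1] * Qbp[k][j] for k in range(i, j - min_loop))
    Z = Q[1][n]

    def enclosed(i, j, inner):
        """Sum over outer pairs (p, q) that directly enclose the region i..j."""
        total = 0.0
        for p in range(1, i):
            for q in range(j + 1, n + 1):
                if Qbp[p][q] > 0.0:
                    total += (Pbp[p][q] * w * Q[p + 1][i - 1] * inner
                              * Q[j + 1][q - 1] / Qbp[p][q])
        return total

    # Base pair probabilities, outer pairs first.
    Pbp = [[0.0] * (n + 2) for _ in range(n + 2)]
    for span in range(n - 1, 0, -1):
        for i in range(1, n - span + 1):
            j = i + span
            if Qbp[i][j] > 0.0:
                Pbp[i][j] = (Q[1][i - 1] * Qbp[i][j] * Q[j + 1][n] / Z
                             + enclosed(i, j, Qbp[i][j]))

    # Unpaired probabilities: same scheme with weight 1 for the region i..j.
    Pu = [[0.0] * (n + 2) for _ in range(n + 2)]
    for i in range(1, n + 1):
        for j in range(i, n + 1):
            Pu[i][j] = Q[1][i - 1] * Q[j + 1][n] / Z + enclosed(i, j, 1.0)
    return Q, Qbp, Pbp, Pu


seq = "GCGC"
Q, Qbp, Pbp, Pu = mccaskill_simplified(seq)
for i in range(1, len(seq) + 1):
    paired = sum(Pbp[k][i] for k in range(1, i)) + sum(
        Pbp[i][k] for k in range(i + 1, len(seq) + 1))
    # Each position is either unpaired or in exactly one pair, so the sum is 1.
    print(i, round(Pu[i][i] + paired, 6))
```

The final loop is a consistency check: for every position, the single-position unpaired probability and the probabilities of all base pairs involving that position add up to 1.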
Unpaired probabilities can be interpreted as the accessibility of a given region of an RNA to further structure formation. Thus, they can be used to enhance RNA-RNA interaction prediction approaches, e.g. see our accessibility-based prediction implementation.
The probabilities $P^{u}$ that subsequences are unpaired can be visualized by a dotplot, where larger dots represent higher probabilities.