- Martin Mann, Mostafa M Mohamed, Syed M Ali, and Rolf Backofen

Interactive implementations of thermodynamics-based RNA structure and RNA-RNA interaction prediction approaches for example-driven teaching

PLOS Computational Biology, 14(8), e1006341, 2018.

- Martin Raden, Syed M Ali, Omer S Alkhnbashi, Anke Busch, Fabrizio Costa, Jason A Davis, Florian Eggenhofer, Rick Gelhausen, Jens Georg, Steffen Heyne, Michael Hiller, Kousik Kundu, Robert Kleinkauf, Steffen C Lott, Mostafa M Mohamed, Alexander Mattheis, Milad Miladi, Andreas S Richter, Sebastian Will, Joachim Wolff, Patrick R Wright, and Rolf Backofen

Freiburg RNA tools: a central online resource for RNA-focused research and teaching

Nucleic Acids Research, 46(W1), W25-W29, 2018.

# Teaching - McCaskill: structure probabilities

Source at github@BackofenLab/RNA-Playground

John S. McCaskill (1990) introduced an efficient dynamic programming algorithm to compute the partition function $Z=\sum_{P} \exp(-E(P)/RT)$ over all possible nested structures $P$ that can be formed by a given RNA sequence $S$, where $E(P)$ is the energy of structure $P$, $R$ is the gas constant, and $T$ is the temperature.

Here, we provide a simplified version of the approach using a Nussinov-like energy scoring scheme, i.e. each base pair of a structure contributes a fixed energy term $E_{bp}$ independent of its context. Given this, we populate two dynamic programming tables $Q$ and $Q^{bp}$. $Q_{i,j}$ provides the partition function for the subsequence from position $i$ to $j$, while $Q^{bp}_{i,j}$ holds the partition function of the subsequence given the constraint that positions $i$ and $j$ form a base pair (or 0 if no base pair is possible). Watson-Crick as well as GU base pairs are considered complementary. The overall partition function is given by $Z = Q_{1,n}$ for a sequence of length $n$.

Given these partition function terms, we can compute base pair probabilities $P^{bp}$ as well as probabilities $P^{u}$ that a certain subsequence is unpaired, i.e. that none of its positions is involved in any base pair. Such probabilities can be visualized using a dotplot. In this matrix-like representation, the value of each cell $i,j$ is drawn as a dot whose size scales with the value.
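To make the definition of $Z$ concrete, the following minimal sketch enumerates all nested structures of a short sequence and sums their Boltzmann weights directly. It assumes the simplified energy model described above, so that $E(P)$ is the number of base pairs times $E_{bp}$. The function names, the 0-based pair tuples, and the default parameter values are illustrative choices, not taken from the original implementation. The enumeration is exponential in the sequence length and is only meant as a reference for very short inputs.

```python
import math

# Assumption: Watson-Crick and GU pairs are complementary, as stated in the text.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def enumerate_structures(seq, min_loop=0):
    """Yield every nested structure of seq as a tuple of 0-based pairs (k, j)."""

    def rec(i, j):
        # All structures on seq[i..j] (inclusive); an empty range has exactly one.
        if j < i:
            yield ()
            return
        # Case 1: position j is unpaired.
        yield from rec(i, j - 1)
        # Case 2: position j pairs with k, enclosing at least min_loop positions.
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in PAIRS:
                for left in rec(i, k - 1):
                    for inner in rec(k + 1, j - 1):
                        yield left + ((k, j),) + inner

    yield from rec(0, len(seq) - 1)


def partition_function_brute_force(seq, e_bp=-1.0, rt=1.0, min_loop=0):
    """Sum exp(-E(P)/RT) over all structures, with E(P) = (#pairs) * e_bp."""
    return sum(
        math.exp(-len(structure) * e_bp / rt)
        for structure in enumerate_structures(seq, min_loop)
    )
```

For example, `partition_function_brute_force("GAC", min_loop=1)` sums over two structures only: the open chain and the single pair between G and C.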


The computation takes the following inputs:

- RNA sequence $S$
- Minimal loop length $l$
- Energy weight of base pair $E_{bp}$
- 'Normalized' temperature $RT$

## Partition functions

The following recursions are used to compute the partition functions $Q$ and $Q^{bp}$.

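$Q$ | $Q_{i,j} = Q_{i,j-1} + \sum_{i \leq k < j-l} Q_{i,k-1} \cdot Q^{bp}_{k,j}$
---|---

$Q^{bp}$ | $Q^{bp}_{i,j} = \begin{cases} Q_{i+1,j-1} \cdot \exp(-E_{bp}/RT) & \text{if } S_{i}, S_{j} \text{ can form a base pair and } j-i > l \\ 0 & \text{otherwise} \end{cases}$
---|---

Empty subsequences contribute $Q_{i,j} = 1$ for $j < i$. The first term of $Q_{i,j}$ covers all structures in which position $j$ is unpaired; the sum covers those in which $j$ pairs with some position $k$.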
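The tables $Q$ and $Q^{bp}$ can be filled as in the following minimal Python sketch. It assumes one common formulation of the recursion: position $j$ is either unpaired, or paired with some $k$ such that more than $l$ positions separate the two indices, and empty subsequences have partition function 1. The function name, the 1-based padded table layout, and the default parameters are illustrative assumptions rather than the original implementation.

```python
import math

# Assumption: Watson-Crick and GU pairs are complementary, as stated in the text.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def partition_tables(seq, e_bp=-1.0, rt=1.0, min_loop=0):
    """Return (Q, Qbp) as 1-based tables padded to size (n + 2) x (n + 2)."""
    n = len(seq)
    weight = math.exp(-e_bp / rt)  # Boltzmann weight of a single base pair
    # Q starts at 1 everywhere, so entries with j < i (empty ranges) stay 1.
    Q = [[1.0] * (n + 2) for _ in range(n + 2)]
    Qbp = [[0.0] * (n + 2) for _ in range(n + 2)]
    # Fill by increasing subsequence length so that all needed entries exist.
    for length in range(1, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            # Qbp: i and j pair and enclose at least min_loop positions.
            if j - i > min_loop and (seq[i - 1], seq[j - 1]) in PAIRS:
                Qbp[i][j] = Q[i + 1][j - 1] * weight
            # Q: j unpaired, or j paired with some k in i .. j - min_loop - 1.
            total = Q[i][j - 1]
            for k in range(i, j - min_loop):
                total += Q[i][k - 1] * Qbp[k][j]
            Q[i][j] = total
    return Q, Qbp
```

With the defaults ($E_{bp} = -1$, $RT = 1$, $l = 0$), `partition_tables("GGCC")[0][1][4]` evaluates to $1 + 4e + e^2$: one open chain, four structures with a single pair, and one with two nested pairs.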

## Base pair probabilities

Given the partition functions $Q$ and $Q^{bp}$ we can now compute the probabilities of individual base pairs $(i,j)$ within the structure ensemble, i.e. $P^{bp}_{i,j} = \sum_{P \ni (i,j)} \exp(-E(P)/RT) / Z$, the sum of the Boltzmann probabilities of all structures that contain the base pair. It is computed with the following recursion, which covers both the case that $(i,j)$ is an external base pair and the case that $(i,j)$ is directly enclosed by an outer base pair $(p,q)$.

Base pair probabilities can be used for structure prediction using e.g. a maximum expected accuracy (MEA) approach.


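$P^{bp}$ | $P^{bp}_{i,j} = \frac{Q_{1,i-1} \cdot Q^{bp}_{i,j} \cdot Q_{j+1,n}}{Q_{1,n}} + \sum_{p<i,\, j<q} P^{bp}_{p,q} \cdot \frac{\exp(-E_{bp}/RT) \cdot Q_{p+1,i-1} \cdot Q^{bp}_{i,j} \cdot Q_{j+1,q-1}}{Q^{bp}_{p,q}}$
---|---

The sum runs only over outer pairs $(p,q)$ with $Q^{bp}_{p,q} > 0$.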

The base pair probabilities $P^{bp}$ can be visualized by a dotplot, where larger dots represent higher probabilities.
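The two cases described above (external pair, or pair directly enclosed by an outer pair $(p,q)$) can be sketched in Python as follows. The block recomputes $Q$ and $Q^{bp}$ so that it runs on its own, and it processes pairs from widest to narrowest so that the probability of every enclosing pair is known before it is needed. All function names, the table layout, and the default parameters are illustrative assumptions. This straightforward version needs $O(n^4)$ time, which is acceptable only for short teaching examples.

```python
import math

# Assumption: Watson-Crick and GU pairs are complementary, as stated in the text.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def partition_tables(seq, e_bp, rt, min_loop):
    """Fill Q and Qbp (1-based, padded); empty ranges have Q = 1."""
    n = len(seq)
    weight = math.exp(-e_bp / rt)
    Q = [[1.0] * (n + 2) for _ in range(n + 2)]
    Qbp = [[0.0] * (n + 2) for _ in range(n + 2)]
    for length in range(1, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            if j - i > min_loop and (seq[i - 1], seq[j - 1]) in PAIRS:
                Qbp[i][j] = Q[i + 1][j - 1] * weight
            Q[i][j] = Q[i][j - 1] + sum(
                Q[i][k - 1] * Qbp[k][j] for k in range(i, j - min_loop)
            )
    return Q, Qbp


def base_pair_probabilities(seq, e_bp=-1.0, rt=1.0, min_loop=0):
    """Return the 1-based table P with P[i][j] = probability of pair (i, j)."""
    n = len(seq)
    weight = math.exp(-e_bp / rt)
    Q, Qbp = partition_tables(seq, e_bp, rt, min_loop)
    Z = Q[1][n]
    P = [[0.0] * (n + 2) for _ in range(n + 2)]
    # Widest pairs first: every enclosing pair (p, q) is wider than (i, j).
    for length in range(n, 1, -1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            if Qbp[i][j] == 0.0:
                continue  # (i, j) cannot pair
            # Case 1: (i, j) is an external base pair.
            prob = Q[1][i - 1] * Qbp[i][j] * Q[j + 1][n] / Z
            # Case 2: (i, j) is directly enclosed by an outer pair (p, q).
            for p in range(1, i):
                for q in range(j + 1, n + 1):
                    if Qbp[p][q] > 0.0:
                        prob += (
                            P[p][q] * weight * Q[p + 1][i - 1]
                            * Qbp[i][j] * Q[j + 1][q - 1] / Qbp[p][q]
                        )
            P[i][j] = prob
    return P
```

For the sequence `GGCC` with the default parameters, the pair $(2,3)$ occurs in two structures, alone and nested inside $(1,4)$, so `base_pair_probabilities("GGCC")[2][3]` equals $(e + e^2)/(1 + 4e + e^2)$.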

## Unpaired probabilities / accessibility

Analogously to base pair probabilities, we can also compute the probability that a given subsequence $S_{i}..S_{j}$ of an RNA sequence is *not* involved in any intramolecular base pair.

Unpaired probabilities can be interpreted as the accessibility of a given region of an RNA to further structure formation. Thus, they can be used to enhance RNA-RNA interaction prediction approaches, e.g. see our accessibility-based prediction implementation.

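$P^{u}$ | $P^{u}_{i,j} = \frac{Q_{1,i-1} \cdot Q_{j+1,n}}{Q_{1,n}} + \sum_{p<i,\, j<q} P^{bp}_{p,q} \cdot \frac{\exp(-E_{bp}/RT) \cdot Q_{p+1,i-1} \cdot Q_{j+1,q-1}}{Q^{bp}_{p,q}}$
---|---

The first term covers structures in which $S_{i}..S_{j}$ is unpaired and not enclosed by any base pair; the sum covers those in which $(p,q)$ is the pair directly enclosing the unpaired region.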

The probabilities of subsequences to be unpaired $P^{u}$ can be visualized by a dotplot, where larger dots represent higher probabilities.
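A minimal sketch of the unpaired-probability computation is given below. It assumes the same decomposition as for base pair probabilities: the region $i..j$ is either not enclosed by any base pair, or it lies directly inside an outer pair $(p,q)$ with arbitrary structures to its left and right. The block recomputes all tables so that it runs on its own. The function name, the helper `enclosed_sum`, the table layout, and the default parameters are illustrative assumptions.

```python
import math

# Assumption: Watson-Crick and GU pairs are complementary, as stated in the text.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def unpaired_probabilities(seq, e_bp=-1.0, rt=1.0, min_loop=0):
    """Return (Pu, Pbp); Pu[i][j] = probability that positions i..j are unpaired."""
    n = len(seq)
    weight = math.exp(-e_bp / rt)
    # Partition function tables (1-based, padded); empty ranges have Q = 1.
    Q = [[1.0] * (n + 2) for _ in range(n + 2)]
    Qbp = [[0.0] * (n + 2) for _ in range(n + 2)]
    for length in range(1, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            if j - i > min_loop and (seq[i - 1], seq[j - 1]) in PAIRS:
                Qbp[i][j] = Q[i + 1][j - 1] * weight
            Q[i][j] = Q[i][j - 1] + sum(
                Q[i][k - 1] * Qbp[k][j] for k in range(i, j - min_loop)
            )
    Z = Q[1][n]
    Pbp = [[0.0] * (n + 2) for _ in range(n + 2)]

    def enclosed_sum(i, j, inner):
        # Contribution of all pairs (p, q) that directly enclose the region i..j;
        # inner is the partition function of the region itself.
        return sum(
            Pbp[p][q] * weight * Q[p + 1][i - 1] * inner * Q[j + 1][q - 1] / Qbp[p][q]
            for p in range(1, i)
            for q in range(j + 1, n + 1)
            if Qbp[p][q] > 0.0
        )

    # Base pair probabilities, widest pairs first.
    for length in range(n, 1, -1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            if Qbp[i][j] > 0.0:
                external = Q[1][i - 1] * Qbp[i][j] * Q[j + 1][n] / Z
                Pbp[i][j] = external + enclosed_sum(i, j, Qbp[i][j])

    # Unpaired probabilities: the region itself contributes a factor of 1.
    Pu = [[0.0] * (n + 2) for _ in range(n + 2)]
    for i in range(1, n + 1):
        for j in range(i, n + 1):
            Pu[i][j] = Q[1][i - 1] * Q[j + 1][n] / Z + enclosed_sum(i, j, 1.0)
    return Pu, Pbp
```

A useful sanity check on the result is that, for every position $i$, the probability of being unpaired plus the probabilities of all base pairs involving $i$ sum to 1.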