Freiburg RNA Tools

# Teaching - McCaskill: structure probabilities

Source at GitHub: BackofenLab/RNA-Playground

John S. McCaskill (1990) introduced an efficient dynamic programming algorithm to compute the partition function $Z=\sum_{P} \exp(-E(P)/RT)$ over all nested structures $P$ that a given RNA sequence $S$ can form, where $E(P)$ is the energy of structure $P$, $R$ is the gas constant, and $T$ is the temperature.

Here, we provide a simplified version of the approach using a Nussinov-like energy scoring scheme, i.e. each base pair of a structure contributes a fixed energy term $E_{bp}$ independent of its context. Given this, we populate two dynamic programming tables $Q$ and $Q^{bp}$. $Q_{i,j}$ provides the partition function for the subsequence from position $i$ to $j$, while $Q^{bp}_{i,j}$ holds the partition function of that subsequence under the constraint that positions $i$ and $j$ form a base pair (or 0 if no base pair is possible). Watson-Crick as well as GU base pairs are considered complementary. The overall partition function is given by $Z = Q_{1,n}$ for a sequence of length $n$.
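To make the definition of $Z$ concrete under this scoring scheme, the following minimal Python sketch enumerates every nested structure of a short sequence and sums the Boltzmann weights directly. It is a brute-force check, not McCaskill's algorithm, and is only feasible for very short sequences. The names `structures` and `brute_force_partition_function`, the parameter `min_loop` (read here as the minimal number of positions between two paired bases), and the example values are assumptions made for illustration.

```python
import math

# Watson-Crick and GU pairs, as stated in the text (uppercase RNA alphabet assumed).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def structures(seq, i, j, min_loop):
    """Yield every nested structure on positions i..j (0-based, inclusive).

    Each structure is a tuple of (k, j) index pairs; () is the open chain.
    """
    if j <= i:  # zero or one position: only the unpaired structure exists
        yield ()
        return
    # Case 1: position j is unpaired.
    yield from structures(seq, i, j - 1, min_loop)
    # Case 2: position j pairs with k, with at least min_loop positions between them.
    for k in range(i, j - min_loop):
        if (seq[k], seq[j]) in PAIRS:
            for left in structures(seq, i, k - 1, min_loop):
                for inner in structures(seq, k + 1, j - 1, min_loop):
                    yield left + ((k, j),) + inner


def brute_force_partition_function(seq, min_loop, e_bp, rt):
    """Sum exp(-E(P)/RT) over all structures, with E(P) = (number of pairs) * e_bp."""
    return sum(
        math.exp(-len(p) * e_bp / rt)
        for p in structures(seq, 0, len(seq) - 1, min_loop)
    )


if __name__ == "__main__":
    # Hypothetical example values; e_bp < 0 makes base pairs favorable.
    print(brute_force_partition_function("GGCC", min_loop=0, e_bp=-1.0, rt=1.0))
```

Because each structure is generated exactly once (position $j$ is either unpaired or paired with one specific $k$), the same case distinction can be reused for a dynamic programming formulation.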

Given these partition function terms, we can compute base pair probabilities $P^{bp}$ as well as the probabilities $P^{u}$ that a certain subsequence is unpaired, i.e. that none of its positions is involved in any base pair. Such probabilities can be visualized using a dotplot. In this matrix-like representation, the value of each cell $(i,j)$ is drawn as a dot whose size scales with the value.

The computation takes four input parameters: the RNA sequence $S$, the minimal loop length $l$, the energy weight of a base pair $E_{bp}$, and the 'normalized' temperature $RT$.

## Partition functions

The partition functions $Q$ and $Q^{bp}$ are computed by dynamic programming recursions, each filling one table.
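One way to write recursions that match the definitions above is sketched here. It is a reconstruction under the stated scoring scheme and not necessarily the exact form used by the original tool. Position $j$ is either unpaired or paired with some $k$ that leaves at least $l$ positions in between:

$$Q_{i,j} = Q_{i,j-1} + \sum_{i \le k < j-l} Q_{i,k-1} \cdot Q^{bp}_{k,j}$$

$$Q^{bp}_{i,j} = \begin{cases} Q_{i+1,j-1} \cdot \exp(-E_{bp}/RT) & \text{if } S_i, S_j \text{ can pair and } j-i > l \\ 0 & \text{otherwise} \end{cases}$$

with $Q_{i,j} = 1$ for $j \le i$. The Python sketch below fills both tables by increasing subsequence length. The function name `partition_tables`, the 1-based table layout, and the default parameter values are illustrative assumptions.

```python
import math

# Watson-Crick and GU pairs, as stated in the text (uppercase RNA alphabet assumed).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def partition_tables(seq, min_loop=1, e_bp=-1.0, rt=1.0):
    """Return (Q, Qbp) with 1-based indices: Q[i][j] and Qbp[i][j] for 1 <= i <= j <= n.

    Defaults are hypothetical example values, not taken from the source.
    """
    n = len(seq)
    w = math.exp(-e_bp / rt)  # Boltzmann weight of a single base pair
    # Q is initialised to 1 everywhere, which covers empty (j < i) and
    # single-position (j == i) subsequences; row n + 1 exists so Q[j + 1][...] is valid.
    Q = [[1.0] * (n + 1) for _ in range(n + 2)]
    Qbp = [[0.0] * (n + 1) for _ in range(n + 2)]
    for span in range(1, n):  # span = j - i, shorter subsequences first
        for i in range(1, n - span + 1):
            j = i + span
            if span > min_loop and (seq[i - 1], seq[j - 1]) in PAIRS:
                Qbp[i][j] = Q[i + 1][j - 1] * w
            total = Q[i][j - 1]  # case: position j unpaired
            for k in range(i, j - min_loop):  # case: j paired with k
                total += Q[i][k - 1] * Qbp[k][j]
            Q[i][j] = total
    return Q, Qbp


if __name__ == "__main__":
    seq = "GGCC"
    Q, Qbp = partition_tables(seq, min_loop=0, e_bp=-1.0, rt=1.0)
    print("Z =", Q[1][len(seq)])
```

Filling the two tables takes $O(n^3)$ time and $O(n^2)$ space in this form.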

## Base pair probabilities

Given the partition functions $Q$ and $Q^{bp}$, we can now compute the probability of each individual base pair $(i,j)$ within the structure ensemble, i.e. $P^{bp}_{i,j} = \sum_{P \ni (i,j)} \exp(-E(P)/RT) / Z$, the sum of the Boltzmann probabilities of all structures that contain the base pair. It is computed by a recursion that covers two cases: $(i,j)$ is an external base pair, or $(i,j)$ is directly enclosed by an outer base pair $(p,q)$.
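The two cases can be written as one recursion. The form below is a reconstruction consistent with the definitions of $Q$ and $Q^{bp}$ in this text, not necessarily the exact notation of the original tool:

$$P^{bp}_{i,j} = \frac{Q_{1,i-1} \cdot Q^{bp}_{i,j} \cdot Q_{j+1,n}}{Q_{1,n}} + \sum_{p<i,\; j<q} P^{bp}_{p,q} \cdot \frac{\exp(-E_{bp}/RT) \cdot Q_{p+1,i-1} \cdot Q^{bp}_{i,j} \cdot Q_{j+1,q-1}}{Q^{bp}_{p,q}}$$

The first term covers the external case, the sum covers every possible directly enclosing pair $(p,q)$. A minimal Python sketch follows; it repeats the table-filling step so that it runs on its own. The function names, the 1-based layout, and the default parameter values are illustrative assumptions, and the straightforward loops take $O(n^4)$ time.

```python
import math

# Watson-Crick and GU pairs, as stated in the text (uppercase RNA alphabet assumed).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def partition_tables(seq, min_loop, e_bp, rt):
    """Fill Q and Qbp (1-based indices) for the simplified energy model."""
    n = len(seq)
    w = math.exp(-e_bp / rt)
    Q = [[1.0] * (n + 1) for _ in range(n + 2)]
    Qbp = [[0.0] * (n + 1) for _ in range(n + 2)]
    for span in range(1, n):
        for i in range(1, n - span + 1):
            j = i + span
            if span > min_loop and (seq[i - 1], seq[j - 1]) in PAIRS:
                Qbp[i][j] = Q[i + 1][j - 1] * w
            Q[i][j] = Q[i][j - 1] + sum(
                Q[i][k - 1] * Qbp[k][j] for k in range(i, j - min_loop)
            )
    return Q, Qbp


def base_pair_probabilities(seq, min_loop=1, e_bp=-1.0, rt=1.0):
    """Return P with P[i][j] = probability of base pair (i, j), 1-based, i < j.

    Defaults are hypothetical example values, not taken from the source.
    """
    n = len(seq)
    w = math.exp(-e_bp / rt)
    Q, Qbp = partition_tables(seq, min_loop, e_bp, rt)
    Z = Q[1][n]
    P = [[0.0] * (n + 2) for _ in range(n + 2)]
    # Longest spans first, so every enclosing pair (p, q) is already known.
    for span in range(n - 1, 0, -1):
        for i in range(1, n - span + 1):
            j = i + span
            if Qbp[i][j] == 0.0:
                continue  # (i, j) cannot pair
            prob = Q[1][i - 1] * Qbp[i][j] * Q[j + 1][n] / Z  # external pair
            for p in range(1, i):  # directly enclosed by (p, q)
                for q in range(j + 1, n + 1):
                    if Qbp[p][q] > 0.0:
                        prob += (P[p][q] * w * Q[p + 1][i - 1] * Qbp[i][j]
                                 * Q[j + 1][q - 1] / Qbp[p][q])
            P[i][j] = prob
    return P


if __name__ == "__main__":
    P = base_pair_probabilities("GGCC", min_loop=0, e_bp=-1.0, rt=1.0)
    print(P[1][4], P[2][3])
```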

Base pair probabilities can be used for structure prediction, e.g. with a maximum expected accuracy (MEA) approach.

The base pair probabilities $P^{bp}$ can be visualized by a dotplot, where larger dots represent higher probabilities.

Analogously to base pair probabilities, we can also compute the probability $P^{u}_{i,j}$ that a given subsequence $S_{i}..S_{j}$ of an RNA sequence is not involved in any intramolecular base pair.

The probabilities $P^{u}$ of subsequences being unpaired can be visualized by a dotplot, where larger dots represent higher probabilities.
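A recursion for $P^{u}$ can be obtained from the one for base pair probabilities by replacing the factor for the pair $(i,j)$ with 1, since the subsequence contributes no base pair. The following form is a reconstruction consistent with the definitions in this text, not necessarily the exact notation of the original tool:

$$P^{u}_{i,j} = \frac{Q_{1,i-1} \cdot Q_{j+1,n}}{Q_{1,n}} + \sum_{p<i,\; j<q} P^{bp}_{p,q} \cdot \frac{\exp(-E_{bp}/RT) \cdot Q_{p+1,i-1} \cdot Q_{j+1,q-1}}{Q^{bp}_{p,q}}$$

The Python sketch below computes both probability tables so that it runs on its own. All function names, the 1-based layout, and the default parameter values are illustrative assumptions.

```python
import math

# Watson-Crick and GU pairs, as stated in the text (uppercase RNA alphabet assumed).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def partition_tables(seq, min_loop, e_bp, rt):
    """Fill Q and Qbp (1-based indices) for the simplified energy model."""
    n = len(seq)
    w = math.exp(-e_bp / rt)
    Q = [[1.0] * (n + 1) for _ in range(n + 2)]
    Qbp = [[0.0] * (n + 1) for _ in range(n + 2)]
    for span in range(1, n):
        for i in range(1, n - span + 1):
            j = i + span
            if span > min_loop and (seq[i - 1], seq[j - 1]) in PAIRS:
                Qbp[i][j] = Q[i + 1][j - 1] * w
            Q[i][j] = Q[i][j - 1] + sum(
                Q[i][k - 1] * Qbp[k][j] for k in range(i, j - min_loop)
            )
    return Q, Qbp


def enclosed_term(P, Q, Qbp, w, i, j, n, inner):
    """Sum over all pairs (p, q) that directly enclose the region i..j."""
    total = 0.0
    for p in range(1, i):
        for q in range(j + 1, n + 1):
            if Qbp[p][q] > 0.0:
                total += (P[p][q] * w * Q[p + 1][i - 1] * inner
                          * Q[j + 1][q - 1] / Qbp[p][q])
    return total


def unpaired_probabilities(seq, min_loop=1, e_bp=-1.0, rt=1.0):
    """Return (Pu, P): Pu[i][j] for unpaired regions i..j, P[i][j] for pairs (i, j).

    Defaults are hypothetical example values, not taken from the source.
    """
    n = len(seq)
    w = math.exp(-e_bp / rt)
    Q, Qbp = partition_tables(seq, min_loop, e_bp, rt)
    Z = Q[1][n]
    # Base pair probabilities, longest spans first.
    P = [[0.0] * (n + 2) for _ in range(n + 2)]
    for span in range(n - 1, 0, -1):
        for i in range(1, n - span + 1):
            j = i + span
            if Qbp[i][j] > 0.0:
                P[i][j] = (Q[1][i - 1] * Qbp[i][j] * Q[j + 1][n] / Z
                           + enclosed_term(P, Q, Qbp, w, i, j, n, Qbp[i][j]))
    # Unpaired probabilities: same two cases with the pair factor replaced by 1.
    Pu = [[0.0] * (n + 2) for _ in range(n + 2)]
    for i in range(1, n + 1):
        for j in range(i, n + 1):
            Pu[i][j] = (Q[1][i - 1] * Q[j + 1][n] / Z
                        + enclosed_term(P, Q, Qbp, w, i, j, n, 1.0))
    return Pu, P


if __name__ == "__main__":
    Pu, P = unpaired_probabilities("GGGAAAUCC", min_loop=1, e_bp=-1.0, rt=1.0)
    print([round(Pu[i][i], 3) for i in range(1, 10)])
```

A useful sanity check on such a sketch is that, for every position $i$, the single-position unpaired probability $P^{u}_{i,i}$ and the probabilities of all base pairs involving $i$ sum to 1.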