# Teaching - McCaskill: structure probabilities

John S. McCaskill (1990)
introduced an efficient dynamic programming algorithm to compute the
partition function $Z=\sum_{P} \exp(-E(P)/RT)$ over all nested structures $P$
that a given RNA sequence $S$ can form, where $E(P)$ is the energy of structure $P$,
$R$ the gas constant, and $T$ the temperature.

Here, we provide a simplified version of the approach using a Nussinov-like energy scoring scheme, i.e. each base pair of a structure contributes a fixed energy term $E_{bp}$ independent of its context. Given this, we populate two dynamic programming tables $Q$ and $Q^{bp}$. $Q_{i,j}$ provides the partition function for the subsequence from position $i$ to $j$, while $Q^{bp}_{i,j}$ holds the partition function of that subsequence under the constraint that positions $i$ and $j$ form a base pair (or 0 if no base pair is possible). Watson-Crick as well as GU base pairs are considered complementary. The overall partition function is given by $Z = Q_{1,n}$ for a sequence of length $n$.

Given these partition function terms, we can compute base pair probabilities $P^{bp}$ as well as probabilities that a certain subsequence is unpaired $P^{u}$, i.e. its positions are not involved in any base pair. Such probabilities can be visualized using a dotplot. In this matrix-like representation, the value of each cell $(i,j)$ is drawn as a dot whose size scales with the value.
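As a rough illustration of the dotplot idea, the following sketch renders a probability matrix as text, using "heavier" characters for larger values. The function name `text_dotplot` and the symbol scale are assumptions made for this example; a graphical dotplot would scale actual dot sizes instead.

```python
def text_dotplot(probs, symbols=" .oO@"):
    """Render a matrix of probabilities in [0, 1] as text.

    Each cell is mapped to one of the characters in `symbols`
    (an arbitrary, illustrative scale): larger probabilities get
    visually heavier characters, mimicking larger dots.
    """
    lines = []
    for row in probs:
        cells = []
        for p in row:
            # Scale p to an index into `symbols`; p == 1.0 is clamped
            # to the last (heaviest) symbol.
            level = min(int(p * len(symbols)), len(symbols) - 1)
            cells.append(symbols[level])
        lines.append(" ".join(cells))
    return "\n".join(lines)


if __name__ == "__main__":
    example = [[0.0, 0.95],
               [0.5, 0.1]]
    print(text_dotplot(example))
```

Values close to zero stay blank, so only the well-supported cells stand out.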

The computation takes the following inputs: the RNA sequence $S$, the minimal loop length $l$, the energy weight of a base pair $E_{bp}$, and the 'normalized' temperature $RT$.

## Partition functions

The following recursions are used to compute the partition functions $Q$ and $Q^{bp}$.
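One way to write these recursions, assuming the usual Nussinov-like decomposition that matches the definitions above (empty subsequences have weight $Q_{i,i-1} = 1$, and a base pair must enclose at least $l$ positions):

$$Q_{i,j} = Q_{i,j-1} + \sum_{i \le k < j-l} Q_{i,k-1} \cdot Q^{bp}_{k,j}$$

$$Q^{bp}_{i,j} = \begin{cases} Q_{i+1,j-1} \cdot \exp(-E_{bp}/RT) & \text{if } S_i, S_j \text{ are complementary and } j-i > l \\ 0 & \text{otherwise} \end{cases}$$

The first term of $Q_{i,j}$ covers the case that position $j$ is unpaired, the sum covers the case that $j$ pairs with some position $k$. A minimal Python sketch of this scheme follows; the names `partition_tables`, `min_loop`, `e_bp`, `rt` and the default values are assumptions for illustration only.

```python
import math

PAIRS = {"AU", "UA", "GC", "CG", "GU", "UG"}  # Watson-Crick and GU pairs


def partition_tables(seq, min_loop=0, e_bp=-1.0, rt=1.0):
    """Fill Q and Q^bp for `seq` (defaults are placeholder values).

    Tables are 1-based and padded by one cell on each side, so that
    empty ranges such as q[i][i-1] or q[n+1][n] exist and hold 1.
    """
    seq = seq.upper()
    n = len(seq)
    w = math.exp(-e_bp / rt)  # Boltzmann weight of a single base pair
    q = [[1.0] * (n + 2) for _ in range(n + 2)]
    qbp = [[0.0] * (n + 2) for _ in range(n + 2)]
    for span in range(1, n):                 # span = j - i, short to long
        for i in range(1, n - span + 1):
            j = i + span
            if span > min_loop and seq[i - 1] + seq[j - 1] in PAIRS:
                qbp[i][j] = q[i + 1][j - 1] * w
            # j unpaired, or j paired with some k in i .. j-l-1
            q[i][j] = q[i][j - 1] + sum(
                q[i][k - 1] * qbp[k][j] for k in range(i, j - min_loop))
    return q, qbp


if __name__ == "__main__":
    q, qbp = partition_tables("GGGAAACCC", min_loop=3)
    print("Z =", q[1][9])
```

For the two-nucleotide sequence `GC` with $l = 0$, only the open chain and the single pair $(1,2)$ exist, so the sketch yields $Z = 1 + \exp(-E_{bp}/RT)$.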

## Base pair probabilities

Given the partition functions $Q$ and $Q^{bp}$ we can now compute the
probabilities of individual base pairs $(i,j)$ within the structure
ensemble, i.e. $P^{bp}_{i,j} = \sum_{P \ni (i,j)} \exp(-E(P)/RT) / Z$ given by
the sum of the Boltzmann probabilities of all structures that contain the
base pair. For its computation, the following recursion is used, which
covers both the case that $(i,j)$ is an external base pair as well
as that $(i,j)$ is directly enclosed by an outer base pair $(p,q)$.
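Under the same Nussinov-like scoring, the two cases named above can be written as follows. This is a sketch consistent with the definitions of $Q$ and $Q^{bp}$, with the sum running over all pairs $(p,q)$ for which $Q^{bp}_{p,q} > 0$:

$$P^{bp}_{i,j} = \frac{Q_{1,i-1} \cdot Q^{bp}_{i,j} \cdot Q_{j+1,n}}{Q_{1,n}} + \sum_{p<i,\; j<q} P^{bp}_{p,q} \cdot \frac{\exp(-E_{bp}/RT) \cdot Q_{p+1,i-1} \cdot Q^{bp}_{i,j} \cdot Q_{j+1,q-1}}{Q^{bp}_{p,q}}$$

The first term is the external case; each summand is the probability of the outer pair $(p,q)$ times the fraction of its enclosed ensemble in which $(i,j)$ sits directly inside it. The Python sketch below repeats a compact partition function helper so that it runs on its own. All function names, parameter names and defaults are assumptions for illustration, and the quadruple loop is written for readability, not speed.

```python
import math

PAIRS = {"AU", "UA", "GC", "CG", "GU", "UG"}  # Watson-Crick and GU pairs


def partition_tables(seq, min_loop, w):
    """Q and Q^bp as padded 1-based tables; empty ranges hold 1."""
    n = len(seq)
    q = [[1.0] * (n + 2) for _ in range(n + 2)]
    qbp = [[0.0] * (n + 2) for _ in range(n + 2)]
    for span in range(1, n):
        for i in range(1, n - span + 1):
            j = i + span
            if span > min_loop and seq[i - 1] + seq[j - 1] in PAIRS:
                qbp[i][j] = q[i + 1][j - 1] * w
            q[i][j] = q[i][j - 1] + sum(
                q[i][k - 1] * qbp[k][j] for k in range(i, j - min_loop))
    return q, qbp


def pair_probabilities(seq, min_loop=0, e_bp=-1.0, rt=1.0):
    """Return the 1-based table P^bp (defaults are placeholder values)."""
    seq = seq.upper()
    n = len(seq)
    w = math.exp(-e_bp / rt)  # Boltzmann weight of a single base pair
    q, qbp = partition_tables(seq, min_loop, w)
    z = q[1][n]
    p = [[0.0] * (n + 2) for _ in range(n + 2)]
    # Widest pairs first, so every enclosing pair is already known.
    for span in range(n - 1, 0, -1):
        for i in range(1, n - span + 1):
            j = i + span
            if qbp[i][j] == 0.0:
                continue
            # Case 1: (i, j) is an external base pair.
            total = q[1][i - 1] * qbp[i][j] * q[j + 1][n] / z
            # Case 2: (i, j) is directly enclosed by (a, b); the text
            # calls this outer pair (p, q), renamed to avoid a clash.
            for a in range(1, i):
                for b in range(j + 1, n + 1):
                    if p[a][b] > 0.0:
                        total += (p[a][b] * w * q[a + 1][i - 1] * qbp[i][j]
                                  * q[j + 1][b - 1] / qbp[a][b])
            p[i][j] = total
    return p


if __name__ == "__main__":
    p = pair_probabilities("GGGAAACCC", min_loop=3)
    print("P(1,9) =", p[1][9])
```

With $E_{bp} = 0$ every structure has weight 1, so the probabilities reduce to structure counts, which makes small cases easy to check by hand.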

Base pair probabilities can be used for structure prediction using e.g. a maximum expected accuracy (MEA) approach.

The base pair probabilities can be visualized by a dotplot, where
larger dots represent higher probabilities.

## Unpaired probabilities / accessibility

Analogously to base pair probabilities, we can also compute
the probability that a given subsequence $S_{i}..S_{j}$ of an RNA sequence
is *not* involved in any intramolecular base pair.

Unpaired probabilities can be interpreted as the accessibility of a given region of an RNA to further structure formation. Thus, they can be used to enhance RNA-RNA interaction prediction approaches, e.g. see our accessibility-based prediction implementation.
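A recursion for $P^{u}$ can be sketched by analogy with the base pair case, assuming the same Nussinov-like scoring: the unpaired stretch $i..j$ contributes weight 1 in place of $Q^{bp}_{i,j}$, and it is either external or directly enclosed by a base pair $(p,q)$ with $Q^{bp}_{p,q} > 0$:

$$P^{u}_{i,j} = \frac{Q_{1,i-1} \cdot 1 \cdot Q_{j+1,n}}{Q_{1,n}} + \sum_{p<i,\; j<q} P^{bp}_{p,q} \cdot \frac{\exp(-E_{bp}/RT) \cdot Q_{p+1,i-1} \cdot 1 \cdot Q_{j+1,q-1}}{Q^{bp}_{p,q}}$$

The Python sketch below is self-contained and therefore repeats compact helpers for $Q$, $Q^{bp}$ and $P^{bp}$. All names and default values are assumptions for illustration, and the loops favour readability over speed.

```python
import math

PAIRS = {"AU", "UA", "GC", "CG", "GU", "UG"}  # Watson-Crick and GU pairs


def partition_tables(seq, min_loop, w):
    """Q and Q^bp as padded 1-based tables; empty ranges hold 1."""
    n = len(seq)
    q = [[1.0] * (n + 2) for _ in range(n + 2)]
    qbp = [[0.0] * (n + 2) for _ in range(n + 2)]
    for span in range(1, n):
        for i in range(1, n - span + 1):
            j = i + span
            if span > min_loop and seq[i - 1] + seq[j - 1] in PAIRS:
                qbp[i][j] = q[i + 1][j - 1] * w
            q[i][j] = q[i][j - 1] + sum(
                q[i][k - 1] * qbp[k][j] for k in range(i, j - min_loop))
    return q, qbp


def enclosed_sum(p, q, qbp, n, w, i, j, inner):
    """Sum over pairs (a, b) that directly enclose the region i..j."""
    total = 0.0
    for a in range(1, i):
        for b in range(j + 1, n + 1):
            if p[a][b] > 0.0:
                total += (p[a][b] * w * q[a + 1][i - 1] * inner
                          * q[j + 1][b - 1] / qbp[a][b])
    return total


def unpaired_probabilities(seq, min_loop=0, e_bp=-1.0, rt=1.0):
    """Return (P^u, P^bp) as 1-based tables (defaults are placeholders)."""
    seq = seq.upper()
    n = len(seq)
    w = math.exp(-e_bp / rt)  # Boltzmann weight of a single base pair
    q, qbp = partition_tables(seq, min_loop, w)
    z = q[1][n]
    p = [[0.0] * (n + 2) for _ in range(n + 2)]
    for span in range(n - 1, 0, -1):         # widest pairs first
        for i in range(1, n - span + 1):
            j = i + span
            if qbp[i][j] > 0.0:
                p[i][j] = (q[1][i - 1] * qbp[i][j] * q[j + 1][n] / z
                           + enclosed_sum(p, q, qbp, n, w, i, j, qbp[i][j]))
    pu = [[0.0] * (n + 2) for _ in range(n + 2)]
    for i in range(1, n + 1):
        for j in range(i, n + 1):
            # Same two cases as for P^bp, with weight 1 for the stretch.
            pu[i][j] = (q[1][i - 1] * q[j + 1][n] / z
                        + enclosed_sum(p, q, qbp, n, w, i, j, 1.0))
    return pu, p


if __name__ == "__main__":
    pu, p = unpaired_probabilities("GGGAAACCC", min_loop=3)
    print("P_u(4,6) =", pu[4][6])
```

A useful sanity check is that, for every position $i$, $P^{u}_{i,i}$ plus the probabilities of all base pairs involving $i$ sums to 1.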

The probabilities of subsequences to be unpaired can be visualized by a dotplot, where
larger dots represent higher probabilities.