Freiburg RNA Tools
ExpaRNA - Help
BIF
IFF

Introduction

ExpaRNA is a fast, motif-based sequence-structure alignment method. Instead of computing a full sequence-structure alignment, ExpaRNA efficiently computes the best arrangement of sequence-structure motifs common to two RNAs. A motif is a local (or isolated) substructure which is identical to both RNAs. We call such a substructure "exact pattern matching" (EPM).

ExpaRNA uses as a preprocessing step a previously published algorithm from our group, which detects the set of all isolated common substructures for two given RNAs (Backofen and Siebert, Journal of Discrete Algorithms, 2007). Then ExpaRNA computes the longest collinear sequence of substructures common to two RNAs. The results of ExpaRNA are in good agreement with existing alignment-based methods, but can be obtained in a fraction of running time, in particular for larger RNAs.

Moreover, the ExpaRNA approach can be used to speed up state-of-the-art Sankoff-style alignment methods like our LocARNA algorithm and related approaches that are in principle able to profit from alignment constraints. We have setup a pipeline that uses the predicted sequence structure motifs as anchor points for a LocARNA alignment. This amounts to calculate a constraint alignment by LocARNA, which restricts the search space and thus speeds up the alignment significantly and is therefore important for large-scale data analysis. On the other hand, our approach maintains existing structural motifs in the resulting alignment.

View the ExpaRNA tutorial video:

When using ExpaRNA please cite :

Results are computed with ExpaRNA version 1.0.0 linking Vienna RNA package 1.8.5

Overview

The following parameters are used to control the execution of ExpaRNA

Furthermore, additional information is available

Input Parameter

?  Sequence Input in FASTA Format

ExpaRNA accepts input in FASTA-like format. Input can be given either as direct text input or uploading a file.

The parameter constraints are: The input has to be in valid FASTA format. No multiline input is allowed (sequence etc. in one line each). The number of sequences has to be at least 2 and at most 2. Sequence lengths have to be in the range 1-2000. The allowed sequence alphabet is 'ACGTUNacgtun'. Fixed structure can be given in a single line with tailing '#FS' using the brace pairs (). Each sequence is allowed to have at max 0.25*seqLength 'N' characters.
Defaults to ()

Scoring Parameter

?  Minimal size of included substructures

Minimal size (nucleotides) of used substructures (EPMs). Minimum is 2.
The parameter constraints are: Input value has to be parsable as Integer. The value must be greater than or equal to 2.
Defaults to (7)

?  Maximal number of used substructures

All possible substructures (EPMs) are sorted by their size, from large to small. 0 - all EPM are used 1 - the largest EPM is used 2 -...
The parameter constraints are: Input value has to be parsable as Integer. The value must be greater than or equal to 0.
Defaults to (0)

?  EPM scoring

1: EPM score = EPM size in Nucleotides (default)
2: EPM score = (EPM size)^2 (prefers larger patterns in LCS-EPM)

Output Parameter

?  Write ExpaRNA result as input for LocARNA with anchor constraints

This option create a file "LCSEPM_LocARNA_input.fa" which can be used as input LocARNA. The file contains additional anchor constraints for the LocARNA alignment
The parameter constraints are: Input value has to be parsable as Boolean.
Defaults to (true)

?  Write ExpaRNA result as alignment into text file

This option creates a file "LCSEPM_align.aln" with the result written as alignment
The parameter constraints are: Input value has to be parsable as Boolean.
Defaults to (false)

?  Write ExpaRNA results as list in file

This option creates a file "LCSEPM.epm" with all used substructures as list.
The parameter constraints are: Input value has to be parsable as Boolean.
Defaults to (false)

?  Write all EPMs into file

This option creates a file "allEPM.epm" with ALL common subtructures of the input RNAs.
The parameter constraints are: Input value has to be parsable as Boolean.
Defaults to (false)

Output Description

ExpaRNA Output Sample
The upper part of the results shows a summary about the computed optimal set of common substructures. The first row "EPMs found with minimal size (x)" is the overall number of common substructures (EPMs) of size x (or larger) between the two input RNAs. This set includes overlapping and crossing EPMs. The row "Score LCS-EPM" is the score of the optimal set of EPMs as shown in the graphical output. The next two rows give information about how many substructures are used in the final result and how many nucleotides are covered by them.

ExpaRNA Output Sample 2

The graphical output allows to compare the input RNAs. A common substructure is shown in the same color in both structure plots. A click on the images will enlarge both plots for better comparison. Each annotated RNA structure can be downloaded in different file formats (pdf,ps and png).

Input Examples

?  large

Alignment of two long ncRNAs with pre-defined structures.
The example's result can be directly accessed here

?  small

Alignment of two relatively small ncRNAs without pre-defined structure. Here, the minimum free energy structure is computed and used for the alignment.
The example's result can be directly accessed here

Frequently Asked Questions

If your question is not listed, please send it to us!

? Can ExpaRNA compare multiple RNAs?

Currently, only pairwise comparison is supported. Maybe in future a multiple comparison is possible.

? What is the difference between the scoring 'default' and 'prefer larger substructures'?

With the default scheme (t=1) ExpaRNA tries to find a set of common substructures with maximal number of nucleotides. With the alternative scheme (t=2), the score of each substructure is quadratic to its size in nucleotides. For example, a base pair (2 nucleoides) has score 4, two base pairs have score 16 and so on. This prefers larger substructures in the result.

? Can ExpaRNA find substructures with mismatches?

No, ExpaRNA only finds exact matching substructures. Both sequence and structure of all nucleotides have to be exactly the same. If you are interested in a pure structural comparison you can change all nucleotides to 'N' and use the structure of the original sequence.

? Can ExpaRNA do a structural comparison only?

Yes, to some extend. As a trick you can change all nucleotides to 'N' in combination with the structure of the original sequence.

List of Changes