Freiburg RNA Tools
LocARNA - Help

Introduction

LocARNA is a tool for multiple alignment of RNA molecules. It is one of the fastest and most accurate available tools that perform a true sequence- and structure-based comparison of RNAs. LocARNA requires only RNA sequences as input; it simultaneously folds and aligns them and outputs a multiple alignment together with a consensus structure. In this respect, LocARNA is comparable to methods such as Dynalign, Foldalign, and Lara. For folding, it uses the same realistic RNA energy model that is employed by RNAfold of the Vienna RNA package (and by Zuker's mfold). For alignment, it features RIBOSUM-like similarity scoring and realistic gap costs.

LocARNA's high performance is mainly achieved by using base pair probabilities during the alignment procedure. This makes it possible to discard insignificant base pairs from the start. Since the base pair probabilities are derived from a full energy model, LocARNA retains the high accuracy of Sankoff-like RNA comparison methods.

The LocARNA software is available for download as part of the LocARNA package (GPL 3). In contrast to the server, the stand-alone software does not limit the problem size, provides enhanced functionality, and offers a batch processing-friendly command line interface.

To get a first impression of LocARNA's abilities, please check out our tutorial video "LocARNA at RNA-tools Freiburg" (showing an older version of the web server).

When using LocARNA, please cite:

Results are computed with LocARNA version 1.9.1, linked against the Vienna RNA package 2.3.2.

Sequence and Structure

Sequence Input in FASTA Format

LocARNA accepts input in the form of a multi-FASTA file. For example:
	
>fruA
CCUCGAGGGGAACCCGAAAGGGACCCGAGAGG

>fdhA
CGCCACCCUGCAACCCAAUAUAAAAUAAUACAAGGGAGCAGGUGGCG

>vhuU
AGCUCACAACCGAACCCAUUUGGGAGGUUGUGAGCU

>hdrA
GGCACCACUCGAAGGCUAAGCCAAAGUGGUGCU

Input can be given either as direct text input or by uploading a file.
Note: Input size is limited to restrict computation time and memory requirements.
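As a rough illustration of the multi-FASTA layout shown above, the following Python sketch reads such input into a name-to-sequence mapping. `parse_fasta` is a hypothetical helper and not part of LocARNA. It simply skips constraint lines (those ending in '#S', '#FS', '#1', ...) instead of interpreting them.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Read multi-FASTA text into {name: sequence} (hypothetical helper).

    Blank lines are skipped and the sequence lines of one record are
    concatenated. Annotation lines whose last token starts with '#'
    (e.g. '... #S', '... #FS', '... #1') are ignored, not interpreted.
    """
    records: dict[str, str] = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""  # first word of the header
            records[name] = ""
        elif line.split()[-1].startswith("#"):
            continue  # constraint/annotation line, not sequence data
        elif name is None:
            raise ValueError("sequence data before the first '>' header")
        else:
            records[name] += line
    return records
```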

Anchor and Structure Constraints

Along with the input sequences, one can specify constraints on the alignment, including anchor constraints as well as structure constraints.

The structure constraints (lines marked #S in the following example) follow the semantics of RNAfold constraints. Consequently, the alignment can only be guided by base pair matches that are compatible with the given constraints. Structure constraints can be specified for individual sequences.
	
>fruA
CCUCGAGGGGAACCCGAAAGGGACCCGAGAGG
.......(((..(((xxxx))).)))...... #S

>fdhA
CGCCACCCUGCGAACCCAAUAUAAAAUAAUACAAGGGAGCAGGUGGCG
..............(((.....xxxxxx......)))........... #S

Note that for input RNAs with structure constraints (#S or #FS), we automatically fall back to non-local folding (unless the user chooses to ignore all constraints).
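A '#S' line must be as long as its sequence, and its round brackets must be balanced. The minimal Python sketch below performs these sanity checks for the alphabet ().x|<> accepted by the server and returns the explicitly bracketed pairs. The function name is hypothetical. The sketch does not reproduce RNAfold's full constraint semantics (for example, the meaning of 'x', '|', '<' and '>').

```python
ALLOWED = set("().x|<>")


def check_structure_constraint(seq: str, constraint: str) -> list[tuple[int, int]]:
    """Validate a '#S' constraint string; return its bracketed pairs (1-based)."""
    if len(constraint) != len(seq):
        raise ValueError(f"constraint length {len(constraint)} != sequence length {len(seq)}")
    bad = set(constraint) - ALLOWED
    if bad:
        raise ValueError(f"illegal constraint symbols: {sorted(bad)}")
    stack: list[int] = []
    pairs: list[tuple[int, int]] = []
    for pos, ch in enumerate(constraint, start=1):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)
```

For the fruA example above, this yields the pairs (8, 26), (9, 25), (10, 24), (13, 22), (14, 21), and (15, 20).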

Anchor constraints are specified by assigning unique names to selected sequence positions, here A1, A2, A3, A4, A5, A6, B1, B2, B3, C1, C2, C3, C4 (lines #1 and #2). Each name is read top to bottom through the anchor lines; for example, 'A' in line #1 above '1' in line #2 gives A1. Positions with the same name in different sequences are aligned to each other. Within each sequence, names have to be unique. Note: anchor constraint lines have to be specified for either none or all sequences, but they can be empty (only dots) for individual sequences!
	
>fruA
CCUCGAGGGGAACCCGAAAGGGACCCGAGAGG
.........AAAAAA.BBBCCCC......... #1
.........123456.1231234......... #2

>fdhA
CGCCACCCUGCGAACCCAAUAUAAAAUAAUACAAGGGAGCAGGUGGCG
...........AAAAAA.....BBB.........CCCC.......... #1
...........123456.....123.........1234.......... #2
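The following Python sketch shows how names are formed from the #1/#2 lines and which positions are forced together. It reads each column top to bottom, skips all-dot columns, and pairs up equal names across two sequences. The function names are hypothetical and illustrate the format only; this is not LocARNA's parser.

```python
def anchor_names(lines: list[str]) -> dict[str, int]:
    """Map anchor names to 1-based positions of one sequence.

    A position's name is read top to bottom through all anchor lines
    (#1, #2, ...); columns that are '.' in every line carry no anchor.
    """
    length = len(lines[0])
    if any(len(line) != length for line in lines):
        raise ValueError("anchor lines must have equal length")
    names: dict[str, int] = {}
    for pos in range(length):
        name = "".join(line[pos] for line in lines)
        if set(name) == {"."}:
            continue
        if name in names:
            raise ValueError(f"anchor name {name!r} used twice in one sequence")
        names[name] = pos + 1
    return names


def anchored_pairs(a: dict[str, int], b: dict[str, int]) -> list[tuple[int, int]]:
    """Positions carrying the same name in two sequences must be aligned."""
    return sorted((a[n], b[n]) for n in a.keys() & b.keys())
```

With the example above, A1 is position 10 in fruA and position 12 in fdhA, so these two positions must end up in the same alignment column.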

Fixed Structures

Instead of structure constraints (lines #S) you can also specify fixed structures using lines #FS as follows:
	
>fruA
CCUCGAGGGGAACCCGAAAGGGACCCGAGAGG
((((..(((...(((....))).)))..)))) #FS

>fdhA
CGCCACCCUGCGAACCCAAUAUAAAAUAAUACAAGGGAGCAGGUGGCG
(((((((.(((...(((.................))).)))))))))) #FS

Structure constraints (#S) specify only parts of the structure and are used to compute dot plots representing the ensemble of all structures compatible with the constraints. In contrast, fixed structures (#FS) restrict the ensemble considered for a sequence to this single, fixed structure.
The #FS string can contain pseudoknots; for this purpose, LocARNA supports various bracket symbols: (), [], {}, Aa, Bb, Cc, Dd. Sequences without any given structure, sequences with structure constraints (#S), and sequences with fixed structures (#FS) can be mixed freely.
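Supporting several bracket types is what allows crossing (pseudoknotted) pairs: each type needs its own stack. The Python sketch below parses a '#FS' string on that principle. It assumes that in the letter pairs the uppercase letter opens and the matching lowercase letter closes (A...a), as in the usual extended bracket notation. The function name is hypothetical.

```python
# closing symbol -> opening symbol; assumption: uppercase opens, lowercase closes
BRACKETS = {")": "(", "]": "[", "}": "{", "a": "A", "b": "B", "c": "C", "d": "D"}
OPENERS = set(BRACKETS.values())


def parse_fixed_structure(structure: str) -> list[tuple[int, int]]:
    """Return the base pairs (1-based, i < j) of a '#FS' string.

    One stack per bracket type, so pairs of different types may cross.
    """
    stacks: dict[str, list[int]] = {o: [] for o in OPENERS}
    pairs: list[tuple[int, int]] = []
    for pos, ch in enumerate(structure, start=1):
        if ch in OPENERS:
            stacks[ch].append(pos)
        elif ch in BRACKETS:
            stack = stacks[BRACKETS[ch]]
            if not stack:
                raise ValueError(f"unmatched {ch!r} at position {pos}")
            pairs.append((stack.pop(), pos))
        elif ch != ".":
            raise ValueError(f"unexpected symbol {ch!r} at position {pos}")
    for opener, stack in stacks.items():
        if stack:
            raise ValueError(f"unmatched {opener!r} at position {stack[-1]}")
    return sorted(pairs)
```

For instance, "((..[[..))..]]" yields (1, 10), (2, 9), (5, 14), (6, 13): the []-pairs cross the ()-pairs.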
The parameter constraints are: The input has to be in valid FASTA format. The number of sequences has to be at least 2 and at most 30. Sequence lengths have to be in the range 5-2000. The allowed sequence alphabet is 'ACGUTNacgutn'. A fixed structure can be given in a single line with trailing '#FS' using the bracket pairs ()[]{}AaBbCcDd. Structure constraints can be given in a single line with trailing '#S' using the alphabet ().x|<>. Anchor constraints can be given. In LocARNA-P (probabilistic) mode, the maximal sequence length is 1000 nt.
Defaults to ()
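Before submitting, the size limits listed above can be checked locally. The following Python sketch covers only the sequence count, length range, and alphabet. It takes a {name: sequence} mapping and is a hypothetical pre-check, not the server's own validation code.

```python
ALPHABET = set("ACGUTNacgutn")


def validate_input(records: dict[str, str], probabilistic: bool = False) -> list[str]:
    """Return human-readable violations of the server limits (empty list if valid)."""
    errors = []
    if not 2 <= len(records) <= 30:
        errors.append(f"need 2-30 sequences, got {len(records)}")
    max_len = 1000 if probabilistic else 2000  # LocARNA-P mode allows at most 1000 nt
    for name, seq in records.items():
        if not 5 <= len(seq) <= max_len:
            errors.append(f"{name}: length {len(seq)} outside 5-{max_len}")
        bad = set(seq) - ALPHABET
        if bad:
            errors.append(f"{name}: illegal characters {''.join(sorted(bad))}")
    return errors
```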

Alignment

Alignment Type

Perform either global or local alignment. In global mode, the whole input sequences are aligned, whereas local alignment will determine the best matching subsequences of the input sequences.

Alignment Mode

In standard mode, the server runs LocARNA's core engine. For most needs, this mode provides a good balance of alignment accuracy and efficiency.
In LocARNA-P mode, the server delegates the alignment to the tool LocARNA-P, which annotates the alignment with a reliability profile. This information enables you to analyze the local alignment quality, revealing conserved sequence and structure signals in your sequences. Furthermore, by employing a consistency transformation, LocARNA-P usually exceeds LocARNA's alignment accuracy.
The parameter constraints are: Input value has to be parsable as Boolean.
Defaults to (Standard)

Alignment Scoring

Structure Weight

Weights structure matches against sequence matches and gap costs. A structural match of two arcs is assigned a score of at most 2. The default structure weight of 200 turned out to balance the score contributions of structure matches and sequence alignment well.
The parameter constraints are: Input value has to be parsable as Integer. The value must be greater than or equal to 0 and must be smaller than or equal to 2000.
Defaults to (200)

Indel Opening Score

Score for starting a gap in the alignment. This score is a penalty and therefore should be negative.
The parameter constraints are: Input value has to be parsable as Integer. The value must be smaller than or equal to 0.
Defaults to (-800)

Indel Score

Score for extending an alignment gap by one position. Like the indel opening score, it is a penalty and therefore should be negative.
The parameter constraints are: Input value has to be parsable as Integer. The value must be smaller than or equal to 0.
Defaults to (-50)
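The two indel parameters together define the cost of a gap. The Python sketch below assumes an affine model, in which a gap of k consecutive indels scores Indel Opening Score + k × Indel Score. This is an assumption about LocARNA's exact convention, and `gap_score` is a hypothetical name.

```python
def gap_score(length: int, indel_opening: int = -800, indel: int = -50) -> int:
    """Score of one gap of `length` consecutive indels (assumed affine model)."""
    if length <= 0:
        return 0
    # the opening penalty is paid once per gap, the indel score once per position
    return indel_opening + length * indel
```

With the defaults, one gap of length 2 scores -900, whereas two separate gaps of length 1 score -1700. The large opening penalty therefore favors few, longer gaps over many scattered ones.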

Use RIBOSUM

Whether the RIBOSUM matrix 'RIBOSUM85_60' is used for scoring sequence matches and mismatches. RIBOSUM scoring is the default for LocARNA. If RIBOSUM scoring is disabled, sequence matches and mismatches are scored as given explicitly by the parameters 'Match Score' and 'Mismatch Score'.
The parameter constraints are: Input value has to be parsable as Boolean.
Defaults to (true)

Match Score

Score for aligning two identical nucleotides.
The parameter constraints are: Input value has to be parsable as Integer. The value must be greater than or equal to 0.
Defaults to (50)

Mismatch Score

Score for aligning two different nucleotides.
The parameter constraints are: Input value has to be parsable as Integer. The value must be greater than or equal to 0.
Defaults to (0)
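When RIBOSUM is disabled, the sequence part of a pairwise alignment score is built from these match/mismatch scores plus the gap scores. The Python sketch below computes that sum for two gapped rows. It is a simplified illustration with a hypothetical function name. It ignores structure contributions entirely and assumes an affine gap model (opening score + length × indel score), which may differ from LocARNA's exact convention.

```python
def sequence_score(row_a: str, row_b: str, match: int = 50, mismatch: int = 0,
                   indel_opening: int = -800, indel: int = -50) -> int:
    """Sequence-only score of two gapped alignment rows ('-' marks a gap)."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    score = 0
    open_gap = None  # which row currently has an open gap: 'a', 'b' or None
    for x, y in zip(row_a, row_b):
        if x == "-" and y == "-":
            continue  # all-gap column contributes nothing
        if x == "-" or y == "-":
            side = "a" if x == "-" else "b"
            if open_gap != side:
                score += indel_opening  # a new gap starts here
                open_gap = side
            score += indel  # every gap position pays the indel score
        else:
            open_gap = None
            score += match if x.upper() == y.upper() else mismatch
    return score
```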

RNA Folding

Local folding: max. base pair span

If set, local folding is applied, which limits the maximal span of a base pair to the given number of nucleotides (nt) (see '-L' in the RNAplfold manual).
Local folding is key to reasonable folding results for larger RNA molecules, since it minimizes the effects of incorrect long-range predictions (see the local folding article).
Note that setting the span automatically sets the folding window size (see '-W' in the RNAplfold manual) to 2*span!
Note further that for input RNAs with structure constraints (#S or #FS), we automatically fall back to non-local folding (unless the user chooses to ignore all constraints).
The parameter constraints are: Input value has to be parsable as Integer. The value must be greater than 2.
Defaults to (150)
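The coupling between span and window size can be written down directly. The Python sketch below builds the corresponding RNAplfold options (-L for the span, -W = 2 × span for the window). The helper name is hypothetical, and the server's actual RNAplfold call may pass further options that are not shown here.

```python
def rnaplfold_args(span: int) -> list[str]:
    """RNAplfold options for local folding with maximal base pair span `span`."""
    if span <= 2:
        raise ValueError("span must be greater than 2")
    window = 2 * span  # window size is set to twice the span
    return ["RNAplfold", "-W", str(window), "-L", str(span)]
```

With the default span of 150, this gives a window size of 300.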

Temperature in °C

Rescales the RNA folding energy parameters to the given temperature in degrees Celsius (°C) (see '-T' in the RNAfold manual).
The parameter constraints are: Input value has to be parsable as Double. The value must be greater than 0 and must be smaller than 100.
Defaults to (37)

Energy parameter set (Vienna package)

Defines which energy parameter set of the Vienna RNA package is used to compute the base pair probabilities, i.e. dot plots, for each input sequence. The parameter sets are provided by the Vienna RNA package (see version information) and are named after the first author and year of the corresponding publication.

Upload dot plots

By default, LocARNA creates a dot plot (base pair probability matrix) for each sequence of the FASTA input. The dot plot is generated using the tool RNAfold, unless you specify a fixed structure. Alternatively, you can upload custom dot plots in the Vienna RNA dot plot format as generated by RNAfold (PostScript, .ps; see the RNAfold man page). To assign an uploaded dot plot to a particular sequence of the FASTA input, the sequence in the uploaded file has to exactly match that sequence; file names and the order of uploads are irrelevant. It is possible to upload dot plots for only some of the sequences; LocARNA then still computes dot plots for the remaining sequences.

Heuristics for speed/accuracy tradeoff

Minimal Pair Probability

Only base pairs that have at least the minimal pair probability are considered for scoring the alignment. Base pairs with lower probability are considered insignificant.
The parameter constraints are: Input value has to be parsable as Double. The value must be greater than 2.0E-4 and must be smaller than 1.
Defaults to (0.0005)
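Filtering by probability is what keeps the number of arcs small. The probabilities of all pairs involving one position sum to at most 1, so at most 1/min_prob pairs per position can pass the threshold. With the default 0.0005 this bound is 2000, and in practice far fewer pairs pass. A minimal Python sketch of the filter on a dot plot stored as a {(i, j): probability} dictionary (a hypothetical representation):

```python
def significant_pairs(probs: dict[tuple[int, int], float],
                      min_prob: float = 0.0005) -> dict[tuple[int, int], float]:
    """Keep only base pairs whose probability is at least `min_prob`."""
    return {pair: p for pair, p in probs.items() if p >= min_prob}
```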

Minimal Probability for Guide Tree Construction

Same as the minimal pair probability, but used only for the alignments computed while constructing the guide tree. Since those alignments are less important than the alignments of the progressive alignment phase, one can use higher values here than for "Minimal Pair Probability".
The parameter constraints are: Input value has to be parsable as Double. The value must be greater than 0 and must be smaller than 1.
Defaults to (0.005)

Maximal Difference for Sizes of Matched Arcs

Restricts the difference in length between base pairs (arcs) that can be matched. A value of '-1' disables this restriction.
The parameter constraints are: Input value has to be parsable as Integer. The value must be greater than or equal to -1 and must not be equal to 0.
Defaults to (30)
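As an illustration, the Python sketch below decides whether two arcs may be matched under this restriction. It compares the spans j - i of the two base pairs. The function and parameter names are hypothetical.

```python
def arcs_matchable(arc_a: tuple[int, int], arc_b: tuple[int, int],
                   max_diff_arcs: int = 30) -> bool:
    """May base pair arc_a (in sequence A) be matched to arc_b (in sequence B)?"""
    if max_diff_arcs == -1:
        return True  # restriction disabled
    span_a = arc_a[1] - arc_a[0]
    span_b = arc_b[1] - arc_b[0]
    return abs(span_a - span_b) <= max_diff_arcs
```

With the default of 30, an arc (1, 100) can be matched to (5, 90) but not to (5, 50).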

Maximal Difference for Alignment Edges

Restricts the difference between sequence positions that can be matched. A value of '-1' disables this restriction.
The parameter constraints are: Input value has to be parsable as Integer. The value must be greater than or equal to -1 and must not be equal to 0. If not -1, it has to hold (maxSequenceLength > (maxDiff * 2 * (minSequenceLength-1))).
Defaults to (60)
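Conceptually, this restriction confines the alignment to a band around the diagonal. The Python sketch below uses the simplest form of such a band, |i - j| ≤ max_diff. This is an assumption: LocARNA's exact definition (for example, any adjustment for different sequence lengths) may differ. The function name is hypothetical.

```python
def allowed_cells(len_a: int, len_b: int, max_diff: int = 60) -> list[tuple[int, int]]:
    """Position pairs (i, j), 1-based, allowed under the band |i - j| <= max_diff."""
    cells = []
    for i in range(1, len_a + 1):
        for j in range(1, len_b + 1):
            if max_diff == -1 or abs(i - j) <= max_diff:
                cells.append((i, j))
    return cells
```

Under such a band, the number of allowed position pairs grows roughly with n × max_diff instead of n², which is where the savings come from.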

Other Parameters

Ignore Constraints

Ignores anchor constraints and structure constraints if they are specified in the input; otherwise, this option has no effect.
The parameter constraints are: Input value has to be parsable as Boolean.
Defaults to (false)

Disallow Lonely Pairs

Forbids the occurrence of lonely base pairs in the consensus structure of an alignment.
The parameter constraints are: Input value has to be parsable as Boolean.
Defaults to (true)

Use alifold consensus dot plots

Uses RNAalifold -p to generate the consensus dot plot after each progressive alignment step. Otherwise, consensus dot plots are computed as averages over the input dot plots.
Note: This parameter cannot be applied and is automatically disabled if (a) structure constraints (#S or #FS) are provided, (b) the folding temperature and energy parameters are changed from their defaults, or (c) LocARNA-P computations are requested.
The parameter constraints are: Input value has to be parsable as Boolean. It is automatically disabled if Temperature in °C and Energy parameter set (Vienna package) differ from their standard values. Furthermore, it cannot be applied if structure constraints are present or LocARNA-P computations are requested.
Defaults to (false)
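When this option is off, consensus dot plots are averages of the input dot plots. The Python sketch below shows one straightforward way to compute such an average. It first maps each sequence's positions to alignment columns and then averages over all sequences, counting a pair as 0 where a sequence lacks it. The function name and the plain unweighted average are assumptions, not necessarily LocARNA's exact weighting.

```python
def consensus_dotplot(rows: list[str],
                      dotplots: list[dict[tuple[int, int], float]]
                      ) -> dict[tuple[int, int], float]:
    """Average per-sequence base pair probabilities in alignment-column coordinates."""
    total: dict[tuple[int, int], float] = {}
    for row, probs in zip(rows, dotplots):
        # col_of[k] = 1-based alignment column of the k-th (1-based) nucleotide
        col_of = [0] + [c for c, ch in enumerate(row, start=1) if ch != "-"]
        for (i, j), p in probs.items():
            key = (col_of[i], col_of[j])
            total[key] = total.get(key, 0.0) + p
    n = len(rows)
    return {key: s / n for key, s in total.items()}
```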

Consistency Transformation (LocARNA-P only)

The consistency transformation improves multiple alignment accuracy by avoiding mistakes during progressive alignment. It re-estimates the match probabilities in the multiple alignment based on pairwise alignment probabilities within sequence triples.
The parameter constraints are: Input value has to be parsable as Boolean.
Defaults to (false)
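A widely used formulation of such a transformation is the ProbCons-style update P'_xy = (1/N) Σ_z P_xz · P_zy. Here P_xy is the matrix of match probabilities between sequences x and y, P_xx is the identity, and z runs over all N sequences. The Python sketch below implements that formulation. It is an assumption that LocARNA-P uses exactly this form, and all names are hypothetical.

```python
Matrix = list[list[float]]


def _identity(n: int) -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def _transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def _matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def consistency_transform(probs: dict[tuple[int, int], Matrix],
                          lengths: list[int]) -> dict[tuple[int, int], Matrix]:
    """P'_xy = (1/N) * sum_z P_xz @ P_zy (ProbCons-style consistency).

    probs[(x, y)][i][j] is the probability that residue i of sequence x is
    aligned to residue j of sequence y (0-based), given for pairs x < y.
    """
    n = len(lengths)

    def pmat(x: int, z: int) -> Matrix:
        if x == z:
            return _identity(lengths[x])
        return probs[(x, z)] if (x, z) in probs else _transpose(probs[(z, x)])

    result = {}
    for (x, y) in probs:
        acc = [[0.0] * lengths[y] for _ in range(lengths[x])]
        for z in range(n):  # route the evidence through every sequence z
            prod = _matmul(pmat(x, z), pmat(z, y))
            for i in range(lengths[x]):
                for j in range(lengths[y]):
                    acc[i][j] += prod[i][j] / n
        result[(x, y)] = acc
    return result
```

With only two sequences the update leaves P_01 unchanged. With three, a weak direct match (0.5) is raised when both indirect routes support it (here to 2/3).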

Keep Sequence Input Order

If enabled, the original sequence input order will be preserved within the generated alignment.
The parameter constraints are: Input value has to be parsable as Boolean.
Defaults to (false)

Stockholm alignment output

If enabled, all alignments are additionally produced in Stockholm format.
The parameter constraints are: Input value has to be parsable as Boolean.
Defaults to (true)

Output Description

Color and structure annotated alignment

The alignment is annotated with its consensus structure, i.e. a secondary structure of the whole alignment, as predicted by RNAalifold. The consensus structure is printed as a string of dots and brackets on top of the alignment. The string is well-bracketed, such that base pairs in the structure are indicated by corresponding opening and closing brackets. Furthermore, compatible base pairs are colored: the hue shows the number of different types of compatible base pairs (C-G, G-C, A-U, U-A, G-U, or U-G) in the corresponding columns. In this way, the hue shows the sequence conservation of the base pair. The saturation decreases with the number of incompatible base pairs; thus, it indicates the structural conservation of the base pair.
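The two quantities behind the coloring can be counted directly from the alignment columns. For one consensus base pair (columns i and j), the Python sketch below returns the number of distinct compatible pair types (driving the hue) and the number of sequences whose pair is incompatible (reducing the saturation). The function name is hypothetical. Counting gapped pairs as incompatible is an assumption; RNAalifold's exact rules and color mapping are not reproduced.

```python
COMPATIBLE = {"CG", "GC", "AU", "UA", "GU", "UG"}


def pair_column_stats(rows: list[str], i: int, j: int) -> tuple[int, int]:
    """For columns i, j (0-based) of gapped rows, return
    (distinct compatible pair types, number of incompatible sequences)."""
    types = set()
    incompatible = 0
    for row in rows:
        pair = (row[i] + row[j]).upper().replace("T", "U")
        if pair in COMPATIBLE:
            types.add(pair)
        else:
            incompatible += 1  # includes pairs with gaps (assumption)
    return len(types), incompatible
```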

The representation was generated by the tool RNAalifold of the Vienna Package.

Furthermore, we provide file downloads of the alignment information in the following formats:

LocARNA-P Reliability Profile (STAR Profile Plot)

Visualization of the local sequence–structure-based alignment reliability (STAR) of the resulting alignment. The profile consists of the reliability of each alignment column. The dark regions indicate structure reliability, the light regions represent sequence reliability, and the thin line shows the combined column reliability. The column-wise reliabilities are computed as sum-of-pairs over match probabilities, which are computed by LocARNA-P.
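To make the sum-of-pairs idea concrete, the Python sketch below computes a combined per-column reliability. It sums the match probabilities of all sequence pairs that both have a residue in the column. It then divides by the number of sequence pairs, which is a normalization chosen here as an assumption. The separate sequence and structure components of the STAR plot are not modeled, and the function name is hypothetical.

```python
from itertools import combinations


def column_reliability(rows: list[str],
                       match_prob: dict[tuple[int, int], dict[tuple[int, int], float]]
                       ) -> list[float]:
    """Sum-of-pairs column reliability.

    match_prob[(a, b)][(k, l)]: probability that residue k of sequence a is
    aligned to residue l of sequence b (sequences 0-based with a < b,
    residues 1-based); missing entries count as 0.
    """
    n = len(rows)
    if n < 2:
        raise ValueError("need at least two alignment rows")
    # positions[a][c] = residue index of sequence a in column c, None for a gap
    positions = []
    for row in rows:
        idx, col_pos = 0, []
        for ch in row:
            if ch == "-":
                col_pos.append(None)
            else:
                idx += 1
                col_pos.append(idx)
        positions.append(col_pos)
    n_pairs = n * (n - 1) // 2
    profile = []
    for c in range(len(rows[0])):
        s = 0.0
        for a, b in combinations(range(n), 2):
            k, l = positions[a][c], positions[b][c]
            if k is not None and l is not None:
                s += match_prob.get((a, b), {}).get((k, l), 0.0)
        profile.append(s / n_pairs)
    return profile
```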

Guide Tree and Sub Alignments

The guide tree reflects the similarity relations between your input sequences in the form of a hierarchical clustering. LocARNA computes this tree to guide its multiple alignment. It is constructed with the UPGMA method from LocARNA's all-against-all pairwise comparison of the sequences, which takes both sequence and structure similarity into account. The guide tree view is generated using the Newick Utilities.
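UPGMA repeatedly merges the two closest clusters and sets the distance from the merged cluster to every other cluster to the size-weighted average of the old distances. The Python sketch below builds a Newick topology (without branch lengths) this way. It starts from a symmetric distance matrix. How LocARNA turns its pairwise alignment scores into distances is not shown and is left as an assumption, and `upgma` is a hypothetical name.

```python
def upgma(names: list[str], dist: list[list[float]]) -> str:
    """UPGMA clustering; returns a Newick topology string (no branch lengths)."""
    clusters = {i: (name, 1) for i, name in enumerate(names)}  # id -> (label, size)
    d = {(i, j): dist[i][j] for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        a, b = min(d, key=d.get)  # closest pair of active clusters
        (label_a, size_a), (label_b, size_b) = clusters.pop(a), clusters.pop(b)
        new = next_id
        next_id += 1
        for c in clusters:
            d_ac = d[(min(a, c), max(a, c))]
            d_bc = d[(min(b, c), max(b, c))]
            # size-weighted average distance to the merged cluster
            d[(c, new)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        clusters[new] = (f"({label_a},{label_b})", size_a + size_b)
    return next(iter(clusters.values()))[0] + ";"
```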

RNAalifold consensus structure

The consensus structure of the alignment, as predicted by RNAalifold, is shown in a 2D layout. Base pairs are colored using the same color code as in "Color and structure annotated alignment": hue shows sequence conservation and saturation shows structural conservation. For each alignment position, the set of nucleotides with a frequency higher than average is represented by its IUPAC ambiguity code. If gaps are present, lowercase letters are used. If the gap symbol is the most frequent, a gap symbol is shown.
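The per-column symbol rule described above can be sketched in Python as follows. The sketch interprets "higher than average" as a count above the mean count of A, C, G, and U in the column. It shows the symbol in lowercase when any gap is present and as '-' when gaps outnumber every nucleotide. It also falls back to 'N' when no nucleotide exceeds the average. These interpretations and the fallback are assumptions, and RNAalifold's own implementation may differ in such details.

```python
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("U"): "U",
    frozenset("AG"): "R", frozenset("CU"): "Y", frozenset("CG"): "S", frozenset("AU"): "W",
    frozenset("GU"): "K", frozenset("AC"): "M", frozenset("CGU"): "B", frozenset("AGU"): "D",
    frozenset("ACU"): "H", frozenset("ACG"): "V", frozenset("ACGU"): "N",
}


def mis_symbol(column: str) -> str:
    """IUPAC symbol for one alignment column (characters of all rows at one position)."""
    col = column.upper().replace("T", "U")
    counts = {nt: col.count(nt) for nt in "ACGU"}
    gaps = col.count("-")
    if gaps > max(counts.values()):
        return "-"  # the gap symbol is most frequent
    average = sum(counts.values()) / 4
    chosen = frozenset(nt for nt, c in counts.items() if c > average)
    symbol = IUPAC.get(chosen, "N")  # assumption: empty set shown as N
    return symbol.lower() if gaps else symbol
```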

The representation was generated by the tool RNAalifold of the Vienna Package.

Color Legend

Compatible base pairs are colored, where the hue shows the number of different types C-G, G-C, A-U, U-A, G-U or U-G of compatible base pairs in the corresponding columns. In this way the hue shows sequence conservation of the base pair. The saturation decreases with the number of incompatible base pairs. Thus, it indicates the structural conservation of the base pair.

Input Examples

snoRNAs with constraints

tRNA alignment with fixed structure

RNA Boundaries with LocARNA-P

tRNA alignment

Frequently Asked Questions

If your question is not listed, please send it to us!

ClustalW (or my favourite sequence alignment tool) is faster; why should I use LocARNA for aligning RNAs?

Sequence alignment programs like ClustalW, T-Coffee, MUSCLE, or MAFFT compare RNAs only by sequence similarity. However, for many RNAs the structure information is strongly conserved. Thus, sequence similarity can be weak even for closely related RNAs of the same family. Consequently, only sequence-structure alignment programs that consider sequence and structure similarity, such as LocARNA, can reveal their true similarity. Pure sequence alignment programs will usually completely fail if sequence identity drops below 60%. Unfortunately, fully taking structure information into account is computationally expensive. Nevertheless, LocARNA achieves very good performance for this class of algorithms.

Why should I use LocARNA instead of the RNA alignment program XY?

LocARNA is one of the fastest programs that perform true sequence-structure alignment of RNAs and therefore produce highly accurate alignments. The LocARNA server offers a rich interface: you can even specify anchor constraints and structure constraints for global and local alignment, and probabilistic multiple alignment with alignment reliabilities is unique to LocARNA. Honestly, there are other very good programs out there; try them and compare!

How does LocARNA achieve its speed and low space consumption? Does this compromise accuracy?

LocARNA uses different heuristics for improving speed and lowering space requirements. Most importantly, it filters base pairs by their probability in the RNA structure ensemble and considers only 'significant' base pairs that pass a certain probability threshold. This reduces the complexity from O(n^6) time and O(n^4) space to O(n^4) time and O(n^2) space. Furthermore, one can control which base pairs are compared at all by setting a maximal length difference. Finally, as is reasonable for global alignment, one can restrict which residues LocARNA compares by setting a maximal difference of sequence positions. All these heuristics are optional and can be controlled in the advanced section of the server. However, the heuristics with default settings were shown to preserve alignment accuracy on the comprehensive Bralibase 2.1 benchmark. It is therefore likely that the default settings will not compromise the alignment accuracy for your RNAs.

I am using LocARNA in local alignment mode; why does LocARNA return an empty multiple alignment (or only a very small one)?

Generally, LocARNA constructs multiple alignments by progressively merging (previously computed) alignments of fewer sequences. In the case of local alignment, LocARNA merges local subalignments from previous progressive alignment steps. In this way, the alignments can become shorter with every merge, since each progressive step retains only the sufficiently similar parts of the subsequences. Thus, if LocARNA cannot find sufficiently similar common subsequences in all of your input sequences, it will produce a very small or even empty alignment.

Even in the latter case, LocARNA still computes the relations between all of your sequences in the form of a guide tree; furthermore, it computes the alignments of the sequence subsets corresponding to the inner nodes of this tree. In many cases, these alignments are of more interest than a single alignment of all sequences.

I want to align more sequences than allowed. What can I do?

Often it is a good idea to first look for high sequence similarity in such a set of sequences. If one can identify sequence pairs with e.g. more than 80% or 90% identity, usually it does not pay off to align them based on structure similarity. Please check this by running a multiple sequence alignment first.

If this is the case with your set of sequences, you could reasonably reduce the number of sequences that you feed to our web server by omitting extremely similar sequences.
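A quick way to find such near-duplicates is to compute pairwise identities on a multiple sequence alignment and keep sequences greedily. The Python sketch below counts identity over the columns where neither row has a gap, which is one of several common definitions. It drops any sequence that exceeds a chosen identity to an already kept one. The function names and the greedy order (input order) are choices made for illustration.

```python
def percent_identity(row_a: str, row_b: str) -> float:
    """Identity (%) over alignment columns where neither row has a gap."""
    pairs = [(x, y) for x, y in zip(row_a.upper(), row_b.upper()) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def drop_redundant(rows: dict[str, str], max_identity: float = 90.0) -> list[str]:
    """Greedily keep sequences at most `max_identity` % identical to all kept ones."""
    kept: list[str] = []
    for name, row in rows.items():
        if all(percent_identity(row, rows[k]) <= max_identity for k in kept):
            kept.append(name)
    return kept
```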

I want to align longer sequences than allowed. What can I do?

An idea that might work: split the long sequence into (e.g. three) overlapping "windows", i.e. subsequences, and align them with the web server.

This is especially useful if you are aligning short sequences with one (or a few) long sequence(s) to look for a conserved motif.
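Splitting into windows is easy to script. The Python sketch below cuts a sequence into windows of a given size that overlap by a given number of nucleotides. It moves the last window flush with the sequence end, so that the final stretch is never shorter than the window size. The function name and this end-handling choice are hypothetical. Choose the overlap at least as long as the motif you are looking for, so that no occurrence is cut in half.

```python
def overlapping_windows(seq: str, size: int, overlap: int) -> list[tuple[int, str]]:
    """Split `seq` into windows of `size` nt overlapping by `overlap` nt.

    Returns (1-based start, subsequence) pairs; the last window is shifted
    to end exactly at the sequence end.
    """
    if not 0 <= overlap < size:
        raise ValueError("need 0 <= overlap < size")
    if len(seq) <= size:
        return [(1, seq)]
    step = size - overlap
    starts = list(range(0, len(seq) - size + 1, step))
    if starts[-1] + size < len(seq):
        starts.append(len(seq) - size)  # final window flush with the sequence end
    return [(s + 1, seq[s:s + size]) for s in starts]
```

For example, a 5000 nt sequence with windows of 2000 nt (the server's maximal length) and an overlap of 500 nt yields three windows starting at positions 1, 1501, and 3001.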

RNAalifold consensus structure output shows single-letter codes; what is their meaning?

The consensus structure output is produced with RNAalifold's option '--mis', i.e. RNAalifold outputs the "most informative sequence" instead of a simple consensus: for each column of the alignment, it outputs the set of nucleotides with a frequency greater than average in IUPAC notation (see the RNAalifold manual page).

This means that the special letters are IUPAC codes for nucleotides.

List of Changes