Freiburg RNA Tools
CARNA - Help

Introduction

CARNA is a tool for multiple alignment of RNA molecules. CARNA requires only the RNA sequences as input and will compute base pair probability matrices and align the sequences based on their full ensembles of structures. Alternatively, you can also provide base pair probability matrices (dot plots in .ps format) or fixed structures (as annotation in the FASTA alignment) for your sequences. If you provide fixed structures, only those structures, and not the entire ensemble of possible structures, are aligned. In contrast to LocARNA, CARNA does not pick the most likely consensus structure, but computes the alignment that best fits all likely structures simultaneously. Hence, CARNA is particularly useful when aligning RNAs like riboswitches, which have more than one stable structure. Also, CARNA is not limited to nested structures, but is able to align arbitrary pseudoknots.

When using CARNA please cite :

Results are computed with CARNA version 1.3.3 linking LocARNA 1.9.1, Gecode 5.0.0, using Vienna RNA package 2.3.2

Overview

The following parameters are used to control the execution of CARNA

Furthermore, additional information is available

Input Parameter

?  Sequence Input in FASTA Format

CARNA accepts input in the form of a multiple FASTA file. A simple example looks like this:
>fruA
CCUCGAGGGGAACCCGAAAGGGACCCGAGAGG
>fdhA
CGCCACCCUGCAACCCAAUAUAAAAUAAUACAAGGGAGCAGGUGGCG
>vhuU
AGCUCACAACCGAACCCAUUUGGGAGGUUGUGAGCU
>hdrA
GGCACCACUCGAAGGCUAAGCCAAAGUGGUGCU
Input can be given either as direct text input or by uploading a file.

Since CARNA is tailored for sequence-structure alignment, the user can provide additional structure information. To this end, an extended FASTA format is used, as described below. Most importantly, every additional line within the FASTA file has to be tagged with a trailing '#TAG' marker so that the user input can be parsed correctly. The following sections present the supported annotations and their encoding.

Structure and Anchor Constraints

Along with the input sequences, one can specify constraints on the alignment, including structure constraints as well as anchor constraints. Constraints are specified in the input as in the following example.
>fruA
CCUCGAGGGGAACCCGAAAGGGACCCGAGAGG
.......(((..(((xxxx))).)))...... #S
.........AAAAAA.BBBCCCC......... #1
.........123456.1231234......... #2
>fdhA
CGCCACCCUGCGAACCCAAUAUAAAAUAAUACAAGGGAGCAGGUGGCG
..............(((.....xxxxxx......)))........... #S
...........AAAAAA.....BBB.........CCCC.......... #1
...........123456.....123.........1234.......... #2
Note that the line endings (#S,#1,#2,...) are part of the input and mark extensions of the standard FASTA format.

The structure constraints (lines ending in '#S') inherit their semantics from the tool RNAfold from the Vienna RNA package: With the exception of "|", constraints will disallow all pairs conflicting with the constraint. This is usually sufficient to enforce the constraint, but occasionally a base may stay unpaired in spite of constraints. PF folding ignores constraints of type "|".

These well-bracketed strings, which have the same length as the corresponding sequence, restrict the set of structures in the ensemble.
For example, the line .......(((..(((xxxx))).)))...... #S specifies that all structures in the ensemble allow base pairs between the positions of corresponding opening and closing brackets and that positions "x" are unpaired. The following symbols are available:
. - no constraint for this base
x - the base is unpaired
< - base i is paired with a base j>i
> - base i is paired with a base j<i
() - matching brackets; base i pairs with base j
The anchor constraints (#1/#2 lines) are specified by giving unique names to certain sequence positions, here A1,A2,A3,A4,A5,A6,B1,B2,B3,C1,C2,C3,C4 (lines #1,#2). Positions of the same name in different sequences are aligned. The encoding of the positions is split into two lines ('#1' and '#2'), where line '#1' gives the letter for each subsequence (here A,B,C) while line '#2' assigns the corresponding identifier number to each position [limited to 0-9]. Within each sequence, names have to be unique.
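As a rough illustration of how the '#1' and '#2' lines combine into position names, the following Python sketch pairs up the two lines character by character. The function name anchor_names is hypothetical and not part of CARNA; the lines are passed without their trailing tags, and treating '.' as "no anchor" is inferred from the example above.

```python
def anchor_names(line1: str, line2: str) -> dict[str, int]:
    """Map anchor names such as 'A1' to 1-based sequence positions."""
    if len(line1) != len(line2):
        raise ValueError("'#1' and '#2' lines must have the same length")
    names: dict[str, int] = {}
    for pos, (letter, digit) in enumerate(zip(line1, line2), start=1):
        if letter == "." and digit == ".":
            continue  # no anchor at this position
        if letter == "." or digit == ".":
            raise ValueError(f"incomplete anchor name at position {pos}")
        name = letter + digit
        if name in names:
            raise ValueError(f"anchor name {name} is not unique in this sequence")
        names[name] = pos
    return names


# Anchor lines of the two example sequences, without the '#1'/'#2' tags.
fruA = anchor_names(".........AAAAAA.BBBCCCC.........",
                    ".........123456.1231234.........")
fdhA = anchor_names("...........AAAAAA.....BBB.........CCCC..........",
                    "...........123456.....123.........1234..........")

# Names that occur in both sequences mark positions that are aligned.
shared = sorted(set(fruA) & set(fdhA))
for name in shared:
    print(name, fruA[name], fdhA[name])
```

In this reading, the name A1 ties position 10 of fruA to position 12 of fdhA, and so on for the remaining shared names.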

Fixed Structures

Instead of structure constraints (lines ending in '#S') you can also specify fixed structures using lines ending in '#FS' as follows:
>fruA
CCUCGAGGGGAACCCGAAAGGGACCCGAGAGG
((((..(((...(((....))).)))..)))) #FS
>fdhA
CGCCACCCUGCGAACCCAAUAUAAAAUAAUACAAGGGAGCAGGUGGCG
(((((((.(((...(((.................))).)))))))))) #FS
Whereas structure constraints (#S) only specify parts of the structure and are used to create dot plots representing the ensemble of all structures compatible with the constraints, fixed structures (#FS) force the ensemble considered for this sequence to contain only this one fixed structure. This is done by generating a dot plot that contains probability one for each specified base pair and zero for all others.
The #FS string can contain pseudoknots; for this purpose, CARNA supports various bracket symbols: (),[],{},Aa,Bb,Cc,Dd. Sequences without any given structure, sequences with structure constraints (#S) and sequences with fixed structures (#FS) can be mixed freely.
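The translation of a fixed structure into a dot plot can be pictured as follows. This is a minimal Python sketch, assuming that each bracket type is matched independently with its own stack and that '.' marks an unpaired position; the function names are hypothetical and the code is not CARNA's own implementation.

```python
# Closing symbol -> opening symbol for the bracket types listed above.
BRACKETS = {")": "(", "]": "[", "}": "{", "a": "A", "b": "B", "c": "C", "d": "D"}


def fixed_structure_pairs(structure: str) -> list[tuple[int, int]]:
    """Return the 1-based base pairs (i, j), i < j, of an '#FS' string."""
    stacks: dict[str, list[int]] = {opener: [] for opener in BRACKETS.values()}
    pairs = []
    for pos, symbol in enumerate(structure, start=1):
        if symbol in stacks:
            stacks[symbol].append(pos)
        elif symbol in BRACKETS:
            stack = stacks[BRACKETS[symbol]]
            if not stack:
                raise ValueError(f"unmatched '{symbol}' at position {pos}")
            pairs.append((stack.pop(), pos))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol '{symbol}' at position {pos}")
    for opener, stack in stacks.items():
        if stack:
            raise ValueError(f"unmatched '{opener}' at position {stack[-1]}")
    return sorted(pairs)


def fixed_dot_plot(structure: str) -> dict[tuple[int, int], float]:
    """Probability one for each specified pair; all other pairs are implicitly zero."""
    return {pair: 1.0 for pair in fixed_structure_pairs(structure)}


# A small made-up pseudoknot: the '[' stem crosses the '(' stem.
print(fixed_structure_pairs("((..[[..))..]]"))
```

Because every bracket type has its own stack, crossing stems such as the '(' and '[' pairs above are accepted, which a single-stack parser for nested structures would reject.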
The parameter constraints are: The input has to be in valid FASTA format. The number of sequences has to be at least 2 and at most 30. Sequence lengths have to be in the range 5-2000. The allowed sequence alphabet is 'ACGUTNacgutn'. Fixed structures can be given in a single line with trailing '#FS' using the bracket pairs ()[]{}AaBbCcDd. Structure constraints can be given in a single line with trailing '#S' using the alphabet ().x|<>. Anchor constraints can be given in two lines with trailing '#1' and '#2'. In NUPACK (pseudoknot) mode, sequence lengths have to be at most 120 due to the NUPACK computations.
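Before submitting a job, the sequence-related limits listed above can be checked locally. The following Python sketch is one way to do so; the helper names read_sequences and check_input are hypothetical, annotation lines are simply skipped rather than validated, and the server's own checks may differ in detail.

```python
ALPHABET = set("ACGUTNacgutn")


def read_sequences(fasta_text):
    """Return (name, sequence) pairs, skipping lines that end in a '#TAG'."""
    records = []
    for raw in fasta_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append([line[1:].strip(), ""])
        elif line.split()[-1].startswith("#"):
            continue  # tagged annotation line such as '... #S' or '... #FS'
        elif not records:
            raise ValueError("sequence data before the first '>' header")
        else:
            records[-1][1] += line
    return [(name, seq) for name, seq in records]


def check_input(fasta_text, nupack_mode=False):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    records = read_sequences(fasta_text)
    if not 2 <= len(records) <= 30:
        problems.append(f"expected 2 to 30 sequences, found {len(records)}")
    max_len = 120 if nupack_mode else 2000  # NUPACK mode has the tighter limit
    for name, seq in records:
        if not 5 <= len(seq) <= max_len:
            problems.append(f"{name}: length {len(seq)} outside 5-{max_len}")
        bad = sorted(set(seq) - ALPHABET)
        if bad:
            problems.append(f"{name}: characters not allowed: {''.join(bad)}")
    return problems


EXAMPLE = """>fruA
CCUCGAGGGGAACCCGAAAGGGACCCGAGAGG
.......(((..(((xxxx))).)))...... #S
>fdhA
CGCCACCCUGCGAACCCAAUAUAAAAUAAUACAAGGGAGCAGGUGGCG
"""
print(check_input(EXAMPLE))
```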

?  Upload dot plots

By default, CARNA creates a dot plot (base pair probability matrix) for each sequence of the FASTA input. Usually, dot plots are generated using the tools RNAfold or NUPACK. Fixed structures are an exception: they are directly translated to dot plots (see the Fixed Structures section for details). Another option is to upload custom dot plots for the sequences.

?  Predict dot plots

The server supports two algorithms to predict dot plots automatically from the sequence. Both use a complex thermodynamic energy model for RNA. In the first variant, the server predicts dot plots without pseudoknots by RNAfold. This is the server's default, since calculating pseudoknot-free dot plots is fast and sufficient in many cases. However, using pseudoknot-free dot plots, CARNA will not be able to predict pseudoknots or improve their alignment over, e.g., LocARNA. If this is needed, one can provide pseudoknotted fixed structures, custom dot plots, or let the server predict dot plots with pseudoknots. For the latter, dot plots are generated by the tool pairs of NUPACK. This program predicts dot plots using an algorithm of Dirks and Pierce that supports pseudoknots of specifically limited complexity. Please note that, whereas CARNA can align arbitrarily complex pseudoknots that are specified in the input dot plots, predicting dot plots with arbitrarily complex pseudoknots is computationally infeasible. Due to a limitation of NUPACK, the prediction of dot plots with pseudoknots under structure constraints is not supported.

Custom dot plots

Custom dot plots are specified in the Vienna RNA dot plot format as generated by RNAfold (PostScript, .ps; please see the RNAfold man page). To assign a dot plot to a particular sequence in the FASTA input, the sequence in the uploaded file has to match that sequence in the FASTA input exactly; file names and the order of uploads are not relevant. It is possible to upload dot plots for only some of the sequences; then, CARNA will still compute dot plots for the remaining sequences.
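To check in advance whether a dot plot file matches a FASTA sequence, one can extract the sequence and the pair probabilities from the PostScript text. The Python sketch below assumes the usual RNAfold layout, in which the sequence is stored in a "/sequence { (...) } def" block and each pair probability appears as an "i j value ubox" line whose value is the square root of the probability. The function name and the shortened sample text are made up for illustration; a real RNAfold file contains much more PostScript code.

```python
import re


def read_dot_plot(ps_text):
    """Return (sequence, {(i, j): probability}) from RNAfold-style PostScript text."""
    match = re.search(r"/sequence\s*\{\s*\((.*?)\)\s*\}\s*def", ps_text, re.DOTALL)
    if match is None:
        raise ValueError("no /sequence block found")
    # The sequence is wrapped with backslash-newline continuations; drop them.
    sequence = re.sub(r"[\\\s]", "", match.group(1))
    probabilities = {}
    entry = re.compile(r"^(\d+)\s+(\d+)\s+([0-9.eE+-]+)\s+ubox\s*$", re.MULTILINE)
    for m in entry.finditer(ps_text):
        i, j, sqrt_p = int(m.group(1)), int(m.group(2)), float(m.group(3))
        probabilities[(i, j)] = sqrt_p ** 2  # ubox entries store sqrt(p)
    return sequence, probabilities


# Heavily shortened, made-up sample in the assumed layout.
SAMPLE = r"""/sequence { (\
GGGAAAUCC\
) } def
1 9 0.9 ubox
2 8 0.5 ubox
1 9 0.95 lbox
"""

sequence, probabilities = read_dot_plot(SAMPLE)
fasta_sequence = "GGGAAAUCC"
print(sequence == fasta_sequence, probabilities)
```

The comparison in the last line mirrors the exact-match requirement described above: the upload is only associated with a FASTA entry if the two sequences are identical.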

Scoring Parameters

?  Structure Weight

Weights structural match against sequence match and gap cost. A structural match of two arcs is assigned a score of at most 2. The default structure weight of 200 has been found to balance the score contributions of structure match and sequence alignment well.
The parameter constraints are: Input value has to be parsable as an Integer. The value must be greater than or equal to 0.

?  Indel Opening Score

Score for starting a gap in the alignment. This score is a penalty and therefore should be negative.
The parameter constraints are: Input value has to be parsable as an Integer. The value must be smaller than 0.

?  Indel Score

Score for extending an alignment gap. Like the indel opening score, this is a penalty and therefore should be negative.
The parameter constraints are: Input value has to be parsable as an Integer. The value must be smaller than 0.
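The interplay of the two indel parameters can be sketched with a small Python function. It assumes the common affine scheme in which a gap of n positions contributes the opening score once plus the indel score n times; whether CARNA charges the indel score for the first gap position as well is an assumption here, and the numeric values are purely illustrative, not the server's defaults.

```python
def gap_score(length, indel_opening, indel):
    """Score contribution of one gap of `length` positions (affine scheme)."""
    if length < 1:
        raise ValueError("a gap has at least one position")
    if indel_opening >= 0 or indel >= 0:
        raise ValueError("both scores are penalties and must be smaller than 0")
    # Assumption: the opening score is paid once, the indel score per position.
    return indel_opening + length * indel


# Illustrative values only.
print(gap_score(1, indel_opening=-500, indel=-350))
print(gap_score(3, indel_opening=-500, indel=-350))
```

Under this scheme, one long gap is cheaper than several short gaps of the same total length, because the opening score is paid only once.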

?  Use RIBOSUM

Whether or not the RIBOSUM matrix 'RIBOSUM85_60' is used for scoring sequence match/mismatch. RIBOSUM scoring is the default for CARNA. If one disables the use of the RIBOSUM matrix, sequence matches/mismatches are scored as given explicitly by the parameters 'Match Score' and 'Mismatch Score'.
The parameter constraints are: Input value has to be parsable as a Boolean.

?  Match Score

Score for aligning two identical nucleotides.
The parameter constraints are: Input value has to be parsable as an Integer. The value must be greater than or equal to 0.

?  Mismatch Score

Score for aligning two different nucleotides.
The parameter constraints are: Input value has to be parsable as an Integer.

Heuristics for speed/accuracy tradeoff

?  Minimal Pair Probability

Only base pairs that have at least the minimal pair probability are considered for scoring the alignment. Base pairs with lower probability are considered insignificant.
The parameter constraints are: Input value has to be parsable as a Double. The value must be greater than 0.
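The effect of this threshold can be pictured as a simple filter over the dot plot. The following Python sketch uses a hypothetical dictionary representation of a dot plot (base pair to probability) and made-up probabilities; it is an illustration, not CARNA's code.

```python
def significant_pairs(dot_plot, min_prob):
    """Keep only base pairs whose probability is at least `min_prob`."""
    if not min_prob > 0:
        raise ValueError("the minimal pair probability must be greater than 0")
    return {pair: p for pair, p in dot_plot.items() if p >= min_prob}


# Made-up probabilities for three base pairs.
dot_plot = {(1, 20): 0.92, (2, 19): 0.40, (5, 12): 0.0004}
print(significant_pairs(dot_plot, min_prob=0.01))
```

A higher threshold leaves fewer base pairs to score, which speeds up the alignment at the risk of discarding weak but conserved pairs.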

?  Maximal Difference for Sizes of Matched Arcs

Restrict the length difference of base pairs that can be matched.
The parameter constraints are: Input value has to be parsable as an Integer. The value must be greater than 0.

?  Maximal Difference for Alignment Edges

Restrict the difference of sequence positions that can be matched.
The parameter constraints are: Input value has to be parsable as an Integer. The value must be greater than 0.
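One plausible reading of the two "maximal difference" heuristics is sketched below in Python. It assumes that the size of an arc (i, j) is j - i, and that an alignment edge connects position i of the first sequence with position k of the second; the function names are hypothetical and the exact definitions used by CARNA may differ.

```python
def arcs_may_match(arc_a, arc_b, max_diff_arcs):
    """Allow matching two arcs only if their sizes differ by at most the limit."""
    (i, j), (k, l) = arc_a, arc_b
    return abs((j - i) - (l - k)) <= max_diff_arcs


def positions_may_match(i, k, max_diff_edges):
    """Allow an alignment edge only if the two positions are close enough."""
    return abs(i - k) <= max_diff_edges


# Arc sizes 9 and 11 differ by 2; sizes 9 and 17 differ by 8.
print(arcs_may_match((1, 10), (3, 14), max_diff_arcs=5))
print(arcs_may_match((1, 10), (3, 20), max_diff_arcs=5))
print(positions_may_match(4, 30, max_diff_edges=20))
```

Both checks prune the search space: smaller limits make the alignment faster but can exclude the correct match when the sequences differ strongly in length.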

Other Parameter

?  Ignore Constraints

Ignore anchor constraints and structural constraints, if they are specified in the input. Otherwise this option has no effect.
The parameter constraints are: Input value has to be parsable as a Boolean.

?  Search Time Limit (in milliseconds)

Restrict the search time for each pairwise alignment (in the course of the multiple alignment construction) by a time limit in milliseconds. If the time limit is exceeded, CARNA returns the best alignment found so far.
The parameter constraints are: Input value has to be parsable as an Integer. The value must be greater than 0.

?  Disallow Lonely Pairs

Forbid the occurrence of isolated base pairs in the ensemble of each individual RNA. This option affects only the structure ensemble prediction.
The parameter constraints are: Input value has to be parsable as a Boolean.

Output Description

Conservation Dot Plots

We show conservation dot plots annotated with an arc representation of the most probable base pairs of the consensus dot plot on the right. Radio buttons at the bottom of the figures allow switching between different dot plots and settings. For the arc representation, we allow different thresholds on the probability of shown base pairs.

The consensus conservation dot plot (radio button "Consensus(average)") averages the input dot plots according to the alignment. The sequence shown in the consensus dot plot is a simple majority consensus sequence. This dot plot shows two copies of the averaged dot plot, one in the upper right triangle and one in the lower left triangle. The plot of the lower left triangle is annotated with the color-encoded conservation information of each base pair, resulting in a conservation consensus dot plot. More precisely, the conservation of a consensus base pair is measured as “inverse deviation” 1−2sd, where sd is the standard deviation of the base pair’s probability across all sequences in the alignment. In this way, an inverse deviation of one corresponds to perfect conservation, whereas zero corresponds to maximum variance. The color encoding is shown in the legend below.
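The inverse deviation can be reproduced with a few lines of Python. The sketch assumes that sd is the population standard deviation (dividing by the number of sequences), since this is the variant for which the half-0/half-1 case yields exactly zero; the function name is hypothetical.

```python
from statistics import pstdev


def inverse_deviation(probabilities):
    """Conservation measure 1 - 2*sd for one base pair across all sequences."""
    return 1.0 - 2.0 * pstdev(probabilities)


# Same probability in all sequences: perfect conservation.
print(inverse_deviation([0.8, 0.8, 0.8, 0.8]))
# Probability 0 in half of the sequences and 1 in the other half: maximum variance.
print(inverse_deviation([0.0, 0.0, 1.0, 1.0]))
```

Since probabilities lie between 0 and 1, the population standard deviation is at most 0.5, so the measure stays within the range 0 to 1.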

The other radio buttons show conservation dot plots for each single RNA. For these dot plots, we project the input dot plots to the alignment and complement them with consensus and conservation information in the lower left triangle. Whereas the upper right triangle shows the probabilities of base pairs in the single sequence, the lower left triangle shows the corresponding averaged probabilities. In the upper right triangle, the user can optionally highlight all base pairs that are highly probable in the consensus by setting a threshold probability (radio buttons "highlight average probabilities >=" at the bottom of the plot).

Color Legend

The lower left triangle of the dot plots contains the average dot plot colored with variance information. Pure green means maximum variance (e.g. in half of the sequences the dot has probability 0 and in the other half it has probability 1); pure red means no variance at all (the dot has the same probability in all sequences).


Alignment annotated with pseudoknot-free consensus structure

The alignment is annotated with its (pseudoknot-free) consensus structure. This "secondary structure of the alignment" is predicted by the tool RNAalifold. Due to the use of RNAalifold, this structure does not contain pseudoknots even when pseudoknots are specified and are correctly aligned by CARNA. Pseudoknots are best visualized in the provided dot plot representations. The consensus structure is printed as a string of dots and brackets on top of the alignment. The string is well-bracketed, such that base pairs in the structure are indicated by corresponding opening and closing parentheses. Furthermore, compatible base pairs are colored. The hue encodes the number of different types C-G, G-C, A-U, U-A, G-U or U-G of compatible base pairs in the corresponding columns. In this way, the hue indicates confirmation of the structure by compensatory mutations. The saturation decreases with the number of incompatible base pairs. Thus, it indicates the structural conservation of the base pair.
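The two quantities behind the coloring, the number of different compatible pair types and the number of incompatible pairs, can be counted per column pair as in the Python sketch below. It is an illustration only: the function name is hypothetical, columns are 0-based, and counting gapped pairs as incompatible is an assumption that may differ from RNAalifold's handling.

```python
COMPATIBLE = {"CG", "GC", "AU", "UA", "GU", "UG"}


def column_pair_stats(aligned_sequences, col_i, col_j):
    """Return (number of distinct compatible pair types, number of incompatible pairs)."""
    types = set()
    incompatible = 0
    for seq in aligned_sequences:
        pair = (seq[col_i] + seq[col_j]).upper().replace("T", "U")
        if pair in COMPATIBLE:
            types.add(pair)
        else:
            incompatible += 1  # includes pairs that involve a gap (assumption)
    return len(types), incompatible


# Made-up alignment rows; columns 0 and 4 form the consensus base pair.
rows = ["GAAAC", "CAAAG", "AAAAU", "GAAAA"]
print(column_pair_stats(rows, 0, 4))
```

The first number would drive the hue (more pair types means more compensatory mutations), the second the saturation (more incompatible pairs means a paler color).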

The representation was generated by the tool RNAalifold from the Vienna RNA package.

Color Legend

Compatible base pairs are colored, where the hue shows the number of different types C-G, G-C, A-U, U-A, G-U or U-G of compatible base pairs in the corresponding columns. In this way the hue shows sequence conservation of the base pair. The saturation decreases with the number of incompatible base pairs. Thus, it indicates the structural conservation of the base pair.

Input Examples

?  fixed pseudoknot structures

This example demonstrates CARNA's capability to align RNA with pseudoknots. In this example, we provide fixed input structures with pseudoknots. Thereby, we demonstrate the syntax of constraint annotation in the FASTA file. In the output of this example, the correct alignment of the pseudoknots is best seen in our conservation consensus dot plot representation. Please note that the consensus structure in the shown alignment (Alignment annotated with pseudoknot-free consensus structure) does not show the pseudoknot, because this consensus structure is generated by RNAalifold from the CARNA alignment and RNAalifold was not designed to predict pseudoknots. Since we provide fixed structures in this example, it runs with default settings. To predict pseudoknots from ensembles, one has to explicitly predict the ensemble dot plots with pseudoknots. This is supported via a tool from NUPACK. Due to the hardness of pseudoknot folding, this will work only for comparatively simple pseudoknots as described by Dirks and Pierce (J Comput Chem, 2004).
The example's result can be directly accessed here

?  CARNA_4329548

HDV RF00094 pseudoknots
The example's result can be directly accessed here

?  tRNA alignment

The purpose of this example of 5 tRNAs is to demonstrate CARNA's ability to align RNAs without special properties like pseudoknots or multiple conserved structures, based on their structure ensembles. Furthermore, it demonstrates the visualization of the alignment using a well-known example. The visualization with conservation dot plots provides additional information over the output of general-purpose RNA alignment tools like LocARNA.
The example's result can be directly accessed here

?  multiple conserved structures

In this small example, we align the RNA xbix to three designed variants that fold into the same two conserved structures. xbix was introduced as an example for multiple metastable structures by Wolfinger et al. (J. Phys. A: Math. Gen., 2004). It is instructive to compare the alignments of these sequences by CARNA and LocARNA. Whereas CARNA's alignment preserves both conserved structures in the consensus ensemble, LocARNA aligns only one of the two structures correctly and misaligns the other. The example works with the default settings of the server, i.e. dot plots of the ensembles are predicted without pseudoknots by RNAfold. For further illustration, we list the sequences with their conserved structures. xbixA is the original example from Wolfinger et al.

>xbixA
CUGCGGCUUUGGCUCUAGCC
....((((........))))
(((.(((....))).)))..
>xbixB
CAUACCCAAUACGGGAUGGG
....((((........))))
(((.(((.....))))))..
>xbixC
GUGCGCGUUAUUCGUCUACGC
....((((.........))))
(((.(((.....))).)))..
>xbixD
GGGCCGGGUUGUUGCUCCCG
....((((........))))
(((.(((....))).)))..

Multiple Conserved Structures. Conservation dot plots for xbix variants A-D, the consensus conservation dot plot of CARNA's alignment and the consensus conservation dot plot of the alignment by LocARNA. The LocARNA consensus dot plot shows a misalignment of the inner stem of one of the two conserved structures. Only CARNA can simultaneously align both structures and aligns this stem and all other base pairs correctly. The misalignment by LocARNA is also seen by annotating LocARNA's alignment with the two conserved structures:

>xbixA
CUGCGGCUUUGGCU-CUAGCC
....((((......-..))))
(((.(((....)))-.)))..
>xbixC
GUGCGCGUUAUUCGUCUACGC
....((((.........))))
(((.(((.....))).)))..
>xbixD
GGGCCGGGUUGUUG-CUCCCG
....((((......-..))))
(((.(((....)))-.)))..
>xbixB
CAUACCCAAUACGGG-AUGGG
....((((.......-.))))
(((.(((.....)))-)))..
The example's result can be directly accessed here

List of Changes