Introduction

AntaRNA is an Ant-Colony Optimization based tool that solves the RNA inverse folding problem. It designs RNA sequences that satisfy a set of user-defined constraints. Its multi-objective optimization supports structure, sequence and GC-content constraints.

The multi-objective optimization is modeled on different layers of a terrain graph on which simulated ants explore paths. During its walk, an ant assembles a sequence based on the information stored in the graph.

Depending on the quality of the solution, the parts of the graph that comply with the constraints are promoted, which increases their selection probability.

Subsequent ants produce and evaluate further solutions and modify the terrain graph accordingly, until a sequence is found that satisfies the user-defined constraint set.

When using AntaRNA please cite:

Robert Kleinkauf, Martin Mann and Rolf Backofen
AntaRNA - Ant Colony Based RNA Sequence Design
Bioinformatics, 31(19), pages 3114-3121, 2015.
Robert Kleinkauf, Torsten Houwaart, Rolf Backofen, and Martin Mann
AntaRNA - multi-objective inverse folding of pseudoknot RNA using ant-colony optimization
BMC Bioinformatics, 16(1), pages 1-7, 2015.
Martin Raden, Syed M Ali, Omer S Alkhnbashi, Anke Busch, Fabrizio Costa, Jason A Davis, Florian Eggenhofer, Rick Gelhausen, Jens Georg, Steffen Heyne, Michael Hiller, Kousik Kundu, Robert Kleinkauf, Steffen C Lott, Mostafa M Mohamed, Alexander Mattheis, Milad Miladi, Andreas S Richter, Sebastian Will, Joachim Wolff, Patrick R Wright, and Rolf Backofen
Freiburg RNA tools: a central online resource for RNA-focused research and teaching
Nucleic Acids Research, 46(W1), pages W25-W29, 2018.

Results are computed with AntaRNA version 1.1.2 using the Vienna RNA package 2.4.14 or pKiss 2.2.14.

Overview

The following parameters are used to control the execution of AntaRNA:

Constraints
Constraint
- Allow GU base pairs
- Allow pseudoknots
Energy
- Folding temperature
AntaRNA
Output
- Number of sequences

Furthermore, the following additional information is available:

Output Description
Input Examples
List of Changes

Constraints

Structure constraint

The RNA secondary structure for which you wish to design a sequence has to be given in extended dot-bracket notation.
A base pair between bases i and j is represented by a '(' at position i and a ')' at position j.
Unpaired bases are represented by dots.

If pseudoknots are to be encoded, you can use the brace pairs (), [], {}, <>.

Example:

The following structure is represented by the string (((.((.(((....))))).))).

[Figure: RNA structure]
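To make the notation concrete, the following minimal sketch (not AntaRNA's own parser) lists the base pairs encoded in an extended dot-bracket string. It assumes that each closing symbol pairs with the most recent unmatched opening symbol of the same bracket type, which is how the brace pairs (), [], {}, <> can express crossing pseudoknot pairs; the function name base_pairs is hypothetical.

```python
# Sketch: list the base pairs encoded in an extended dot-bracket string.
# Each closing symbol is matched with the last open symbol of the same type,
# so crossing pairs written with different bracket types (pseudoknots) work.
CLOSE_TO_OPEN = {")": "(", "]": "[", "}": "{", ">": "<"}


def base_pairs(structure):
    """Return sorted 1-based (i, j) pairs; raise ValueError if brackets are unbalanced."""
    stacks = {opener: [] for opener in CLOSE_TO_OPEN.values()}
    pairs = []
    for pos, symbol in enumerate(structure, start=1):
        if symbol in stacks:                      # opening bracket
            stacks[symbol].append(pos)
        elif symbol in CLOSE_TO_OPEN:             # closing bracket
            stack = stacks[CLOSE_TO_OPEN[symbol]]
            if not stack:
                raise ValueError(f"unmatched '{symbol}' at position {pos}")
            pairs.append((stack.pop(), pos))
        # '.' (and any other symbol) is treated as unpaired here
    for opener, stack in stacks.items():
        if stack:
            raise ValueError(f"unmatched '{opener}' at position {stack[-1]}")
    return sorted(pairs)


print(base_pairs("(((.((.(((....))))).)))"))
# → [(1, 23), (2, 22), (3, 21), (5, 19), (6, 18), (8, 17), (9, 16), (10, 15)]
```

Using one stack per bracket type keeps nested pairs of the same type correctly matched while allowing pairs of different types to cross.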

Besides the regular dot-bracket notation, AntaRNA supports 'fuzzy' structure constraints. Regions marked as a block may form any structure, as long as all base pairs they form stay within that block.

In soft constraint mode (lower case letters), no structure is enforced within a block, whereas in hard constraint mode (upper case letters), at least one base pair has to form within the block. Blocks are defined by letters from the alphabet "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz".
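The block semantics can be illustrated with a small checker. This is a hypothetical sketch, not AntaRNA's implementation: it assumes that all positions carrying the same letter form one block, that the candidate structure is plain nested dot-bracket, and it only evaluates the letter positions of the constraint (dots and brackets are ignored here).

```python
# Hypothetical check of the block ('fuzzy') part of a structure constraint.
# constraint: letters mark blocks, e.g. "aaaaBBBBBBB"; non-letters are ignored here.
# structure:  nested dot-bracket structure of a candidate sequence, same length.


def partners(structure):
    """Map each paired position (0-based) to its partner."""
    stack, partner = [], {}
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            i = stack.pop()
            partner[i], partner[pos] = pos, i
    return partner


def satisfies_blocks(constraint, structure):
    partner = partners(structure)
    has_pair = {c: False for c in constraint if c.isalpha()}
    for i, block in enumerate(constraint):
        if not block.isalpha() or i not in partner:
            continue
        if constraint[partner[i]] != block:   # the pair would leave its block
            return False
        has_pair[block] = True
    # Upper case (hard) blocks must contain at least one base pair.
    return all(found for block, found in has_pair.items() if block.isupper())


print(satisfies_blocks("aaaaBBBBBBB", "....((...))"))  # → True
```

Soft blocks pass even when they stay unpaired, while a hard block without any internal pair, or any pair crossing a block boundary, makes the check fail.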

The parameter constraints are: Has to be either a balanced nested structure encoded by brackets '()' or a balanced crossing pseudoknot structure using the brace pairs '()[]{}<>'. Base pairs have to have a minimal loop length of 3. Positions that have to be unpaired are encoded by '.'. Implicit structure constraints use the alphabet 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz', where upper/lower case encode hard/soft block constraints, respectively. String length has to be in range (5,300). At most 1 line is allowed.
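The checks listed above can be sketched as a small validator. This is an illustration under assumptions, not the server's actual input check: the function name, the error texts and the order of checks are made up, and the length range (5,300) is treated here as inclusive on both ends. The loop-length test simply requires at least 3 positions between the two partners of every pair.

```python
import re
from typing import Optional

# Sketch of the documented input checks; messages and check order are assumptions.
VALID_SYMBOLS = re.compile(r"[.()\[\]{}<>A-Za-z]+")
CLOSE_TO_OPEN = {")": "(", "]": "[", "}": "{", ">": "<"}


def check_structure_constraint(cstr: str, min_loop: int = 3) -> Optional[str]:
    """Return an error message, or None if the constraint passes all checks."""
    cstr = cstr.strip()
    if "\n" in cstr:
        return "at most one line is allowed"
    if not 5 <= len(cstr) <= 300:          # assumption: both bounds inclusive
        return "length must be in range (5,300)"
    if not VALID_SYMBOLS.fullmatch(cstr):
        return "only '.', '()[]{}<>' and block letters are allowed"
    stacks = {opener: [] for opener in CLOSE_TO_OPEN.values()}
    for j, symbol in enumerate(cstr):
        if symbol in stacks:
            stacks[symbol].append(j)
        elif symbol in CLOSE_TO_OPEN:
            stack = stacks[CLOSE_TO_OPEN[symbol]]
            if not stack:
                return f"unbalanced '{symbol}' at position {j + 1}"
            i = stack.pop()
            if j - i - 1 < min_loop:       # fewer than min_loop enclosed positions
                return f"loop of pair ({i + 1}, {j + 1}) is shorter than {min_loop}"
    if any(stacks.values()):
        return "unbalanced opening bracket"
    return None


print(check_structure_constraint("AAAA((...))bbbb"))  # → None
```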
Defaults to ()

Sequence constraint

Sequence constraint using the IUPAC ambiguity codes for nucleotides {ACGTURYMKWSBDHVN}, with "N" as wildcard. The codes are listed below.

Note that upper case nucleotide codes are enforced in the designed sequences, while lower case codes are 'soft' constraints that are only penalized if they are not satisfied by the solution sequences.

IUPAC nucleotide code	Base
A	Adenine
C	Cytosine
G	Guanine
U/T	Uracil / Thymine
R	A or G
Y	C or U/T
S	G or C
W	A or U/T
K	G or U/T
M	A or C
B	C or G or U/T
D	A or G or U/T
H	A or C or U/T
V	A or C or G
N	any base
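The table translates directly into a lookup from code to allowed nucleotides. The sketch below (the names IUPAC and allows are hypothetical) normalizes T to U, matching the U/T rows of the table, and compares case-insensitively.

```python
# The IUPAC table above as a lookup; T is normalized to U before matching.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U", "T": "U",
    "R": "AG", "Y": "CU", "S": "GC", "W": "AU", "K": "GU", "M": "AC",
    "B": "CGU", "D": "AGU", "H": "ACU", "V": "ACG", "N": "ACGU",
}


def allows(code: str, base: str) -> bool:
    """True if the single nucleotide `base` is permitted by IUPAC `code`."""
    return base.upper().replace("T", "U") in IUPAC[code.upper()]


print([b for b in "ACGU" if allows("D", b)])  # → ['A', 'G', 'U']
```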

The sequence constraint can also be used as a soft constraint. For this, the user can specify the lower case characters a, c, g and u. In these cases, the terrain is not pruned as it is for upper case letters, so alternative nucleotides keep a residual selection probability. The requested nucleotide is instead favored via the sequence distance penalty in the quality score of a solution.

The parameter constraints are: The IUPAC alphabet 'ACGTURYMKWSBDHVN' (upper case) is allowed for explicit constraint specification and only 'acgtu' (lower case) can be used for soft constraint definitions. If provided, it has to have the same length as the structure constraint. Lower/Upper case symbols represent soft/hard sequence constraints, respectively.
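A minimal sketch of these two checks (allowed symbols and equal length with the structure constraint) might look as follows. The function name and error texts are hypothetical; it does not cover any further compatibility check between the sequence and structure constraints that the interface may perform.

```python
import re
from typing import Optional

# Sketch of the documented checks for the sequence constraint (error texts are made up).
HARD_CODES = "ACGTURYMKWSBDHVN"   # upper case: enforced
SOFT_CODES = "acgtu"              # lower case: only penalized


def check_sequence_constraint(cseq: str, cstr: str) -> Optional[str]:
    """Return an error message, or None if `cseq` is acceptable for structure `cstr`."""
    if not re.fullmatch(f"[{HARD_CODES}{SOFT_CODES}]*", cseq):
        return "invalid symbol in sequence constraint"
    if len(cseq) != len(cstr):
        return "sequence and structure constraint differ in length"
    return None


print(check_sequence_constraint("GNNaNNC", "((...))"))  # → None
```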
Defaults to ()

Targeted GC-content [0..1]

Target GC content of the designed sequences, in [0,1]. Instead of a fixed value, the target can also be drawn from a uniform or a normal GC distribution.

The parameter constraints are: Input value has to be parsable as Double. The value must be greater than 0 and must be smaller than 1.
Defaults to (0.6)

Maximum for uniform distribution sampling

Provides a maximum target GC (tGC) value in [0,1] for uniform GC distribution sampling. The regular tGC value serves as the minimum (tGC < tGCmax).

The parameter constraints are: Input value has to be parsable as Double. The value must be greater than 0 and must be smaller than 1. Either this parameter or 'Variance for normal distribution sampling' can be set, but not both at the same time.
Defaults to ()

Variance for normal distribution sampling

Provides a tGC variance (sigma squared) for normal distribution sampling. The regular target GC value serves as the expected value (mu).

The parameter constraints are: Input value has to be parsable as Double. The value must be greater than 0 and must be smaller than 1.
Defaults to ()
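The three GC options can be summarized in one hypothetical helper. It is a sketch under assumptions: that one target value is drawn per design, and that values from the normal distribution are clipped into [0,1]; neither detail is stated on this page. Note that Python's random.gauss expects the standard deviation, so the variance has to be square-rooted first.

```python
import random
from typing import Optional


def sample_target_gc(tgc: float, tgc_max: Optional[float] = None,
                     tgc_var: Optional[float] = None,
                     rng: Optional[random.Random] = None) -> float:
    """Draw a target GC value (hypothetical helper, not AntaRNA's code)."""
    if tgc_max is not None and tgc_var is not None:
        raise ValueError("set either tGCmax or the variance, not both")
    rng = rng or random.Random()
    if tgc_max is not None:
        if not tgc < tgc_max:
            raise ValueError("tGC must be smaller than tGCmax")
        return rng.uniform(tgc, tgc_max)          # tGC acts as the minimum
    if tgc_var is not None:
        value = rng.gauss(tgc, tgc_var ** 0.5)    # gauss() takes sigma, not sigma^2
        return min(max(value, 0.0), 1.0)          # assumption: clip into [0, 1]
    return tgc                                    # plain fixed target


print(sample_target_gc(0.4, tgc_max=0.6, rng=random.Random(0)))
```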

Constraint

Allow GU base pairs

Whether or not GU base pairs are considered valid base pairs within the structure.

The parameter constraints are: Input value has to be parsable as Boolean.
Defaults to (true)
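The effect of this switch can be sketched as a pairing rule: Watson-Crick pairs (A-U, G-C) are always valid, while G-U wobble pairs are valid only when the option is enabled. The helper names can_pair and sequence_fits are hypothetical, and sequence_fits only handles nested '()' structures.

```python
def can_pair(a: str, b: str, allow_gu: bool = True) -> bool:
    """Watson-Crick pairs always; G-U wobble pairs only if allow_gu is set."""
    pair = {a.upper().replace("T", "U"), b.upper().replace("T", "U")}
    if pair in ({"A", "U"}, {"G", "C"}):
        return True
    return allow_gu and pair == {"G", "U"}


def sequence_fits(sequence: str, structure: str, allow_gu: bool = True) -> bool:
    """True if every '(' ... ')' pair of a nested structure can be formed."""
    stack = []
    for j, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")" and not can_pair(sequence[stack.pop()], sequence[j], allow_gu):
            return False
    return True


# The inner pair G-U is accepted only while GU pairs are allowed.
print(sequence_fits("GGAAAUC", "((...))"), sequence_fits("GGAAAUC", "((...))", allow_gu=False))
# → True False
```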

Allow pseudoknots

Switches to pseudoknot-based structure prediction using pKiss. In this mode, the AntaRNA parameters are fixed to values optimized for pKiss.

The parameter constraints are: Input value has to be parsable as Boolean.
Defaults to (false)

Energy

Folding temperature

Temperature (in degrees Celsius) used by the folding algorithm.

The parameter constraints are: Input value has to be parsable as Double. The value must be greater than 0 and must be smaller than 100.
Defaults to (37)

AntaRNA

Max. terrain resets

Maximum number of terrain resets before the best solution found so far is returned.

The parameter constraints are: Input value has to be parsable as Integer. The value must be greater than or equal to 1 and must be smaller than or equal to 20.
Defaults to (5)

Convergence counts

Limits the convergence counter that triggers a reset. If this counter exceeds its threshold, the terrain is reset to its initial state.

The parameter constraints are: Input value has to be parsable as Integer. The value must be greater than or equal to 1 and must be smaller than or equal to 300.
Defaults to (130)

Ant termination criterion

Limits the number of internal ants that may be used before the 'termination convergence' criterion is raised and the program terminates.

The parameter constraints are: Input value has to be parsable as Integer. The value must be greater than or equal to 1 and must be smaller than or equal to 300.
Defaults to (50)

Probability weight alpha

Sets alpha, the probability weight for terrain pheromone influence during the calculation of the edge probabilities.

The parameter constraints are: Input value has to be parsable as Double. The value must be greater than or equal to 0 and must be smaller than or equal to 1.
Defaults to (1)

Probability weight beta

Sets beta, the probability weight for terrain path influence during the calculation of the edge probabilities.

The parameter constraints are: Input value has to be parsable as Double. The value must be greater than or equal to 0 and must be smaller than or equal to 1.
Defaults to (1)
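For orientation, the classic Ant System rule combines a pheromone term tau and a heuristic term eta as p_k proportional to tau_k^alpha * eta_k^beta. The sketch below shows that textbook rule; treating AntaRNA's "terrain path influence" as the heuristic term eta is an assumption, and the function name is hypothetical. With alpha = beta = 0, all edges become equally likely.

```python
def edge_probabilities(pheromone, heuristic, alpha=1.0, beta=1.0):
    """Classic Ant System rule: p_k = tau_k^alpha * eta_k^beta / sum over all edges."""
    weights = [tau ** alpha * eta ** beta for tau, eta in zip(pheromone, heuristic)]
    total = sum(weights)              # assumes at least one edge has non-zero weight
    return [w / total for w in weights]


# Four outgoing edges, e.g. one per nucleotide A, C, G, U at the next position.
print(edge_probabilities([1.0, 1.0, 2.0, 0.0], [1.0, 3.0, 1.0, 1.0]))
```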

Pheromone evaporation rate

Pheromone evaporation rate. Determines how much of the pheromone information evaporates in each round.

The parameter constraints are: Input value has to be parsable as Double. The value must be greater than or equal to 0 and must be smaller than or equal to 1.
Defaults to (0.2)
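In standard ant colony optimization, evaporation with rate rho is applied as tau <- (1 - rho) * tau + delta_tau, where delta_tau is the pheromone deposited by ants in that round. The sketch below uses this textbook update with the default rate 0.2; whether AntaRNA applies exactly this form is an assumption, and the deposit values are made up.

```python
def update_pheromone(pheromone, deposits, rho=0.2):
    """tau_k <- (1 - rho) * tau_k + delta_tau_k for every edge k."""
    return [(1.0 - rho) * tau + delta for tau, delta in zip(pheromone, deposits)]


# Edge 0 receives no deposits and decays as 0.8 ** rounds; edge 1 is reinforced.
tau = [1.0, 1.0]
for _ in range(3):
    tau = update_pheromone(tau, [0.0, 0.1])
print(tau)  # approximately [0.512, 0.756]
```

A rate of 0 keeps all pheromone, while a rate of 1 discards everything except the newest deposits.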

Constraint weight structure

Structure constraint quality weighting factor.

The parameter constraints are: Input value has to be parsable as Double. The value must be greater than or equal to 0 and must be smaller than or equal to 10.
Defaults to (0.5)

Constraint weight sequence

Sequence constraint quality weighting factor.

The parameter constraints are: Input value has to be parsable as Double. The value must be greater than or equal to 0 and must be smaller than or equal to 10.
Defaults to (1)

Constraint weight GC-value

GC-value constraint quality weighting factor.

The parameter constraints are: Input value has to be parsable as Double. The value must be greater than or equal to 0 and must be smaller than or equal to 10.
Defaults to (5)
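This page does not state how the three weights enter the quality score. Purely as an illustration, the sketch below assumes a linear weighted sum of the distances dStr, dSeq and dGC (see Output Description) with the documented default weights; the function name is hypothetical.

```python
def weighted_penalty(d_str, d_seq, d_gc, w_str=0.5, w_seq=1.0, w_gc=5.0):
    """Hypothetical weighted sum of the three distances; 0 means all constraints are met."""
    return w_str * d_str + w_seq * d_seq + w_gc * d_gc


# With the default weights, a GC deviation of 0.04 costs about as much as
# a structural distance of 0.4 (both evaluate to about 0.2).
print(weighted_penalty(0.0, 0.0, 0.04), weighted_penalty(0.4, 0.0, 0.0))
```

Under this assumption, the default weights make GC deviations by far the most expensive violation per unit of distance.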

Random seed

Seed value for the pseudo-random number generator.

The parameter constraints are: Input value has to be parsable as Integer. The value must be greater than or equal to 1 and must be smaller than or equal to 99999.
Defaults to ()

Output

Number of sequences

Number of sequences to be designed.

The parameter constraints are: Input value has to be parsable as Integer. The value must be greater than or equal to 1 and must be smaller than or equal to 100.
Defaults to (10)

Output Description

dStr: The structural distance is based on the symmetric difference between the base pair sets of the target structure and the structure of the sequence under investigation. The distance is normalized by the constraint length. If the lonely base pair flag is used, lonely base pairs and lonely stacks of two base pairs are excluded from the explicit structure constraint during the structure distance calculation, but the bases at the respective positions are still required to be able to form a base pair. This explains why some MFE structures do not show requested base pairs but still have a structural distance of 0.
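The core of dStr can be sketched with Python sets: collect the base pairs of both structures, take the symmetric difference and divide by the length. This sketch assumes nested '()' structures only and ignores the special lonely base pair handling described above; the function names are hypothetical.

```python
def pair_set(structure):
    """Set of 0-based (i, j) base pairs of a nested dot-bracket structure."""
    stack, pairs = [], set()
    for j, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            pairs.add((stack.pop(), j))
    return pairs


def d_str(target, predicted):
    """|P_target symmetric-difference P_predicted| / len(target)."""
    return len(pair_set(target) ^ pair_set(predicted)) / len(target)


# One missing pair in a structure of length 7 gives 1/7.
print(d_str("((...))", "(.....)"))
```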

dSeq: The sequence distance is realized as an edit distance between the constraint and the current sequence: each sequence position that does not satisfy the provided constraint increases the distance. The distance is normalized by the constraint length.
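Since constraint and sequence have the same length, counting violating positions amounts to a per-position mismatch count. A minimal sketch, assuming the IUPAC table from the sequence constraint section and hypothetical names:

```python
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U", "T": "U",
    "R": "AG", "Y": "CU", "S": "GC", "W": "AU", "K": "GU", "M": "AC",
    "B": "CGU", "D": "AGU", "H": "ACU", "V": "ACG", "N": "ACGU",
}


def d_seq(constraint, sequence):
    """Fraction of positions whose nucleotide violates the IUPAC constraint."""
    violations = sum(
        base.upper().replace("T", "U") not in IUPAC[code.upper()]
        for code, base in zip(constraint, sequence)
    )
    return violations / len(constraint)


# The soft 'a' at position 7 is violated by 'C': distance 1/7.
print(d_seq("NNNNNNa", "GGAAAUC"))
```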

dGC: The GC distance of a sequence is the absolute deviation of its GC content from the targeted GC content. If the targeted GC content cannot be reached exactly for the given sequence length, the two nearest achievable GC values are both treated as legal and thus result in a dGC value of 0.
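The rule for unreachable targets is easiest to see in terms of GC counts: a target of 0.5 for a 7 nt sequence asks for 3.5 G/C bases, so both 3 and 4 are accepted. The sketch below follows that description; comparing counts with a small tolerance (to avoid floating-point artifacts in target * length) is an implementation choice of this sketch, and the function name is hypothetical.

```python
import math


def d_gc(sequence: str, target: float) -> float:
    """|GC(sequence) - target|, with both achievable neighbours of target counted as exact."""
    n = len(sequence)
    gc_count = sum(base in "GCgc" for base in sequence)
    ideal = target * n                          # GC count the target asks for
    if abs(ideal - round(ideal)) < 1e-9:        # target reachable exactly
        legal = {round(ideal)}
    else:                                       # e.g. 3.5 -> 3 and 4 are both legal
        legal = {math.floor(ideal), math.ceil(ideal)}
    if gc_count in legal:
        return 0.0
    return abs(gc_count / n - target)


print(d_gc("GGGAAAA", 0.5), d_gc("GGGGAAA", 0.5))  # → 0.0 0.0
```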

Input Examples

SECIS block design

Design a SECIS element with flexible stem length (block C) and flanking context (blocks A and B). Note that where upper case symbols are used, the corresponding block has to contain at least one base pair in the final design. The design is visualized in the following figure.

[Figure: SECIS design with block constraints]


Crossing pseudoknot

Design sequences for the crossing pseudoknot structure PKB298 with low GC-content.


Nested tRNA structure

Design sequences for the clover leaf form of a tRNA with high GC-content.


List of Changes

4.4.2 : compatibility check for Cseq and Cstr within the interface
4.0.0 : AntaRNA v1.1.2 online
