Freiburg RNA Tools
IntaRNA - Help

Introduction

IntaRNA is a program for the fast and accurate prediction of interactions between two RNA molecules. It has been designed to predict mRNA target sites for given non-coding RNAs (ncRNAs) like eukaryotic microRNAs (miRNAs) or bacterial small RNAs (sRNAs), but it can also be used to predict other types of RNA-RNA interactions.

IntaRNA predicts RNA-RNA interactions by an energy-based approach that is based on two assumptions: (1) the accessibility of the interaction sites is important for the interaction formation, and (2) a seed region is required to initiate the interaction (e.g. the 5' seed region for miRNAs). In two previous studies on interactions between bacterial sRNAs and their target mRNAs, we presented evidence that the incorporation of these two requirements improves the prediction quality of IntaRNA (Busch et al., 2008; Richter et al., 2010).

The energy score of a predicted RNA-RNA interaction is the sum of the following contributions:

Interactions are predicted by minimizing the interaction energy score via dynamic programming.

The free energy that is required to unfold the interaction site, i.e. making it accessible, is calculated from the thermodynamic ensemble of all secondary structures that can be formed by the RNA sequence. Ensemble free energy calculation is realized via the Vienna RNA library (Hofacker et al., 1994; Lorenz et al., 2011).
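As a compact summary of the paragraphs above, the score can be sketched as a sum of a hybridization term and two unfolding terms. The symbols are illustrative assumptions and not IntaRNA's own notation: E_hybrid is the hybridization free energy of the two interaction sites, and ED_query and ED_target are the free energies required to make the site accessible in the query and in the target.

```latex
E(\text{interaction}) = E_{\text{hybrid}} + ED_{\text{query}} + ED_{\text{target}}
```

The predicted interaction is the one with the lowest value of this score. Because the unfolding terms are never negative, a poorly accessible site is penalized even if its hybridization energy is favorable.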

For the interaction between the two RNAs, IntaRNA requires the existence of an interaction seed. A seed is an initial interacting region of (nearly) perfect sequence complementarity, which is additionally often conserved. The user has to specify the minimal number of perfectly paired bases and the maximal number of unpaired bases in the seed region. Other seed features, such as the seed position in the ncRNA, can optionally be defined by the user.

Precomputed results are available for Enterobacteria: ChiX, CyaR, FnrS, GcvB, MicC, RyhB, Spot42, and for non-enteric bacteria: LhrA2, PrrF1, Yfr1. Furthermore, an IsaR target prediction on the PCC6803 genome is available.

Note that, in contrast to this server, the stand-alone IntaRNA software does not limit the problem size, provides enhanced functionality, and offers a batch-processing-friendly command line interface. For these reasons, you might consider installing IntaRNA locally.

When using IntaRNA please cite :

Results are computed with IntaRNA version 2.0.5 (wrapper 1.1.2) linking Vienna RNA package 2.3.1

Overview

The following parameters are used to control the execution of IntaRNA

Furthermore, additional information is available

Sequence Parameters

?  Query ncRNA (short) in FASTA

The query ncRNA is in general supposed to be the shorter of both interacting RNAs. Note that reported interactions must not overlap within the target RNAs, whereas overlaps within the query RNAs are allowed. IntaRNA accepts query sequences in the form of a multiple FASTA file. Input can be given either as direct text input or by uploading a file. Each sequence should contain only the characters A, C, G, T, U, and N.
A sequence in FASTA format begins with a single-line sequence identifier that starts with a greater-than (">") symbol, followed by lines of sequence data. For readability, it is recommended that each line is at most 80 characters in length. Ambiguous nucleotides ('N') are excluded from all base pairings of the interaction.
The parameter constraints are: The input has to be in valid FASTA format. The number of sequences has to be at least 1 and at most 100. Sequence lengths have to be in the range 7-2000. The allowed sequence alphabet is 'ACGUTNacgutn'. In NCBI target mode, only 1 query sequence with length up to 750nt is allowed.
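The constraints above can be checked locally before submitting a job. The following is a minimal sketch in Python and not the server's own validation code; the function names `parse_fasta` and `check_fasta_input` are hypothetical, and the default limits are the query limits listed above.

```python
ALLOWED_ALPHABET = set("ACGUTNacgutn")


def parse_fasta(text):
    """Parse FASTA text into a list of (identifier, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' line")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_fasta_input(text, min_seqs=1, max_seqs=100, min_len=7, max_len=2000):
    """Return a list of human-readable constraint violations (empty if valid)."""
    records = parse_fasta(text)
    errors = []
    if not min_seqs <= len(records) <= max_seqs:
        errors.append(f"{len(records)} sequences given, expected "
                      f"{min_seqs}-{max_seqs}")
    for name, seq in records:
        if not min_len <= len(seq) <= max_len:
            errors.append(f"{name}: length {len(seq)} outside "
                          f"{min_len}-{max_len}")
        illegal = sorted(set(seq) - ALLOWED_ALPHABET)
        if illegal:
            errors.append(f"{name}: illegal characters {illegal}")
    return errors
```

For example, `check_fasta_input(">q1\nACGUACGUAC\n")` returns an empty list, while a 3 nt sequence yields one length violation.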

?  Target RNA (long) in FASTA

The target RNA is in general supposed to be the longer of both interacting RNAs. Note that reported interactions must not overlap within the target RNAs, whereas overlaps within the query RNAs are allowed. IntaRNA accepts target sequences in the form of a multiple FASTA file. Input can be given either as direct text input or by uploading a file. Each sequence should contain only the characters A, C, G, T, U, and N.
A sequence in FASTA format begins with a single-line sequence identifier that starts with a greater-than (">") symbol, followed by lines of sequence data. For readability, it is recommended that each line is at most 80 characters in length. Ambiguous nucleotides ('N') are excluded from all base pairings of the interaction.
The parameter constraints are: The input has to be in valid FASTA format. The number of sequences has to be at most 100; at least 1 sequence is required unless a Target NCBI RefSeq ID is given instead. Sequence lengths have to be in the range 7-2000. The allowed sequence alphabet is 'ACGUTNacgutn'.

?  Target NCBI RefSeq ID

For prokaryotes, mRNA sequences can be automatically extracted from a genome instead of manual sequence input. For a given NCBI RefSeq genome accession number, subsequences of each gene annotated in the genome are extracted. The accession number has to start with "NC_" followed by six digits or "NZ_" followed by some string, which refers to complete genomic molecules including genomes, chromosomes, and plasmids.

Each extracted subsequence consists of a user-specified number of nucleotides upstream and downstream of the start codon or the stop codon. Downstream positions start at the start codon or after the stop codon, respectively.

To check whether the organism you selected is compatible, consult this list of RefSeq IDs. Currently, more than 2400 organisms are compatible. The list is regularly updated.

Please contact us if you know your organism is part of the RefSeq database and has an ID in the NZ_* or NC_XXXXXX format but is not present in this list, or is missing IDs. Then we can run an update.
The parameter constraints are: Access to the NCBI server is needed. Either this parameter or Target RNA (long) in FASTA can be set, not both at the same time. The given ID has to be in the format 'NC_XXXXXX' (where X is a digit) or 'NZ_*' (where * is a string) and to be within the compatible list of RefSeq IDs (see help).
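The ID format described above can be tested with a regular expression. This is a minimal sketch; the function name `has_valid_refseq_format` is hypothetical, and the pattern assumes that nothing (for example a version suffix) follows the six digits of an NC_ ID. It only checks the format, not whether the ID is in the list of compatible RefSeq IDs.

```python
import re

# 'NC_' followed by exactly six digits, or 'NZ_' followed by any non-empty string.
REFSEQ_PATTERN = re.compile(r"NC_\d{6}|NZ_.+")


def has_valid_refseq_format(accession):
    """Return True if the accession matches 'NC_XXXXXX' or 'NZ_*'."""
    return REFSEQ_PATTERN.fullmatch(accession) is not None
```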

?  All replicons

Specifies whether or not IntaRNA should be executed on all replicons for the given NCBI RefSeq ID.
The parameter constraints are: Input value has to be parsable as an Integer. The value must be greater than or equal to 0 and must be smaller than or equal to 1.

?  Extract sequences around

This option allows you to select from which region of the mRNAs you would like to retrieve your putative target sequences. Selecting "start codon" selects regions upstream and downstream (see nt up, nt down) relative to the start codon. The same logic holds if you select "stop codon".

?  nt up (1-300)

This parameter specifies the number of nucleotides (nt) upstream of your start or stop codon (depending on which one you selected). If you selected start codon and have prior knowledge about average 5'UTR lengths in your input organisms, then it is sensible to set nt up to this number in order to increase prediction quality. The sum of nt up and nt down must be at least equal to the window size.
The parameter constraints are: Input value has to be parsable as an Integer. The value must be greater than or equal to 1 and must be smaller than or equal to 300.

?  nt down (1-300)

This parameter specifies the number of nucleotides (nt) downstream of your start or stop codon (depending on which one you selected). If you selected stop codon and have prior knowledge about average 3'UTR lengths in your input organisms, then it is sensible to set nt down to this number in order to increase prediction quality. The sum of nt up and nt down must be at least equal to the window size.
The parameter constraints are: Input value has to be parsable as an Integer. The value must be greater than or equal to 1 and must be smaller than or equal to 300.
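The interplay of "Extract sequences around", nt up and nt down can be illustrated with a small sketch. It is not the server's extraction code: the function name `extract_region` is hypothetical, and it assumes a gene on the plus strand of a linear sequence with 0-based coordinates, where `gene_start` is the first base of the start codon and `gene_end` is the index directly after the stop codon.

```python
def extract_region(genome, gene_start, gene_end, anchor, nt_up, nt_down):
    """Return nt_up bases upstream plus nt_down bases downstream of the anchor.

    Downstream positions begin at the start codon (anchor='start') or
    directly after the stop codon (anchor='stop').
    """
    if anchor not in ("start", "stop"):
        raise ValueError("anchor must be 'start' or 'stop'")
    for name, value in (("nt_up", nt_up), ("nt_down", nt_down)):
        if not 1 <= value <= 300:
            raise ValueError(f"{name} must be in the range 1-300")
    pivot = gene_start if anchor == "start" else gene_end
    # Clip at the sequence ends instead of wrapping around (an assumption).
    left = max(0, pivot - nt_up)
    right = min(len(genome), pivot + nt_down)
    return genome[left:right]
```

With `anchor="start"`, the returned region ends with the first `nt_down` bases of the gene, beginning with the start codon itself. Genes on the minus strand would additionally need a reverse complement, which this sketch omits.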

Output Parameters

?  Number of (sub)optimal interactions

Maximal number of (sub)optimal interactions that are predicted. Select overlapping constraints for suboptimal interactions.
The parameter constraints are: Input value has to be parsable as an Integer. The value must be greater than or equal to 1 and must be smaller than or equal to 100. In NCBI mode, this parameter has to be '1.0'.

?  Suboptimal interaction overlap

Defines where overlapping interactions in the output are allowed: N - no overlaps at all, T - overlaps in target only, Q - overlaps in query only, B - overlaps in both query and target.
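The four overlap modes can be illustrated with a greedy selection over candidate interactions. This is a sketch of the idea only and not IntaRNA's enumeration algorithm; the function names and the dictionary layout (`q` and `t` as inclusive position ranges, `energy` in kcal/mol) are assumptions made for the example.

```python
def ranges_overlap(a, b):
    """True if the inclusive ranges a=(start, end) and b=(start, end) overlap."""
    return a[0] <= b[1] and b[0] <= a[1]


def select_suboptimals(interactions, mode, max_count):
    """Pick up to max_count interactions, best energy first, honoring the mode.

    mode: 'N' no overlaps, 'T' overlaps in target only,
          'Q' overlaps in query only, 'B' overlaps in both.
    """
    if mode not in ("N", "T", "Q", "B"):
        raise ValueError("mode must be one of N, T, Q, B")
    forbid_query_overlap = mode in ("N", "T")
    forbid_target_overlap = mode in ("N", "Q")
    selected = []
    for cand in sorted(interactions, key=lambda i: i["energy"]):
        if len(selected) >= max_count:
            break
        clash = any(
            (forbid_query_overlap and ranges_overlap(cand["q"], s["q"]))
            or (forbid_target_overlap and ranges_overlap(cand["t"], s["t"]))
            for s in selected
        )
        if not clash:
            selected.append(cand)
    return selected
```

In mode N a candidate is dropped as soon as it shares a position with an already selected interaction in either RNA, whereas mode B keeps every candidate up to the requested number.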

?  Max. absolute energy of an interaction

Output only interactions with an energy below or equal to this energy in kcal/mol. Note, if the minimum free energy (mfe) of any interaction is above this threshold, no interaction will be reported.
The parameter constraints are: Input value has to be parsable as a Double. The value must be smaller than or equal to 0. In NCBI mode, this parameter has to be '0.0'.

?  Max. delta energy above mfe of an interaction

Output only interactions with an energy below or equal to the minimum free energy (mfe) + this delta energy term in kcal/mol.
The parameter constraints are: Input value has to be parsable as a Double. The value must be greater than or equal to 0 and must be smaller than or equal to 100.
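The two energy thresholds act together: an interaction is reported only if it passes both. A minimal sketch, assuming the predicted energies are available as a plain list of numbers in kcal/mol (the function name `filter_by_energy` is hypothetical):

```python
def filter_by_energy(energies, max_energy, delta_energy):
    """Keep energies that are <= max_energy and <= mfe + delta_energy."""
    if max_energy > 0:
        raise ValueError("max_energy must be smaller than or equal to 0")
    if not 0 <= delta_energy <= 100:
        raise ValueError("delta_energy must be in the range 0-100")
    if not energies:
        return []
    mfe = min(energies)  # minimum free energy over all candidates
    threshold = min(max_energy, mfe + delta_energy)
    return sorted(e for e in energies if e <= threshold)
```

If even the mfe lies above `max_energy`, the result is empty, which matches the note that no interaction is reported in that case.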

Seed Parameters

?  Min. number of basepairs in seed

Minimal number of intermolecular base pairs in the seed region. Note, for webserver use this value is restricted.
The parameter constraints are: Input value has to be parsable as an Integer. The value must be greater than or equal to 0 and must be smaller than or equal to 12. Has to be greater than the number of unpaired bases within the seed (for webserver use).

?  Max. Number of mismatches in seed

Maximal number of unpaired bases in the seed region in both sequences. Note, for webserver use this value is restricted.
The parameter constraints are: Input value has to be parsable as an Integer. The value must be greater than or equal to 0 and must be smaller than or equal to 20.

?  Maximal energy

Maximal overall interaction energy of the seed interaction to be considered for further interaction prediction. Note, a value too low will discard all seeds and thus result in no predicted interaction. A value too high will weaken the seed constraint.
The parameter constraints are: Input value has to be parsable as a Double. The value must be greater than or equal to -999 and must be smaller than or equal to 999.

?  Minimal unpaired probability (each)

The minimal unpaired probability of the seed interaction sites (checked independently for query and target seed site) to be considered for further interaction prediction. Note, a value too high will discard all seeds and thus result in no predicted interaction.
The parameter constraints are: Input value has to be parsable as a Double. The value must be greater than or equal to 0 and must be smaller than or equal to 1.
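The numeric seed constraints listed in this section can be combined into one check. The sketch below only restates the documented ranges; the function and argument names are hypothetical.

```python
def check_seed_parameters(min_bp, max_unpaired, max_energy, min_unpaired_prob):
    """Return a list of violated seed constraints (empty if all are satisfied)."""
    errors = []
    if not 0 <= min_bp <= 12:
        errors.append("min. number of base pairs must be in the range 0-12")
    if not 0 <= max_unpaired <= 20:
        errors.append("max. number of unpaired bases must be in the range 0-20")
    if min_bp <= max_unpaired:
        errors.append("number of base pairs must exceed the number of "
                      "unpaired bases")
    if not -999 <= max_energy <= 999:
        errors.append("maximal seed energy must be in the range -999 to 999")
    if not 0 <= min_unpaired_prob <= 1:
        errors.append("minimal unpaired probability must be in the range 0-1")
    return errors
```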

?  Seed position (query)

Seed search and position is constrained to this region of the query ncRNA. The start and the end position of the region have to be given in 5' to 3' direction of the RNA starting from position 1.
The parameter constraints are: Has to be in the format 'FROM-TO' to give the coordinates FROM and TO between which the seed is to be found (index counting starts with 1).

?  Seed position (target)

Seed search and position is constrained to this region of the target RNA. The start and the end position of the region have to be given in 5' to 3' direction of the RNA starting from position 1.
The parameter constraints are: Has to be in the format 'FROM-TO' to give the coordinates FROM and TO between which the seed is to be found (index counting starts with 1).
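A 'FROM-TO' value for either seed position parameter can be parsed and sanity-checked as follows. This is a minimal sketch with a hypothetical function name; the requirement that FROM is not larger than TO is an assumption derived from the 5' to 3' direction stated above.

```python
import re


def parse_seed_range(text, seq_length=None):
    """Parse 'FROM-TO' (1-based, inclusive) into a (start, end) tuple."""
    match = re.fullmatch(r"\s*(\d+)\s*-\s*(\d+)\s*", text)
    if match is None:
        raise ValueError("seed position must have the format 'FROM-TO'")
    start, end = int(match.group(1)), int(match.group(2))
    if start < 1:
        raise ValueError("index counting starts with 1")
    if end < start:
        raise ValueError("FROM must not be larger than TO")
    if seq_length is not None and end > seq_length:
        raise ValueError("TO exceeds the sequence length")
    return start, end


def seed_region(sequence, text):
    """Return the part of the sequence covered by a 'FROM-TO' range."""
    start, end = parse_seed_range(text, len(sequence))
    return sequence[start - 1:end]  # convert 1-based inclusive to a slice
```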

Folding Parameters

?  Temperature for energy computation

Temperature in degrees Celsius used to rescale energy parameters.
The parameter constraints are: Input value has to be parsable as a Double. The value must be greater than 0.

?  Access. query: folding window size

Size of the averaging window in the local query ncRNA folding (RNAplfold -W) for the computation of accessibilities.
Local folding is key to reasonable folding results when facing larger RNA molecules, since it minimizes effects of incorrect long-range predictions (see local folding article).
Note, the folding window size should be about 50nt higher than the max. basepair distance.
If set to 0, no sliding window is used and the full sequence length is considered. The same holds, if the value is larger than the sequence length.
The parameter constraints are: Input value has to be parsable as an Integer. The value must be greater than or equal to 0. The window size has to be at least as large as the maximal basepair distance (query).

?  Access. query: max. basepair distance

Maximal distance of two paired bases in the local query ncRNA folding (RNAplfold -L) for computation of accessibilities.
Local folding is key to reasonable folding results when facing larger RNA molecules, since it minimizes effects of incorrect long-range predictions (see local folding article).
Note, max. basepair distance should be about 50nt less than the folding window size.
If set to 0, the sliding window size value is also used for base pair span restrictions. The same holds, if the value is larger than the sequence length.
The parameter constraints are: Input value has to be parsable as an Integer. The value must be greater than or equal to 0.

?  Access. target: folding window size

Size of the averaging window in the local target RNA folding (RNAplfold -W) for the computation of accessibilities.
Local folding is key to reasonable folding results when facing larger RNA molecules, since it minimizes effects of incorrect long-range predictions (see local folding article).
Note, the folding window size should be about 50nt higher than the max. basepair distance.
If set to 0, no sliding window is used and the full sequence length is considered.
The parameter constraints are: Input value has to be parsable as an Integer. The value must be greater than or equal to 0. In NCBI mode, window size has to be greater than or equal to the sum of 'nt up' and 'nt down'. Furthermore, the window size has to be at least as large as the maximal basepair distance (target).

?  Access. target: max. basepair distance

Maximal distance of two paired bases in the local target RNA folding (RNAplfold -L) for computation of accessibilities.
Local folding is key to reasonable folding results when facing larger RNA molecules, since it minimizes effects of incorrect long-range predictions (see local folding article).
Note, max. basepair distance should be about 50nt less than the folding window size.
If set to 0, the sliding window size value is also used for base pair span restrictions.
The parameter constraints are: Input value has to be parsable as an Integer. The value must be greater than or equal to 0.
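The special cases for the window size and the maximal basepair distance (value 0, or a value larger than the sequence length) can be resolved into effective settings as sketched below. The function name is hypothetical, and applying the "window at least as large as basepair distance" check to the resolved values rather than to the raw input is our interpretation of the rules above.

```python
def effective_folding_settings(seq_length, window_size, max_bp_distance):
    """Resolve window size (RNAplfold -W) and basepair span (RNAplfold -L)."""
    if window_size < 0 or max_bp_distance < 0:
        raise ValueError("values must be greater than or equal to 0")
    # 0 or an oversized window means: fold the full sequence.
    if window_size == 0 or window_size > seq_length:
        window = seq_length
    else:
        window = window_size
    # 0 or an oversized span means: use the window size as span.
    if max_bp_distance == 0 or max_bp_distance > seq_length:
        span = window
    else:
        span = max_bp_distance
    if window < span:
        raise ValueError("window size must be at least as large as the "
                         "maximal basepair distance")
    return window, span
```

For a 300 nt sequence, a window of 150 and a basepair distance of 100 are used unchanged, while setting both to 0 results in folding the full sequence.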

?  Energy parameter set (Vienna package)

Defines which energy parameter set is used within the Vienna RNA package to compute base pair probabilities, i.e. dot plots, for each input sequence. The parameter sets are provided by the Vienna RNA package (see version information) and are named according to the first author and year of the corresponding publication.

Output Description

The output table summarizes the best 100 predicted interactions. It can be sorted by clicking on the header of a column. NOTE: If your sequence input combinatorics exceeds 100, the output will not cover all possible combinations!

For the selected table row, the corresponding interaction is shown below the table along with additional information such as interaction positions, different contributions of the interaction energy score, etc.

Details on the information: Furthermore in NCBI mode:

Functional Annotation Chart:

The top 50 targets have been subjected to functional enrichment. The heatmap shows all members of clusters with a DAVID enrichment score >= 1 in a specific color. Each row represents a gene and each column a specific functional term. If the gene can be assigned to a term, the corresponding square is filled/colored. Closely related terms are assigned to a cluster and have the same color. The opacity of the color depends on the p-value of the IntaRNA prediction. A more intense color represents a more significant p-value. The "Fold enrichment" is given in front of the term descriptions. It gives the enrichment of a term in the prediction group in relation to the whole genome background (e.g. a term with an enrichment of 10 contains 10 times more genes belonging to the respective term than the background). The enrichment scores give a measure of the biological significance of the cluster. A higher score represents a more statistically significant enrichment. The publication on the DAVID webserver suggests investigating clusters with an enrichment score >= 1.3.

Downloadable files in NCBI mode:

Functional Enrichment:

This file contains the DAVID functional enrichment result for the target candidates up to IntaRNA p-values <= 0.01. A term appears as enriched if it is significantly overrepresented in the top list when compared to the background. The background in this case consists of all genes for which there is a prediction (not the entire set of genes of an organism). Enrichment scores of 1.3 and higher suggest statistical significance. However, enrichments also strongly depend on the quality of the annotation of the entered organism of interest. The file is tab delimited. This result is only calculated for the organism of interest.

Regions plots:

These plots are meant to give you an overview of the regions in the target and sRNA sequences that play predominant roles in the statistically significant interactions. The density plot at the top of the image is calculated from all predicted interactions with an IntaRNA p-value <= 0.01, while the interactions displayed at the bottom of the image are shown for the top 25 predicted targets. The plots can be downloaded in PostScript, PDF and PNG format.

Input Examples

?  ncRNA-NCBI prediction

Yfr1 - example
The example's result can be directly accessed here

?  ncRNA-mRNA prediction

ncRNA-mRNA prediction
The example's result can be directly accessed here

Frequently Asked Questions

If your question is not listed, please send it to us!

? My sequences are longer than allowed within the webserver input, what can I do?

The restriction of the RNA input length (and number) is due to the limited resources available for the webserver computations. In order to deal with large(r) inputs you can

(A) install IntaRNA locally (recommended)

(B) chunk up your long sequences into overlapping subsequences, e.g. using
|---------------------------|   = full sequence
|--------------|------------|   = first chunk set
|-------|-------------|-----|   = second chunk set
Given the (two) sets: (1) run independent predictions, (2) merge the output lists, (3) check the top ranking results.
You might have to manually remove/ignore high-ranking hits close to the sequence's cut points (less than 100nt), since the accessibility computation of these regions is strongly biased by the sequence decomposition.
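Option (B) can be automated with a short script. The sketch below builds two chunk sets whose cut points are shifted against each other by half a chunk, so that every region close to a cut in one set lies well inside a chunk of the other set. The function names, the default chunk length of 2000 (the server's length limit) and the shift by half a chunk are assumptions made for this example.

```python
def chunk_sets(sequence, chunk_len=2000):
    """Return two lists of (start, end) chunks (0-based, end exclusive)."""
    n = len(sequence)
    first = [(s, min(s + chunk_len, n)) for s in range(0, n, chunk_len)]
    # Second set: same chunk length, cut points shifted by half a chunk.
    offset = chunk_len // 2
    cuts = [0] + list(range(offset, n, chunk_len)) + [n]
    second = [(a, b) for a, b in zip(cuts, cuts[1:]) if b > a]
    return first, second


def is_near_cut(position, chunk, seq_length, margin=100):
    """True if a hit position lies within margin nt of an artificial cut point.

    position is given in coordinates of the full sequence; the true sequence
    ends do not count as cut points.
    """
    start, end = chunk
    near_left = start > 0 and position - start < margin
    near_right = end < seq_length and end - position < margin
    return near_left or near_right
```

Each chunk can then be written as `sequence[start:end]` to a FASTA record and submitted separately. Very short trailing chunks (below 7 nt) would violate the input constraints and should be merged with their neighbor.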

? How does the mRNA length influence the energy score reported by IntaRNA?

IntaRNA is based on minimization of an energy score that incorporates the hybridization energy and the accessibility of the interaction sites in both RNAs. The hybridization energy alone depends only on the length of the interaction. For computation of accessibilities in the mRNA, it is assumed that the mRNAs are folded locally, allowing only base pairs with the given maximal span. The accessibility of the interaction site in the mRNA is averaged over all windows of the given size that contain the interaction site.

? How is accessibility defined?

The accessibility of the interaction site is the free energy that is required to make it single stranded. It is defined as the difference between the free energy of the ensemble of all RNA secondary structures and the free energy of the ensemble of RNA secondary structures, where the interaction site is single stranded.
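Since an ensemble free energy is -RT times the logarithm of the corresponding partition function, the difference defined above equals -RT ln(P_unpaired), where P_unpaired is the probability that the site is single stranded. The following sketch shows both forms. The function names are hypothetical, and the default temperature of 37 degrees Celsius is an assumption and not a value stated on this page.

```python
import math

GAS_CONSTANT = 0.0019872  # kcal / (mol * K)


def opening_energy_from_ensembles(e_all, e_site_unpaired):
    """Energy to open a site: ensemble energy with the site unpaired
    minus the ensemble energy of all structures (kcal/mol, >= 0)."""
    return e_site_unpaired - e_all


def opening_energy_from_probability(unpaired_probability,
                                    temperature_celsius=37.0):
    """Equivalent form: -RT * ln(P_unpaired)."""
    if not 0 < unpaired_probability <= 1:
        raise ValueError("probability must be in the interval (0, 1]")
    rt = GAS_CONSTANT * (temperature_celsius + 273.15)
    return -rt * math.log(unpaired_probability)
```

A site that is always unpaired (probability 1) costs nothing to open, and the cost grows as the unpaired probability approaches 0.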

? How are accessibilities calculated?

The calculation of accessibilities is based on ensemble free energies. Ensemble free energies are calculated using a partition function approach (McCaskill, 1990) assuming global folding of the ncRNA and local folding of the mRNA. For this purpose, RNAplfold and RNAup are integrated into IntaRNA via the Vienna RNA library (Hofacker et al., 1994; Bernhart et al., 2006; Mückstein et al., 2008).

? What are the fdr values and how to interpret them?

The fdr (false discovery rate) values are most easily explained with an example. Assume an fdr cutoff of 0.5. Statistically speaking, 50% of all predicted targets in the list up to this cutoff are then expected to be false positives. The fdr thus gives you an impression of how many incorrect predictions to expect up to a certain threshold. The fdr values are computed using the R function p.adjust with the method of Benjamini & Hochberg (1995).
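For readers who want to reproduce the adjustment outside of R, the Benjamini-Hochberg procedure can be sketched in a few lines of Python. This is an illustrative re-implementation and not the server's code; the function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    # Visit p-values from largest to smallest.
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for position, index in enumerate(order):
        rank = n - position  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[index] * n / rank)
        adjusted[index] = running_min
    return adjusted
```

The running minimum keeps the adjusted values monotone: a target with a smaller p-value never receives a larger fdr value than a target with a larger p-value.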

? Where can I see/download the used target RNAs derived for my NCBI RefSeq ID?

The target sequences downloaded from NCBI for the given RefSeq ID are available for download in FASTA format in the 'Input Parameter' section of the result page. The FASTA file is linked for parameter 'Target RNA in FASTA'.

? The putative targets are sorted in the reverse order in the regions plot when compared to the main result table. Which sorting should I trust?

The reverse sorting in the regions plots is due to our plotting script. This means that you should trust the initial sorting of the main result table.

List of Changes