Freiburg RNA Tools
BrainDead - Help

Introduction

BrainDead learns and predicts a two-class model for short RNA sequences based on accessibility-enhanced k-mer features, given (1) a set of (short) RNAs, (2) their respective class annotation (label 1 vs. -1), and (3) a set of k-mers to be used as features. For each k-mer, its occurrence within the sequence is counted with and without accessibility constraints. The constraints cover intra-molecular structure formation, homo-dimer RNA-RNA interaction formation, and their combination. Thus, 4 features are generated for each k-mer and used within the machine learning model.

The model is trained to predict the provided class annotation and is then applied to predict the class of each sequence in the given set of candidate sequences.

When using BrainDead please cite :

Results are computed with BrainDead version 1.0.1 using IntaRNA 3.2.0 and Vienna RNA package 2.4.18

Overview

The following parameters are used to control the execution of BrainDead

Furthermore, additional information is available

Training data

Class-annotated RNAs in FASTA

The training set of RNAs in single-line FASTA format with 1/-1 class annotation, which is to be learned by the model. Input can be given either as direct text input or by uploading a file. Each sequence should contain only the characters A, C, G, T, U.

Each RNA entry in class-annotated FASTA format begins with a single-line identifier, which starts with a greater-than (">") symbol followed by the RNA name (without whitespace), followed by the respective RNA class annotation using '-1' or '1'. The subsequent line provides the RNA's sequence data. An example is given below.

>miRNA-123 1
UUACUCUUCUCAUUAUGGUCA
>miRNA-456 -1
GGCGCGUAUGAUGCGGCGUAGUG
The parameter constraints are: The input has to be in valid FASTA format. No multiline input is allowed (header and sequence on one line each). The number of sequences has to be at least 20 and at most 1000. Sequence lengths have to be in the range 10-40. The allowed sequence alphabet is 'ACGUTacgut'. Each FASTA sequence header/name has to match the regular expression '\S+\s+[-+]?1'.
Defaults to ()
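
The constraints listed above can be checked locally before submitting a training set. The following is a minimal Python sketch, not the server's own validation code. The function name `parse_class_fasta` is hypothetical, and matching the header expression against the whole header (rather than a prefix) is an assumption.

```python
import re

# Header after the ">" symbol: name, whitespace, class label (expression from the constraints).
HEADER_RE = re.compile(r"\S+\s+[-+]?1")
# Allowed alphabet and length range from the constraints.
SEQ_RE = re.compile(r"[ACGUTacgut]{10,40}")


def parse_class_fasta(text):
    """Return a list of (name, label, sequence) tuples or raise ValueError."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) % 2 != 0:
        raise ValueError("expected alternating header and sequence lines")
    records = []
    for header, seq in zip(lines[0::2], lines[1::2]):
        # Assumption: the whole header has to match the expression.
        if not header.startswith(">") or not HEADER_RE.fullmatch(header[1:]):
            raise ValueError(f"bad header: {header!r}")
        if not SEQ_RE.fullmatch(seq):
            raise ValueError(f"bad sequence for {header!r}")
        name, label = header[1:].split()
        records.append((name, int(label), seq))
    if not 20 <= len(records) <= 1000:
        raise ValueError("need between 20 and 1000 sequences")
    return records
```

A label written as '+1' is parsed to the integer 1, so '1' and '+1' are treated as the same class in this sketch.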

Model parameters

k-mers of interest

The set of k-mers to be used to generate the features for the machine learning model. For each k-mer, its occurrence within the sequence is counted with and without accessibility constraints. The constraints cover intra-molecular structure formation, homo-dimer RNA-RNA interaction formation, and their combination. Thus, 4 features are generated for each k-mer and used within the model. The respective occurrence table is provided in the output for both the training and candidate RNA sequences.
The parameter constraints are: Comma-separated list of k-mers
Defaults to ()
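
To illustrate how one k-mer yields four features, here is a minimal Python sketch. It is not BrainDead's implementation. The function `kmer_features` and its arguments are hypothetical, the blocked positions are assumed to be given as sets of 0-based indices, and the rule that an occurrence counts as accessible only if none of its positions is blocked is an assumption. The feature names follow the context groups listed in the output description.

```python
def kmer_features(seq, kmer, intra_blocked, dimer_blocked):
    """Count k-mer occurrences in four accessibility contexts.

    intra_blocked / dimer_blocked: sets of 0-based positions assumed to be
    base-paired in stable intra-molecular structures / homo-dimer interactions.
    """
    k = len(kmer)
    # All start positions of the k-mer, overlapping occurrences included.
    starts = [i for i in range(len(seq) - k + 1) if seq[i:i + k] == kmer]

    def free(start, blocked):
        # Assumption: accessible only if no position of the occurrence is blocked.
        return blocked.isdisjoint(range(start, start + k))

    return {
        "any context": len(starts),
        "free of intra bp": sum(free(s, intra_blocked) for s in starts),
        "free homo-dimer bp": sum(free(s, dimer_blocked) for s in starts),
        "free of any bp": sum(free(s, intra_blocked | dimer_blocked) for s in starts),
    }
```

For example, `kmer_features("UUACUCUUCUCAUUAUGGUCA", "UU", {0}, {7})` finds the k-mer at positions 0, 6 and 12 and reports 3, 2, 2 and 1 for the four contexts.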

k-mer features reflect

Defines whether a feature encodes the number of occurrences of a given k-mer (values >= 0) or only whether or not it is found at all (values 0 or 1).
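
The difference between the two encodings can be sketched in a few lines of Python. The function `encode_features` and the mode names "count" and "presence" are hypothetical and do not correspond to the labels used in the web form.

```python
def encode_features(counts, mode="count"):
    """Encode raw k-mer occurrence counts either as counts or as 0/1 presence flags."""
    if mode == "count":
        # Values >= 0: number of occurrences.
        return dict(counts)
    if mode == "presence":
        # Values 0 or 1: whether the k-mer was found at all.
        return {name: int(value > 0) for name, value in counts.items()}
    raise ValueError(f"unknown mode: {mode!r}")
```

With the presence encoding, a k-mer found three times and a k-mer found once produce the same feature value.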

Max. energy ranked stable

The maximal energy of a structure or interaction (in kcal/mol) to be considered stable, i.e. unlikely to dissolve immediately. This threshold is used to select the intra-molecular structures and homo-dimers whose energies are below it. All sequence positions that are involved in base pairing within such structures are considered non-accessible for the respective k-mer features.
The parameter constraints are: Input value has to be parsable as Double. The value must be smaller than or equal to 0.
Defaults to (-3)
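
The effect of the threshold can be sketched as follows for intra-molecular structures. This is an illustration only and not how BrainDead computes accessibility. Structures are assumed to be given as (dot-bracket string, energy) pairs, the function names are hypothetical, and treating an energy equal to the threshold as stable is an assumption.

```python
def paired_positions(dot_bracket):
    """Return the 0-based positions that are base-paired in a dot-bracket string."""
    return {i for i, symbol in enumerate(dot_bracket) if symbol in "()"}


def blocked_positions(structures, max_energy=-3.0):
    """Collect positions paired in any structure considered stable.

    structures: iterable of (dot_bracket, energy in kcal/mol) pairs.
    """
    blocked = set()
    for dot_bracket, energy in structures:
        # Assumption: an energy equal to the threshold still counts as stable.
        if energy <= max_energy:
            blocked |= paired_positions(dot_bracket)
    return blocked
```

For instance, with the default of -3 a structure `((...))` at -4.2 kcal/mol blocks positions 0, 1, 5 and 6, while a structure `..(.)..` at -1.0 kcal/mol is ignored. Setting the threshold to 0 would block positions 2 and 4 as well.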

Machine learning model

Defines which machine learning model should be trained and used. For details, please refer to the scikit-learn documentation.

Candidate data

Candidate RNAs in FASTA

The candidate RNAs in single-line FASTA format for which class annotation is to be predicted using the learned model. Input can be given either as direct text input or by uploading a file. Each sequence should contain only the characters A, C, G, T, U.
A sequence in FASTA format begins with a single-line sequence identifier that starts with a greater-than (">") symbol, followed by a single line of sequence data.
The parameter constraints are: The input has to be in valid FASTA format. No multiline input is allowed (header and sequence on one line each). The number of sequences has to be at least 0 and at most 3000. Sequence lengths have to be in the range 10-40. The allowed sequence alphabet is 'ACGUTacgut'.
Defaults to ()

Output Description

Within the 'Downloads' section, BrainDead provides the identified accessibility-enhanced k-mer features for each sequence in CSV format. This is complemented by a TXT file that encodes the respective position information (position indices start at 0). Within the TXT file, positions are reported in context groups (square brackets), where the order of the groups is 'any context', 'free of intra bp', 'free homo-dimer bp', and 'free of any bp'. These files are available for both training and candidate RNAs.

Statistics on the trained model are available for download, along with the model itself for local use.

If candidate sequences are provided, the predicted class annotation is shown in a table along with 'reliability' probabilities of belonging to either class. The underlying data is available for download as well (CSV file 'class prediction' within the 'Downloads' section for candidate RNAs).

Input Examples

miRNAs as ligands for microglia activation

This example summarizes our study that investigates the ability of mature miRNAs to act as immune receptor ligands. The ability of extracellular miRNAs to directly activate receptors is a recently discovered function of miRNAs beside their classic role in post-transcriptional gene regulation. The small RNAs available within the example's training data were experimentally tested for their potential to activate murine microglia cells in vitro. They are pre-classified as +1 or -1 when found activating or non-activating, respectively, using fold change analyses based on TNF-alpha concentration measurements. The example's training data set comprises both the original training data (top group) and the experimentally verified candidate sequences (middle and bottom group) from our initial BrainDead main publication (see the list of references within the Help page). For details on the selected k-mers, please refer to the manuscript. The candidate set covers all mature human miRNAs from miRBase v22.1.
The example's result can be directly accessed here