Introduction
BrainDead learns and predicts a two-class model for short RNA
sequences based on accessibility-enhanced k-mer features, given (1) a set of
(short) RNAs along with (2) their respective class annotation (label 1 vs. -1)
and (3) a set of k-mers to be used as features.
For each k-mer, its occurrence within the sequence is counted both with and
without accessibility constraints. The constraints cover
intra-molecular structure and homo-dimer RNA-RNA interaction formation
as well as their combination. Thus, for each k-mer, 4 features are
generated and used within the machine learning model.
The model is trained to predict the provided class annotation and
is applied for the respective prediction for the given set of
candidate sequences.
When using BrainDead, please cite:
- Martin Raden, Thomas Wallach, Milad Miladi, Yuanyuan Zhai, Christina Krueger, Zoe J. Mossmann, Paul Dembny, Rolf Backofen, Seija Lehnardt
Structure-aware machine learning classification of oligonucleotide-induced immune response identifies microRNAs operating as Toll-like receptor 7/8 ligands
RNA Biology, 18:sup1, 268-277, 2021.
- Martin Raden, Syed M Ali, Omer S Alkhnbashi, Anke Busch, Fabrizio Costa, Jason A Davis, Florian Eggenhofer, Rick Gelhausen, Jens Georg, Steffen Heyne, Michael Hiller, Kousik Kundu, Robert Kleinkauf, Steffen C Lott, Mostafa M Mohamed, Alexander Mattheis, Milad Miladi, Andreas S Richter, Sebastian Will, Joachim Wolff, Patrick R Wright, and Rolf Backofen
Freiburg RNA tools: a central online resource for RNA-focused research and teaching
Nucleic Acids Research, 46(W1), W25-W29, 2018.
Results are computed with BrainDead version 1.0.1 using IntaRNA 3.2.0 and Vienna RNA package 2.4.18
Overview
The following parameters are used to control the execution of BrainDead
Furthermore, additional information is available
Training data
Class-annotated RNAs in FASTA
The training set of RNAs in single-line FASTA format with 1/-1 class annotation, which is to be learned by the model.
Input can be given either as direct text input or by uploading a file. Each sequence should contain only the characters A, C, G, T, U.
Each RNA entry in class-annotated FASTA format begins with a single-line identifier line, which starts with a greater-than (">") symbol followed by the RNA name (without whitespace), followed by the respective RNA class annotation, given as '1' or '-1'. The subsequent line provides the RNA's sequence data. An example is given below.
>miRNA-123 1
UUACUCUUCUCAUUAUGGUCA
>miRNA-456 -1
GGCGCGUAUGAUGCGGCGUAGUG
The parameter constraints are: The input has to be in valid FASTA format. No multiline input is allowed (sequence etc. in one line each). The number of sequences has to be at least 20 and at most 1000. Sequence lengths have to be in the range 10-40. The allowed sequence alphabet is 'ACGUTacgut'. Each FASTA sequence header/name has to match against the regular expression '\S+\s+[-+]?1'.
Defaults to ()
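The constraints listed above can be checked locally before submitting a job. The following is a minimal sketch, not the server's own validation code: the function name parse_class_annotated_fasta and its parameters are hypothetical, and it assumes that the header has to match the regular expression as a whole and that blank lines may be ignored.

```python
import re

# Header (without the leading ">"): name, whitespace, class label 1 / +1 / -1.
HEADER_RE = re.compile(r"\S+\s+[-+]?1")
# Sequence: allowed alphabet, length 10-40.
SEQ_RE = re.compile(r"[ACGUTacgut]{10,40}")


def parse_class_annotated_fasta(text, min_seqs=20, max_seqs=1000):
    """Return a list of (name, label, sequence) tuples or raise ValueError.

    Hypothetical helper; it mirrors the documented constraints only.
    """
    # Assumption: empty lines are skipped rather than rejected.
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) % 2 != 0:
        raise ValueError("expected alternating header and sequence lines")
    records = []
    for header, seq in zip(lines[0::2], lines[1::2]):
        if not header.startswith(">") or not HEADER_RE.fullmatch(header[1:]):
            raise ValueError(f"invalid header line: {header!r}")
        if not SEQ_RE.fullmatch(seq):
            raise ValueError(f"invalid sequence line for {header!r}")
        name, label = header[1:].split()
        records.append((name, int(label), seq))
    if not min_seqs <= len(records) <= max_seqs:
        raise ValueError(
            f"{len(records)} sequences given, expected {min_seqs}-{max_seqs}"
        )
    return records
```

Calling the function with the two-entry example above raises a ValueError under the default limits, since at least 20 sequences are required; passing min_seqs=1 is a convenient way to test small inputs.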
Model parameters
k-mers of interest
The set of k-mers to be used to generate the features for the machine learning model.
For each k-mer, its occurrence within the sequence is counted both with and
without accessibility constraints. The constraints cover
intra-molecular structure and homo-dimer RNA-RNA interaction formation
as well as their combination. Thus, for each k-mer, 4 features are
generated and used within the model.
The respective occurrence table is provided in the output for both the
training and candidate RNA sequences.
The parameter constraints are: Comma-separated list of k-mers
Defaults to ()
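To illustrate how one k-mer yields four features, the sketch below counts occurrences in the four contexts. It is an illustration under stated assumptions, not BrainDead's implementation: the function kmer_features and its arguments are hypothetical, the blocked positions are given as precomputed sets of 0-based indices, and an occurrence is assumed to count as accessible only if none of its positions is blocked.

```python
def kmer_features(seq, kmer, intra_blocked, dimer_blocked):
    """Count occurrences of kmer in seq in four accessibility contexts.

    intra_blocked / dimer_blocked: sets of 0-based positions assumed to be
    base paired in intra-molecular structures / homo-dimer interactions.
    Assumption: an occurrence is accessible only if all its positions are free.
    """
    counts = {"any": 0, "free_intra": 0, "free_dimer": 0, "free_both": 0}
    k = len(kmer)
    for start in range(len(seq) - k + 1):
        if seq[start:start + k] != kmer:
            continue
        window = set(range(start, start + k))
        free_intra = window.isdisjoint(intra_blocked)
        free_dimer = window.isdisjoint(dimer_blocked)
        counts["any"] += 1
        counts["free_intra"] += free_intra
        counts["free_dimer"] += free_dimer
        counts["free_both"] += free_intra and free_dimer
    return counts


# "GUU" occurs at positions 0, 3 and 7 of the toy sequence.
print(kmer_features("GUUGUUAGUU", "GUU", intra_blocked={0, 1}, dimer_blocked={8}))
# {'any': 3, 'free_intra': 2, 'free_dimer': 2, 'free_both': 1}
```

In this toy case the first occurrence is blocked by intra-molecular pairing and the last one by the homo-dimer, so only the middle occurrence is free of any base pair.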
k-mer features reflect
Defines whether a feature encodes the number of occurrences of
a given k-mer (values >= 0) or only whether or not it is
found at all (values 0 or 1).
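The difference between the two encodings can be sketched as follows. The helper encode_features and its binary flag are hypothetical names chosen for illustration; the actual option values offered by the web interface are not reproduced here.

```python
def encode_features(counts, binary=False):
    """Return occurrence counts as-is, or reduce them to presence (0 or 1)."""
    if not binary:
        return dict(counts)
    return {name: 1 if value > 0 else 0 for name, value in counts.items()}


counts = {"any": 3, "free_intra": 2, "free_dimer": 2, "free_both": 0}
print(encode_features(counts))               # counts are kept unchanged
print(encode_features(counts, binary=True))  # every positive count becomes 1
```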
Max. energy ranked stable
The maximal energy of a structure or interaction (in kcal/mol)
to be considered stable, i.e.
unlikely to be immediately dissolving. This threshold is applied
to filter for intra-molecular structures and homo-dimers that
show energies below this threshold. All sequence positions
of such structures that are involved in base pairing are
considered non-accessible for the respective k-mer features.
The parameter constraints are: Input value has to be parsable as Double. The value must be smaller than or equal to 0.
Defaults to (-3)
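The effect of the threshold can be sketched as a simple filter. This is a minimal illustration with several assumptions: the function blocked_positions is hypothetical, structures are given in dot-bracket notation together with their energy, and a structure whose energy equals the threshold is treated as stable. How BrainDead obtains the structures and interactions (via Vienna RNA package and IntaRNA) is not shown.

```python
def blocked_positions(structures, max_energy=-3.0):
    """Collect 0-based positions that are paired in any stable structure.

    structures: iterable of (dot_bracket, energy_in_kcal_per_mol) pairs.
    Assumption: "stable" means energy <= max_energy.
    """
    blocked = set()
    for dot_bracket, energy in structures:
        if energy > max_energy:
            continue  # not stable enough, its base pairs are ignored
        blocked.update(
            i for i, symbol in enumerate(dot_bracket) if symbol in "()"
        )
    return blocked


candidates = [("((....))..", -4.2), ("..((..))..", -1.0)]
print(sorted(blocked_positions(candidates)))  # [0, 1, 6, 7]
```

Only the first structure passes the default threshold of -3 kcal/mol, so the positions paired in the second one remain accessible.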
Machine learning model
Defines which machine learning model should be trained and used.
For details, please refer to the scikit-learn documentation.
Candidate data
Candidate RNAs in FASTA
The candidate RNAs in single-line FASTA format for which class annotation is to be predicted using the learned model.
Input can be given either as direct text input or by uploading a file. Each sequence should contain only the characters A, C, G, T, U.
A sequence in FASTA format begins with a single-line sequence identifier that starts with a greater-than (">") symbol, followed by a single line of sequence data.
The parameter constraints are: The input has to be in valid FASTA format. No multiline input is allowed (sequence etc. in one line each). The number of sequences has to be at least 0 and at most 3000. Sequence lengths have to be in the range 10-40. The allowed sequence alphabet is 'ACGUTacgut'.
Defaults to ()
Output Description
Within the 'Downloads' section, BrainDead provides, in CSV format,
the identified accessibility-enhanced k-mer features
for each sequence.
This is complemented with a TXT file that encodes respective position information
(where position index information starts with 0).
Within the TXT file, positions are reported in context groups (square brackets), where the
order of the groups is 'any context', 'free of intra bp', 'free homo-dimer bp', and 'free of any bp'.
These files are available for both training and candidate RNAs.
Statistics on the trained model are available for download, along with the model data itself for local use.
If candidate sequences are provided, predicted class annotation along with 'reliability' probabilities to be part of either class are shown within a table. The underlying data is available for download too (CSV file 'class prediction' within the 'Downloads' section for candidate RNAs).
Input Examples
miRNAs as ligands for microglia activation
This example summarizes our study that investigates the ability of mature miRNAs to act as immune receptor ligands.
The ability of extracellular miRNAs to directly activate receptors is a recently discovered function of miRNAs besides their classic role in post-transcriptional gene regulation.
The small RNAs available within the example's training data were experimentally tested for their potential to activate murine microglia cells in vitro.
They are pre-classified as +1 or -1 when found to be activating or non-activating, respectively, using fold change analyses based on TNF-alpha concentration measurements.
The example's training data set comprises both the original training data (top group) as well as the experimentally verified candidate sequences (middle and bottom group) from our initial BrainDead main publication (see list of references within Help page).
For details on the selected k-mers, please refer to the manuscript.
The candidate set covers all mature human miRNAs from miRBase v22.1.
The example's result can be directly accessed here