siRNA Scoring Explained

On-Target Efficacy Prediction Methods

DSIR (Designer of Small Interfering RNA)

DSIR was developed by Jean-Philippe Vert et al. on the dataset of Huesken et al. with the LASSO regression method, the associated publication appeared in BMC Bioinformatics in 2006.

The DSIR algorithm predicts siRNA efficacy using an interpretable linear regression model built on two core types of sequence-derived features. The first feature set captures position-specific nucleotide preferences by encoding 21-nt antisense (guide) strands as a binary vector. These features reflect well-established findings that certain nucleotides at specific positions are statistically associated with siRNA potency.

Complementing this, DSIR incorporates motif-based features - the spectral representation, which counts the frequency of all possible 1- to 3-mer nucleotide motifs in the siRNA sequence. This representation captures broader sequence patterns beyond fixed positions, resulting in an 84-dimensional vector based on all possible combinations of 1-mer (4), 2-mer (16), and 3-mer (64) motifs. The combined feature space enables DSIR to robustly model siRNA efficacy while maintaining biological interpretability and computational efficiency.

i-Score

i-Score was developed by Masatoshi Ichihara et al. The work was published in Nucleic Acids Research in 2007. The linear regression model was developed by analyzing Dieter Huesken dataset, which contains activity data for 2431 siRNAs. The algorithm's performance was validated using subsets of this data and other independent datasets.

The i-Score (inhibitory-Score) algorithm predicts siRNA activity using a linear regression model. Its defining characteristic is its simplicity: the model relies exclusively on position-specific nucleotide preferences within the siRNA sequence. No other parameters, such as sequence motifs or complex thermodynamic calculations beyond those implicitly captured by nucleotide preference at specific positions, are incorporated into the core predictive model. It scores 19-mer siRNAs from 0 to 100 by analyzing nucleotide preferences at each position in the antisense (guide) strand.

s-Biopredsi

The original BIOPREDsi algorithm was developed by Huesken et al. by applying an artificial neural network model to 2431 siRNAs. As the actual parameters of BIOPREDsi are not published anywhere, Ichihara et al. developed a similar scoring system called s-Biopredsi (simulated Biopredsi), by applying a single-node neural network model on 21-nt siRNAs subset identical to those employed to develop BIOPREDsi. Correlation coefficients between BIOPREDsi and s-Biopredsi were up to 0.9999 for validation datasets.

Katoh (siExplorer)

This approach was developed by Takayuki Katoh and Tsutomu Suzuki and published in Nucleic Acids Research in 2007. The 3-nt periodicity rule was derived from analyzing siRNA activity data, including a dataset of 702 siRNAs targeting a single EGFP gene. The siExplorer algorithm was validated using siRNAs targeting endogenous human GAPDH gene.

The algorithm proposed by Katoh et al. predicts siRNA activity by decomposing sequence-based effects into two additive components: the macro effect and the micro effect. The macro effect captures the overall base composition of a sense strand of 19-nt siRNA, reflecting the influence of total nucleotide content (A, U, G, C) on silencing efficiency. Each nucleotide is assigned a weight that reflects its average contribution to activity, and these are scaled through a linear transformation to produce a macro score that correlates with measured RNAi activity (ranging from 0 to 100%). This component effectively models general trends in base composition preference across the siRNA.

To refine predictions and account for position-specific variability, the model introduces a micro effect, which represents the contribution of individual nucleotides at specific positions (1–19) along the siRNA sense strand. By summing the micro effect contributions across all positions, the algorithm captures periodic fluctuations related to sequence context. Together, the macro and micro components form a linear scoring system that correlates with experimental activity data and provides an interpretable framework for siRNA efficacy prediction.

DH-Score (simulated siDESIGN Center scoring algorithm)

The scoring system used by the siDESIGN Center is a proprietary algorithm developed to identify functional siRNAs. We independently developed a scoring system, referred to here as DH-Score, which closely replicates the behavior and trends of siDESIGN Center's outputs based on publicly available data. DH-Score was created without access to proprietary algorithms, internal parameters, or non-public datasets, ensuring that it remains an independent and non-affiliated reproduction.

To develop DH-Score, a dataset of 1,067 siRNA sequences and their corresponding scores was compiled from siDESIGN Center. The dataset was split into training and testing subsets in an approximate 80:20 ratio, with all sequences containing homopolymeric stretches of four identical nucleotides (e.g., AAAA, CCCC) assigned to the test set. No sequences with five or more consecutive identical nucleotides were observed, suggesting that such sequences are filtered and not scored by the siDESIGN Center system.

A linear regression model was trained using antisense strands from the training dataset, resulting in the positional weight matrix shown below.

Positional Weight Matrix used for DH-Score algorithm

Although the exact data used to originally construct siDESIGN Center's algorithm is unknown, the derived matrix in DH-Score reflects many empirically established principles for designing functional siRNAs, as described by multiple studies in the RNAi field.

Correlation of siDESIGN Center's score and DH-Score During validation, we observed that sequences containing single homopolymeric regions (AAAA, CCCC, GGGG, UUUU) receive consistent penalties. Incorporating this adjustment into the scoring process, we evaluated the independent test dataset and found a very high correlation between DH-Score and the original siDESIGN Center scores (r2 = 0.999).

It is important to emphasize that DH-Score is an independently developed simulation based solely on public observations and does not utilize any confidential or proprietary information from siDESIGN Center. While DH-Score captures the major trends in scoring, it may not account for all the internal filtering or layered criteria of the original tool. Notably, DH-Score produces theoretical scores for any properly structured antisense siRNA strand, including sequences with five or more homonucleotides, and provides scores even for siRNAs that would otherwise be excluded (e.g., those scoring below 50 in siDESIGN Center).

DH-Score was developed and validated in March 2025 and demonstrates excellent reproducibility of key scoring patterns observed in siDESIGN Center outputs.

Off-Target Effect Assessment Methods

Seed Region Melting Temperature (Tm)

This method involves calculating the melting temperature (Tm) of the short duplex formed between the siRNA strand's seed region (positions 2-8) and a potential complementary site on an mRNA. The Tm value serves as a quantitative measure of the thermodynamic stability of this specific interaction. The underlying hypothesis is that weaker, less stable interactions between the seed region and potential off-target sites are less likely to lead to significant off-target gene silencing. Therefore, siRNAs with lower seed region Tm values are generally preferred, as they are predicted to have a reduced propensity for miRNA-like off-target binding. Some design tools, like siDirect 2.0, use specific Tm thresholds; for example, a maximum Tm of 21.5 °C is suggested as a benchmark to identify sequences that are likely to be largely free of seed-dependent off-target effects. See the reference (Ui-Tei et al., 2008) for detail.

POTS (Potential Off-Target Score)

POTS was developed as part of the siSPOTR (siRNA Seed Potential of Off-Target Reduction) algorithm by Ryan L. Boudreau and colleagues in the laboratory of Beverly L. Davidson. The work was published in Nucleic Acids Research in 2012, source code for standalone version is here.

POTS (Potential Off-Target Score) is a weighted scoring system designed to predict the off-targeting potential of an siRNA based on its seed region (positions 2-8 in antisense strand). It integrates both the type and frequency of potential seed complement binding sites (8-mer, 7m8, 7A1, and 6-mer) found within the 3'-UTRome (the collection of all 3' UTR sequences in a transcriptome).
Unlike some earlier methods that might only count hexamer sites, POTS uses a weighted approach based on heptamer seeds (positions 2-8) and distinguishes between different 7-mer and 8-mer site types, acknowledging their differing repressive potentials. This allows for a more nuanced prediction of off-targeting, differentiating between seeds that share the same 6-mer core but differ at position 8, which can significantly impact the number of potent 7m8 and 8-mer sites. The score provides a quantitative ranking of siRNA candidates based on their predicted off-target liability. While not a strict threshold, the authors note that POTS < 50 was associated with previously validated low off-targeting siRNAs and potentially better tolerability in cytotoxicity assays.

Seed-Pairing Stability (SPS)

The concept of thermodynamic stability is fundamental, but its specific role in miRNA/siRNA proficiency, particularly in conjunction with Target-site Abundance (TA), was investigated in detail by David M. Garcia, David P. Bartel, and colleagues, as published in Nature Structural & Molecular Biology in 2011. Their work built upon earlier observations correlating predicted SPS with the propensity of siRNAs to repress unintended targets (off-targeting).

Seed-Pairing Stability (SPS) refers to the predicted thermodynamic stability of the duplex formed between the miRNA or siRNA seed region (typically nucleotides 2-7 or 2-8) and its complementary target mRNA sequence. It is usually quantified by calculating the predicted Gibbs free energy change (ΔG°) for this interaction, with more negative values indicating stronger, more stable pairing.

The stability of this initial seed pairing is thought to influence the overall targeting proficiency and specificity. The hypothesis explored is that a certain threshold of stability (SPS) might be required for the miRNA/siRNA to remain associated with a target long enough to mediate effective repression. Very weak SPS (less negative ΔG°) might lead to lower targeting proficiency and potentially fewer off-target interactions because the binding is less stable. Conversely, very strong SPS might increase overall proficiency but could also potentially increase the likelihood or strength of off-target binding.