BLAST E-values: how they are calculated and what they mean

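A crucial measure that accompanies every hit sequence that BLAST identifies is the E-value, short for Expectation value (also written E value, e-value, or evalue). Here, we’ll walk through what an E-value is, how to interpret it, how it is calculated, and how to tune a BLAST analysis to get the most out of it.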

BLAST reports each alignment hit with several metrics. One of these is the E-value, which estimates how many hits with a similar or better score we would expect to see purely by chance within our search.

Figure: SequenceServer BLAST result report. E-values are shown in two places: in the table of all hits, and as part of the alignment of each hit.

What is an E-value?

The BLAST E-value is not a probability, and it can be greater than 1.

Instead, it is an estimate of the expected number of random alignments with a particular score or better that could be found by chance in a given database search. In other words, it indicates how easily an alignment this good could arise by chance alone rather than from a true biological relationship between the sequences: the lower the E-value, the less plausible chance becomes as an explanation.
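To make the difference between an E-value and a probability concrete, here is a minimal sketch. It assumes that the number of chance hits follows a Poisson distribution with mean E (the usual assumption in Karlin-Altschul statistics), so the probability of seeing at least one chance hit is 1 - e^(-E). The function name is ours and is not part of BLAST.

```python
import math


def prob_at_least_one_chance_hit(e_value: float) -> float:
    """Probability of at least one chance hit, assuming a Poisson model.

    P = 1 - exp(-E). expm1 keeps precision when E is very small.
    """
    return -math.expm1(-e_value)


# For small E-values the two numbers are nearly identical;
# for large E-values the probability saturates at 1 while E keeps growing.
for e in (1e-10, 0.01, 1.0, 10.0):
    print(f"E = {e:g} -> P = {prob_at_least_one_chance_hit(e):.4g}")
```

This is why small E-values can loosely be read like probabilities, while an E-value of 10 cannot.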

Interpreting E-values

The E-value describes the number of hits we expect to see by chance when BLASTing a database. It helps us understand if our hits are relatively unique or not. For example, an E-value of 1 means that one expects by chance to see 1 match with a similar or better score. We need to be careful when interpreting E-values: they should be read in the context of the specific research question and the datasets used, alongside other factors like alignment length and sequence identity. However, in general, the lower the E-value, the less likely it is that the hit is a chance match.

There is no fixed E-value threshold for determining the significance of an alignment. Always consider the biological context and the datasets used.

In many cases, BLAST analysis is just a first step. In particular, a stronger E-value does not necessarily imply a stronger evolutionary relationship.

Interpreting it like that is a common mistake! To understand relationships across sequences, you should typically also perform multiple sequence alignment followed by phylogenetic reconstruction. Additional evidence also helps (e.g., understanding sequence conservation and domain architecture).

How is the BLAST E-value calculated?

The E-value is calculated based on the alignment score (S), the search space size (m × n), and the parameters derived from the scoring system and the database composition, such as the Karlin-Altschul parameters (K and λ). The formula for E-value is:

E-value = K × m × n × e^(-λS)

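Where:

  • K and λ are the Karlin-Altschul parameters, which depend on the scoring system and the database composition
  • m is the length of the query sequence
  • n is the total length of the database (so m × n is the search space size)
  • S is the alignment score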

The E-value thus depends on the database size. Larger databases have more chances of producing the alignment you see by chance… so E-values for the same amount of similarity end up being weaker (higher).
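As a rough illustration of this dependence on database size, the sketch below evaluates the formula for the same alignment score against two databases of different size. The values of K and λ are illustrative assumptions (close to those commonly reported for gapped BLOSUM62 searches); BLAST prints the values it actually used in its report. The sketch also uses raw sequence lengths, whereas BLAST applies corrections to obtain effective lengths, so the numbers will not match real output exactly.

```python
import math

# Illustrative Karlin-Altschul parameters (assumed values, not read from BLAST).
K = 0.041
LAMBDA = 0.267


def e_value(score: float, query_len: float, db_len: float,
            k: float = K, lam: float = LAMBDA) -> float:
    """E = K * m * n * exp(-lambda * S), using raw (uncorrected) lengths."""
    return k * query_len * db_len * math.exp(-lam * score)


query_len = 300   # hypothetical query length (m)
score = 120       # hypothetical raw alignment score (S)

small_db = 1e7    # hypothetical database length (n), e.g. a single proteome
large_db = 1e10   # hypothetical database 1000 times larger

e_small = e_value(score, query_len, small_db)
e_large = e_value(score, query_len, large_db)

print(f"small database: E = {e_small:.3g}")
print(f"large database: E = {e_large:.3g}")
print(f"ratio: {e_large / e_small:.0f}x")
```

Because E is linear in n, the same alignment gets an E-value 1000 times higher in the database that is 1000 times larger, while each additional point of score lowers E exponentially.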

So how should I tweak my BLAST analysis to get the most power?

  1. Use the appropriate database. If you’re looking for a particular gene in humans… only BLAST against the human genome… not against a database that is orders of magnitude larger. Searching the larger database would make it less likely for you to get strong E-values, even if the gene is present in the human genome. And the BLAST analysis would also take much longer.
  2. Use the appropriate BLAST algorithm for your biological question and evolutionary distance. Consider that nucleotide sequences diverge faster than protein sequences. So:
    • if you’re comparing highly similar sequences (e.g., to help identify intron-exon boundaries, or allelic differences), use BLASTN.
    • if you’re identifying orthologs across species, use BLASTP. To check whether a gene is really absent from a species, rather than just missing from its annotated proteins, use TBLASTN, which searches a protein query against the translated nucleotide sequences.
  3. Use an appropriate scoring matrix. BLOSUM62 is used by default for protein searches. But for longer evolutionary timescales, a matrix designed for more divergent sequences, such as PAM250, can be more appropriate.
  4. Investigate different E-value thresholds to see the impact on the resulting hits.
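One simple way to explore the last point is to count how many hits survive at several E-value cutoffs. The sketch below assumes BLAST tabular output (`-outfmt 6`) with the default column layout, in which the E-value is the 11th column. The function name and the example hits are made up for illustration.

```python
def count_hits_by_threshold(tabular_lines, thresholds):
    """Count hits with an E-value at or below each threshold.

    Expects tab-separated lines in the default BLAST tabular layout,
    where column 11 (index 10) holds the E-value.
    """
    e_values = []
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        e_values.append(float(fields[10]))
    return {t: sum(e <= t for e in e_values) for t in thresholds}


# Hypothetical hits: qseqid sseqid pident length mismatch gapopen
# qstart qend sstart send evalue bitscore
example_hits = [
    "query1\tsubjA\t98.5\t200\t3\t0\t1\t200\t1\t200\t1e-100\t380",
    "query1\tsubjB\t45.0\t150\t80\t2\t10\t160\t5\t154\t2e-12\t65.1",
    "query1\tsubjC\t31.0\t60\t40\t1\t50\t110\t20\t79\t0.003\t38.5",
    "query1\tsubjD\t28.0\t40\t28\t1\t5\t44\t100\t139\t4.2\t27.3",
]

counts = count_hits_by_threshold(example_hits, thresholds=(1e-50, 1e-10, 1e-3, 10))
for threshold, n_hits in counts.items():
    print(f"E <= {threshold:g}: {n_hits} hit(s)")
```

If the set of hits changes sharply between two cutoffs, the hits in that range are the ones worth inspecting by hand.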

Aren’t these kinds of adjustments “E-value hacking”?

No. If done appropriately it’s just using the right tool for the job. In fact, we need to consider all of the above to make sure the E-value is useful for our biological questions.
