Choosing the correct BLAST algorithm

SequenceServer automatically selects the appropriate BLAST algorithm by assessing both your input query sequences and the databases you search against.

However, there are five basic BLAST algorithms: blastp, blastn, tblastx, tblastn, and blastx. Each algorithm has a different use case, and it’s essential to choose the appropriate one for your analysis. This post will help you choose the right one.

The appropriate BLAST algorithm choice depends on what you’re trying to do.

As biologists, we work with nucleotide sequences and protein (i.e., amino-acid) sequences. Several versions of BLAST exist so we can analyze both types of sequences. Are we searching with a nucleotide sequence or a protein sequence? Are we comparing that to a database of amino-acid sequences such as UniRef90 or to a database of nucleotide sequences such as the Telomere-to-Telomere human genome?

The correct BLAST algorithm depends on the type of query sequence and the type of database sequence. Below is an overview from our 2019 Mol Biol Evol paper:

Overview of BLAST algorithms and how they are used
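The same pairing of query type and database type can be written as a small lookup table. The sketch below is only an illustration: the names `BLAST_PROGRAMS` and `compatible_programs` are made up for this post and are not part of BLAST or SequenceServer.

```python
# Which BLAST programs accept which (query type, database type) pairing.
# The dictionary and function names are hypothetical, chosen for illustration.
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide"): ["blastn", "tblastx"],
    ("nucleotide", "protein"): ["blastx"],
    ("protein", "protein"): ["blastp"],
    ("protein", "nucleotide"): ["tblastn"],
}


def compatible_programs(query_type, database_type):
    """Return the BLAST programs that fit the given query and database types."""
    try:
        return BLAST_PROGRAMS[(query_type, database_type)]
    except KeyError:
        raise ValueError(
            f"Unknown combination: query={query_type!r}, database={database_type!r}"
        ) from None


if __name__ == "__main__":
    print(compatible_programs("nucleotide", "protein"))
    print(compatible_programs("nucleotide", "nucleotide"))
```

Only the nucleotide-versus-nucleotide case leaves a choice between two programs; every other pairing maps to exactly one.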

Choosing the wrong algorithm can lead to incorrect results

BLAST will not stop you from choosing an algorithm that doesn’t match your data. For example, if you search with a nucleotide query sequence but run blastp, BLAST will still run. But it will give you incorrect results: false negatives. You will erroneously conclude that there is no similarity between your query sequence and the selected database. You should have used blastn, blastx or tblastx, depending on the type of your database and the expected evolutionary distance between your query and the sequences you are comparing against.

SequenceServer automatically chooses the right algorithm depending on your query and database sequence types

So, if you’re running BLAST locally or at NCBI, you need to know the type of query sequence and the type of database sequence. Think carefully before clicking.

However, if you’re using SequenceServer, no need to worry. SequenceServer automatically chooses the appropriate algorithm. Indeed, it has an “automagic” selection mechanism that identifies query type and database type, and selects the BLAST algorithm that will work best. You can focus on the science and avoid costly mistakes.

In the screenshot below, a biologist pasted some nucleotide sequences as the query, and selected a protein database. SequenceServer auto-detected this and consequently selected BLASTX, the only algorithm appropriate for comparing nucleotide sequences to a protein database.

SequenceServer automatically selected BLASTX after detecting that the user entered a nucleotide query to search a protein database
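To give a feel for how sequence type detection can work, here is a minimal sketch that guesses the type from the letters in a sequence. It is not SequenceServer's actual implementation: the function name, the set of nucleotide characters and the 90% threshold are all assumptions made for this illustration.

```python
# Characters treated as nucleotides in this sketch (an assumption: the four
# DNA bases, U for RNA, and N for an unknown base).
NUCLEOTIDE_CHARS = set("ACGTUN")


def guess_sequence_type(sequence, threshold=0.9):
    """Guess 'nucleotide' or 'protein' from residue composition.

    Returns None when the sequence contains no letters. The 0.9 threshold
    is an illustrative choice, not a value taken from any real tool.
    """
    residues = [char for char in sequence.upper() if char.isalpha()]
    if not residues:
        return None
    nucleotide_count = sum(char in NUCLEOTIDE_CHARS for char in residues)
    if nucleotide_count / len(residues) >= threshold:
        return "nucleotide"
    return "protein"


if __name__ == "__main__":
    print(guess_sequence_type("ATGGCCATTGTAATGGGCCGC"))
    print(guess_sequence_type("MKTAYIAKQRQISFVKSHFSRQ"))
```

A composition-based guess like this can be fooled by very short sequences, or by peptides that happen to consist mostly of A, C, G and T, so it is worth checking the detected type before running a search.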

blastn vs. tblastx: two options for comparing nucleotide sequences

Things are a bit more complex if you search with nucleotide query sequences against nucleotide databases. You have a choice between blastn and tblastx. Why are there two algorithms that seemingly do the same thing? What are the tradeoffs, and which should you choose?

Algorithmic differences between blastn and tblastx

In short, blastn does comparisons in nucleotide space. It compares nucleotides directly. It does this using the forward sequence, and the reverse-complement sequence.

In contrast, tblastx performs its comparisons in the world of amino-acid sequences. For that, tblastx translates the nucleotide query sequence into amino-acid sequences using all six possible reading frames (three forward and three reverse-complement). And tblastx does the same thing with the nucleotide database, translating each database sequence in all six reading frames. Thus, each query sequence is effectively compared to each database sequence in thirty-six combinations of reading frames (six for the query times six for the database).
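The six-frame translation described above can be sketched in a few lines. This is an illustration of the idea rather than BLAST's own code: it assumes the standard genetic code, plain A/C/G/T input, and function names invented for this post. Codons containing any other character are translated as X.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from position 0; unknown codons become X."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Translate a sequence in three forward and three reverse-complement frames."""
    frames = {}
    for strand, sequence in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(sequence[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frame_translations("ATGGCCATTGTAATGGGCCGCTGA").items():
        print(frame, protein)
```

With six translations of the query and six of each database sequence, there are 6 × 6 = 36 pairs of frames to compare, which is one reason tblastx is much more computationally demanding than blastn.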

Tradeoffs between blastn and tblastx

The algorithmic differences between blastn and tblastx create multiple tradeoffs:

Conclusion

In conclusion, it’s crucial to choose the right algorithm for your data types and question. SequenceServer will automatically choose what works for the sequence types you’re entering. But if you’re running BLAST locally or at NCBI, you must carefully think through which types of query and database sequences you’re comparing.

Overview of BLAST algorithms and how they are used

For specific applications, additional adjustments are needed. For example,
