Reasons for performing sequence searches include:
- wanting to know if the Sanger sequence of the gene you cloned indeed matches the sequence you were expecting,
- checking whether targeted CRISPR mutagenesis was successful,
- identifying the sequence of a candidate gene in a newly sequenced genome,
- finding out the potential function of a gene,
- finding paralogs or orthologs of a gene to perform multiple sequence alignment and identify which parts of the sequence are most conserved,
- searching whether oligonucleotide primer sequences are likely to amplify non-target regions of the genome.
Where to perform BLAST sequence searches
The most widely used online portal for sequence searches is NCBI’s BLAST search. It’s a handy go-to place given that NCBI house the vast majority of published nucleotide and protein sequences.
However, other contexts for BLAST sequence search also exist. These include: The European Nucleotide Archive, which hosts a mirror of NCBI’s data, UniProt, who have subsets of the protein sequence data that meet specific quality standards, and domain-specific websites such as Flybase or the ant genome database.
Challenges with such major repositories include that they get a lot of demand. Thus your sequence search can be “queued” for a while until computing capacity becomes available. Furthermore, public websites for sequence search typically have size restrictions on query sequences. Finally, they do not enable you to search against unpublished sequences.
Most sequence search types can also be performed on a local computing cluster, as is found in many universities, research centers and core facilities and institutes.
You can also perform BLAST analyses in the cloud.
Our SequenceServer software provides a pragmatic alternative for performing local BLAST sequence searches on your computer, including on unpublished data. It includes many visualization approaches. You can install it and run it locally on a Mac or Linux (its free and has been cited more than 130 times). Alternatively, you can use SequenceServer Cloud. Having a SequenceServer Cloud instance enables you (or your team) to have a centrally accessed sequence search repository. Its graphical sequence search interface is fast, accessible from any web browser (including from Windows), takes no space on your computer, enables you to harness the power of a high performance computing cluster.
SequenceServer Cloud makes it easy to perform sequence search results and to interpret them. For this, it leverages cloud computing, publication-ready graphics that facilitate interpretation, and a powerful graphical interface for configuring BLAST databases. [Try it out]
The broad diversity of sequence search algorithms
BLAST, whether used at NCBI, as local installation, or online using a cloud service is the mainly used sequence search algorithm, with more than 100,000 citations.
BLAST is great for searching large databases with “small” query sequences. Today’s BLAST algorithm is far more computationally efficient than those from twenty years ago. However, BLAST isn’t necessarily the most appropriate sequence search algorithm for every job. Other algorithms include BLAT, USearch, minimap. The following article reviews the history of sequence search algorithms and the tradeoffs among search algorithms:
Evolution of biosequence search algorithms: a brief survey
Gregory Kucherov. Bioinformatics (2019), 35:3547–3552
Although modern high-throughput biomolecular technologies produce various types of data, biosequence data remain at the core of bioinformatic analyses. However, computational techniques for dealing with this data evolved dramatically. Results In this bird’s-eye review, we overview the evolution of main algorithmic techniques for comparing and searching biological sequences. We highlight key algorithmic ideas emerged in response to several interconnected factors: shifts of biological analytical paradigm, advent of new sequencing technologies and a substantial increase in size of the available data. We discuss the expansion of alignment-free techniques coming to replace alignment-based algorithms in large-scale analyses. We further emphasize recently emerged and growing applications of sketching methods which support comparison of massive datasets, such as metagenomics samples. Finally, we focus on the transition to population genomics and outline associated algorithmic challenges.