Sequence Search
Whether you are studying the genetic makeup of an organism or trying to identify potential targets for drug development, sequence searches can provide valuable insights and information.
Perform sequence searches to:
- find out the potential function of a gene,
- search whether oligonucleotide primer sequences are likely to amplify non-target regions of the genome,
- understand if the Sanger sequence of the gene you cloned matches your expectations,
- check whether targeted CRISPR mutagenesis was successful,
- identify the sequence of a candidate gene in a newly sequenced genome,
- find paralogs or orthologs of a gene to perform multiple sequence alignment and identify which parts of the sequence are most conserved,
- identify the source of a sequence in a metagenomic sample,
- identify it’s conserved domains.
- disentangling homology and orthology relationships.
Where to perform BLAST sequence searches
The most widely used online portal for sequence searches is NCBI’s BLAST search. It’s a handy go-to place given that NCBI house the vast majority of published nucleotide and protein sequences.
But NCBI’s BLAST does not cover all existing sequence data:
- The European Nucleotide Archive hosts a mirror of NCBI’s data,
- UniProt have subsets of protein sequence data that meet specific quality standards;
- domain-specific websites such as Flybase or the ant genome database are also not included in NCBI BLAST.
- NCBI does not work for unpublished sequence data
When using BLAST, at times of high user demand your searches can be “queued” for a while until computing capacity becomes available. Public websites for sequence search also have size restrictions on query sequences.
Most sequence search types can also be performed on a local computing cluster, as is found in many universities, research centers and core facilities and institutes.
You can also perform BLAST analyses in the cloud.
Our SequenceServer software provides a pragmatic alternative for performing local BLAST sequence searches on your computer, including on unpublished data. It includes many visualization approaches. You can install it and run it locally on a Mac or Linux (its free and has been cited more than 130 times). Alternatively, you can use SequenceServer Cloud. Having a SequenceServer Cloud instance enables you (or your team) to have a centrally accessed sequence search repository. Its graphical sequence search interface is fast, accessible from any web browser (including from Windows), takes no space on your computer and enables you to harness the power of a high performance computing cluster.
SequenceServer Cloud makes it easy to perform sequence search results and to interpret them. For this, it leverages cloud computing, publication-ready graphics that facilitate interpretation, and a powerful graphical interface for configuring BLAST databases. [Try it out]
The broad diversity of sequence search algorithms
BLAST, whether used at NCBI, as local installation, or online using a cloud service is the mainly used sequence search algorithm, with more than 100,000 citations.
BLAST is great for searching large databases with “small” query sequences. Today’s BLAST algorithm is far more computationally efficient than those from twenty years ago. BLAST has several algorithms and parameters that are optimized for different types of sequence data, such as nucleotide vs protein sequences, or for different types of queries, such as short oligonucleotide sequences such as primers.
However, BLAST isn’t necessarily the most appropriate sequence search algorithm for every job. Other algorithms include BLAT, USearch, minimap. The following article reviews the history of sequence search algorithms and the tradeoffs among search algorithms:
Evolution of biosequence search algorithms: a brief survey
Gregory Kucherov. Bioinformatics (2019), 35:3547–3552
Although modern high-throughput biomolecular technologies produce various types of data, biosequence data remain at the core of bioinformatic analyses. However, computational techniques for dealing with this data evolved dramatically. Results In this bird’s-eye review, we overview the evolution of main algorithmic techniques for comparing and searching biological sequences. We highlight key algorithmic ideas emerged in response to several interconnected factors: shifts of biological analytical paradigm, advent of new sequencing technologies and a substantial increase in size of the available data. We discuss the expansion of alignment-free techniques coming to replace alignment-based algorithms in large-scale analyses. We further emphasize recently emerged and growing applications of sketching methods which support comparison of massive datasets, such as metagenomics samples. Finally, we focus on the transition to population genomics and outline associated algorithmic challenges.