Sequenceserver: A Modern Graphical User Interface for Custom BLAST Databases
Anurag Priyam, Ben J Woodcroft, Vivek Rai, Ismail Moghul, Alekhya Munagala, Filip Ter, Hiten Chowdhary, Iwo Pieniak, Lawrence J Maynard, Mark Anthony Gibbins, HongKee Moon, Austin Davis-Richardson, Mahmut Uludag, Nathan S Watson-Haigh, Richard Challis, Hiroyuki Nakamura, Emeline Favreau, Esteban A Gómez, Tomás Pluskal, Guy Leonard, Wolfgang Rumpf, Yannick Wurm
Molecular Biology and Evolution, Volume 36, Issue 12, December 2019, Pages 2922–2924, https://doi.org/10.1093/molbev/msz185
Abstract
Comparing newly obtained and previously known nucleotide and amino-acid sequences underpins modern biological research. BLAST is a well-established tool for such comparisons but is challenging to use on new data sets. We combined a user-centric design philosophy with sustainable software development approaches to create Sequenceserver, a tool for running BLAST and visually inspecting BLAST results for biological interpretation. Sequenceserver uses simple algorithms to prevent potential analysis errors and provides flexible text-based and visual outputs to support researcher productivity. Our software can be rapidly installed for use by individuals or on shared servers.
Keywords: Visualization, BLAST, comparative genomics, sequence analysis
Introduction
The dramatic drop in sequencing costs has created many opportunities for individuals and groups of researchers to generate genomic or transcriptomic sequences from previously understudied organisms. Many research questions require small- or large-scale sequence comparisons, and BLAST (Basic Local Alignment Search Tool) is the most established tool for many such analyses (Altschul et al. 1990; Camacho et al. 2009). Unfortunately, BLAST analysis of new data can be challenging. There are delays before new data are submitted to and become publicly available on central BLAST repositories such as the NCBI (National Center for Biotechnology Information), and only small queries are feasible on such repositories. BLAST can be downloaded and installed locally, but its usage can be challenging for researchers without experience of command-line interfaces. Finally, commercial software to overcome such hurdles is too costly for many laboratories.
Here, we present Sequenceserver, a free graphical interface for BLAST designed to increase the productivity of biologist researchers performing and interpreting BLAST searches on custom data sets, and of bioinformaticians setting up shared laboratory or community databases. It has a user-centric focus (Garrett 2011) on accompanying researchers through their work process. Below, we provide an overview of Sequenceserver features that facilitate BLAST query submission and interpretation.
Assisted Installation and BLAST Query Submission
Installing Sequenceserver on computers running macOS or Linux is typically rapid, requiring only one or a few commands (see online documentation). If necessary, Sequenceserver automates the download of BLAST (Camacho et al. 2009) binaries and can manage the conversion of FASTA files to BLAST databases. A user accesses Sequenceserver’s graphical interface in a web browser at http://localhost:4567 (fig. 1A). All detected BLAST databases are automatically listed here. The user types, pastes, or drags and drops FASTA-format query sequences into a text field (fig. 1A). To prevent common errors, an alert message is shown and query submission is disabled if the query is invalid (e.g., if it combines nucleotide and protein sequences). The user then selects databases. The appropriate basic BLAST algorithm is then used automatically (supplementary fig. S1, Supplementary Material online). When multiple algorithms are appropriate, a pull-down in the BLAST submission button allows the user to toggle between them. An “advanced parameters” field provides access to all standard BLAST parameters.
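The query checks and algorithm selection described above can be illustrated with a minimal Python sketch. This is not Sequenceserver's own code (which is written in Ruby and JavaScript); the function names, the 90% nucleotide-character threshold, and the error messages are assumptions made for illustration. Only the pairing of query and database types with the basic BLAST programs follows standard BLAST usage.

```python
from typing import List, Tuple

# Characters counted as nucleotide evidence (assumed heuristic, not Sequenceserver's rule).
NUCLEOTIDE_CHARS = set("ACGTUN")

# Standard pairing of (query type, database type) with basic BLAST programs.
ALGORITHMS = {
    ("nucleotide", "nucleotide"): ["blastn", "tblastx"],
    ("nucleotide", "protein"): ["blastx"],
    ("protein", "protein"): ["blastp"],
    ("protein", "nucleotide"): ["tblastn"],
}


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Split FASTA text into (header, sequence) pairs."""
    records: List[Tuple[str, str]] = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            if header is None:
                header = ""  # bare sequence pasted without a header line
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def guess_type(sequence: str, threshold: float = 0.9) -> str:
    """Guess 'nucleotide' or 'protein' from the fraction of nucleotide letters."""
    letters = [c for c in sequence.upper() if c.isalpha()]
    if not letters:
        raise ValueError("empty sequence")
    fraction = sum(c in NUCLEOTIDE_CHARS for c in letters) / len(letters)
    return "nucleotide" if fraction >= threshold else "protein"


def applicable_algorithms(query_text: str, database_type: str) -> List[str]:
    """Return the BLAST programs that fit the query and the selected database type."""
    records = parse_fasta(query_text)
    if not records:
        raise ValueError("no query sequence found")
    types = {guess_type(seq) for _, seq in records}
    if len(types) > 1:
        # Mirrors the paper's example of an invalid query.
        raise ValueError("query mixes nucleotide and protein sequences")
    return ALGORITHMS[(types.pop(), database_type)]


print(applicable_algorithms(">q1\nATGCGTACGTTAGC\n", "nucleotide"))  # ['blastn', 'tblastx']
```

In this sketch, a nucleotide query against a nucleotide database yields two candidate programs, which corresponds to the case in which the interface would let the user toggle between algorithms.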
The Sequenceserver results page is designed to facilitate navigation, interpretation, and follow-up analysis (fig. 1B and https://sequenceserver.com/paper/resultsinteractive/; last accessed August 25, 2019). Results are visually structured and will feel familiar to users of NCBI BLAST. If multiple query sequences were submitted, a clickable index of queries is shown. Queries, hits, and BLAST HSPs (high-scoring segment pairs) are numbered to facilitate navigation. For each query, identified hits are summarized in a table and an overview graphic. Each hit includes links for FASTA download and sequence visualization, and potentially links to other resources. Such links can be added automatically based on regular expression analysis of identifiers (see online documentation). BLAST results can be downloaded in XML or tab-delimited table formats for further analysis. Similarly, a FASTA file containing all hit sequences, or only a selection of them, can be downloaded.
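The idea of deriving links from hit identifiers by regular expression can be sketched as follows. This is an illustrative Python example rather than Sequenceserver's actual link-generation interface, which is described in its online documentation; the rule names, the identifier patterns, and the URL templates (including the placeholder genome browser at example.org) are assumptions.

```python
import re
from typing import Dict, List, NamedTuple


class LinkRule(NamedTuple):
    """A hypothetical rule: if the pattern matches a hit identifier, build a link."""
    title: str
    pattern: re.Pattern
    url_template: str


# Illustrative rules only; real installations would define their own.
RULES = [
    LinkRule(
        "UniProt",
        re.compile(r"^(?:sp|tr)\|(?P<acc>[A-Z0-9]+)\|"),
        "https://www.uniprot.org/uniprotkb/{acc}",
    ),
    LinkRule(
        "Genome browser",
        re.compile(r"^(?P<scaffold>scaffold_\d+)$"),
        "https://example.org/browser?loc={scaffold}",  # placeholder URL
    ),
]


def links_for(hit_id: str, rules: List[LinkRule] = RULES) -> List[Dict[str, str]]:
    """Return one link per rule whose pattern matches the hit identifier."""
    links = []
    for rule in rules:
        match = rule.pattern.search(hit_id)
        if match:
            # Named groups captured from the identifier fill the URL template.
            url = rule.url_template.format(**match.groupdict())
            links.append({"title": rule.title, "url": url})
    return links


print(links_for("sp|P12345|ABC_HUMAN"))
# [{'title': 'UniProt', 'url': 'https://www.uniprot.org/uniprotkb/P12345'}]
```

Keeping each rule as a pattern plus a URL template means that a community database could add a link to its own genome browser by adding one entry, without changing the matching logic.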
Usage by Individual Researchers and as Part of Community Databases
Usage statistics including downloads, preprint citations, GitHub, and mailing list participation (fig. 1C) indicate that Sequenceserver is extensively used for molecular-genetic research on emerging model organisms (supplementary table S1, Supplementary Material online). For example, Sequenceserver installations on personal computers helped characterize the evolution of tunicate genomes (Blanchoud et al. 2018), fire ant olfactory genes (Pracana et al. 2017), and loci affecting Sorghum shoot architecture (McCormick et al. 2016). Sequenceserver has also been used to analyze human prostate cancer genomes (Seim et al. 2017) and to identify bacteria affecting shelf life of milk (Reichler et al. 2018).
Importantly, Sequenceserver also serves as a main querying mechanism for more than 50 community genome databases (supplementary table S2, Supplementary Material online), including the PHI-base database of genes underpinning pathogen–host interactions (Winnenburg et al. 2006), an initiative to sequence 1,000 wild yeast genomes (Shen et al. 2016), and the coral genomics database at http://reefgenomics.org (last accessed August 25, 2019; Liew et al. 2016). Such community resources typically integrate Sequenceserver as part of larger web servers (e.g., Nginx [Reese 2008]) and customize it by adding links from BLAST hits to genome browsers or other gene-specific information. Additionally, many password-protected Sequenceserver instances exist for unpublished data.
Outlook
In creating Sequenceserver, we aimed to follow user-centric design principles and open-source, sustainable software engineering practices (Supplementary Material online). Our software is built using Ruby and JavaScript frameworks commonly used for professional software development. The resulting robust architecture and flexibility facilitate customization and integration with other tools. This has led to contributions of improvements and bug-fixes by 21 bioinformaticians unrelated to the initial project; many are now coauthors. Our community is testing the ability to import preexisting BLAST or DIAMOND XML result files (Buchfink et al. 2015), and new ways of visualizing results (Wintersinger and Wasmuth 2015; Cui et al. 2016). Such efforts will continue to improve the ability of researchers to analyze and interpret genomic data.
Data Availability
Source code is available under the GNU Affero General Public License (AGPL) 3.0 at https://github.com/wurmlab/sequenceserver (last accessed August 25, 2019). Additional documentation is available online at https://sequenceserver.com (last accessed August 25, 2019).
Supplementary Material
Supplementary materials are available online.
Acknowledgments
We thank the many Sequenceserver users and contributors for their input. During the creation of Sequenceserver, Y.W. was funded by a European Research Council grant to Laurent Keller. B.J.W. was supported by the United States Department of Energy (DE-SC0004632). While writing this manuscript, Y.W. and A.P. were supported by the Biotechnology and Biological Sciences Research Council (BB/K004204/1) and the Natural Environment Research Council (NE/L00626X/1).
References
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. Basic Local Alignment Search Tool. J Mol Biol. 215(3):403–410.
Blanchoud S, Rutherford K, Zondag L, Gemmell NJ, Wilson MJ. 2018. De novo draft assembly of the Botrylloides leachii genome provides further insight into tunicate evolution. Sci Rep. 8(1):5518.
Buchfink B, Xie C, Huson DH. 2015. Fast and sensitive protein alignment using DIAMOND. Nat Methods. 12(1):59–60.
Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL. 2009. BLAST+: architecture and applications. BMC Bioinformatics 10:421.
Cui Y, Chen X, Luo H, Fan Z, Luo J, He S, Yue H, Zhang P, Chen R. 2016. BioCircos.js: an interactive Circos JavaScript library for biological data visualization on web applications. Bioinformatics 32(11):1740–1742.
Garrett JJ. 2011. The elements of user experience: user-centered design for the Web and beyond. Berkeley (CA): New Riders.
Liew YJ, Aranda M, Voolstra CR. 2016. Reefgenomics.org—a repository for marine genomics data. Database 2016:baw152.
McCormick RF, Truong SK, Mullet JE. 2016. 3D sorghum reconstructions from depth images identify QTL regulating shoot architecture. Plant Physiol. 172(2):823–834.
Pracana R, Levantis I, Martínez-Ruiz C, Stolle E, Priyam A, Wurm Y. 2017. Fire ant social chromosomes: differences in number, sequence and expression of odorant binding proteins. Evol Lett. 1(4):199–210.
Reese W. 2008. Nginx: the high-performance web server and reverse proxy. Linux J. 173:2.
Reichler S, Trmčić A, Martin N, Boor K, Wiedmann M. 2018. Pseudomonas fluorescens group bacterial strains are responsible for repeat and sporadic postpasteurization contamination and reduced fluid milk shelf life. J Dairy Sci. 101(9):7780.
Seim I, Jeffery PL, Thomas PB, Nelson CC, Chopin LK. 2017. Whole-genome sequence of the metastatic PC3 and LNCaP human prostate cancer cell lines. G3 (Bethesda) 7(6):1731–1741.
Shen XX, Zhou X, Kominek J, Kurtzman CP, Hittinger CT, Rokas A. 2016. Reconstructing the backbone of the Saccharomycotina yeast phylogeny using genome-scale data. G3 (Bethesda) 6(12):3927–3939.
Winnenburg R, Baldwin TK, Urban M, Rawlings C, Köhler J, Hammond-Kosack KE. 2006. PHI-base: a new database for pathogen host interactions. Nucleic Acids Res. 34(Database issue):D459–D464.
Wintersinger JA, Wasmuth JD. 2015. Kablammo: an interactive, web-based blast results visualizer. Bioinformatics 31(8):1305–1306.