Configuring and customizing a SequenceServer BLAST installation for your local PC or compute server

The directory where SequenceServer is installed varies from computer to computer. Run the following command in a terminal to find out:

echo "$(ruby -e 'puts Gem.path[0]')/gems/sequenceserver-1.0.8"

You may need to change 1.0.8 in the above command to reflect the version of SequenceServer you are running.

SequenceServer requires the location of the NCBI BLAST+ binaries and the location of the database sequences (in either FASTA or BLAST+ database format) to run. These can be specified using command line parameters or through a configuration file. By default, SequenceServer looks for a configuration file at ~/.sequenceserver.conf. This can be changed using the -c option: sequenceserver -c ~/.sequenceserver.ants.conf.

Configuration files have a simple key-value syntax and can be viewed and modified with standard tools. Alternatively, the -s option can be used to add an arbitrary key-value pair to the configuration file or to change the value of an existing key:

sequenceserver -c ~/.sequenceserver.ants.conf -s -d /path/to/new/location/of/database/sequences
sequenceserver -s -b /path/to/latest/blast/binaries

The following table lists all configuration values accepted by SequenceServer through the configuration file or through command line options. Command line options take precedence over the values in the configuration file.

Configuration file   Command line          Description
:bin:                -b / --bin            Indicates path to the BLAST+ binaries.
:database_dir:       -d / --database_dir   Indicates path to the BLAST+ databases.
:num_threads:        -n / --num_threads    Number of threads to use for BLAST search.
:host:               -H / --host           Host to run SequenceServer on.
:port:               -p / --port           Port to run SequenceServer on.
:require:            -r / --require        Load extension from this file.
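
The precedence rule can be pictured as a simple merge. The following is a minimal sketch, not SequenceServer's actual code: the paths, the FILE_CONFIG constant and the effective_config helper are hypothetical and only illustrate that a value given on the command line overrides the same key from the configuration file.

```ruby
# Hypothetical values, as if they had been read from ~/.sequenceserver.conf.
FILE_CONFIG = {
  bin: '/opt/blast/bin',
  database_dir: '/data/blastdb',
  port: 4567
}.freeze

# Command line options win over values from the configuration file;
# keys not given on the command line keep the value from the file.
def effective_config(file_config, cli_options)
  file_config.merge(cli_options)
end

# As if started with: sequenceserver -p 8080
config = effective_config(FILE_CONFIG, { port: 8080 })
puts config[:port]          # value from the command line
puts config[:database_dir]  # value from the configuration file
```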

The following table lists additional command line options that are available. We have seen the first two already, and will discuss the rest in the following sections.

Command line                     Description
-c / --config_file               Provide path location of your custom configuration file
-s / --set                       Set configuration value in default or given config file
-m / --make-blast-databases      Create BLAST databases
-l / --list-databases            List found BLAST databases
-u / --list-unformatted-fastas   List unformatted FASTA files
-i / --interactive               Run SequenceServer in interactive mode
-D / --devel                     Run SequenceServer in development (debug) mode
-v / --version                   Print version number of SequenceServer that will be loaded
-h / --help                      Display this help message

The BLAST search algorithms don't directly understand FASTA files. BLAST includes the makeblastdb tool that is used to convert FASTA files into the optimized BLASTDB format, which is then used by the search algorithms:

makeblastdb -dbtype <prot_or_nucl> -title <human_readable_name> -in <path_to_fasta> -parse_seqids

SequenceServer can recursively scan a directory for FASTA files, identify whether each file contains nucleotide or amino acid sequences, and prompt you to convert them into BLAST databases. It even suggests a suitable name for the BLAST database by cleaning up the FASTA file name. SequenceServer does this automatically when it does not find any BLAST database in database_dir. At other times you will need to invoke it manually, e.g., after adding new FASTA files to database_dir.

sequenceserver -m

An alternative directory can be provided:

sequenceserver -m -d /path/to/directory_with_fasta_files
sequenceserver -m -c /path/to/config_file_containing_database_dir
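
To give an idea of how a FASTA file's sequence type might be detected, here is a rough sketch. It is not SequenceServer's own implementation: the guess_sequence_type helper, the set of nucleotide codes and the 0.9 threshold are assumptions chosen for illustration.

```ruby
# Residue codes treated as nucleotides in this sketch (an assumption).
NUCLEOTIDE_CODES = 'ACGTUN'.freeze

# Guess whether FASTA text holds nucleotide or protein sequences by looking
# at the fraction of nucleotide codes on the non-header lines.
def guess_sequence_type(fasta_text, threshold = 0.9)
  residues = fasta_text.each_line
                       .reject { |line| line.start_with?('>') }
                       .join
                       .upcase
                       .gsub(/[^A-Z]/, '')
  return nil if residues.empty?

  nucleotide_fraction = residues.count(NUCLEOTIDE_CODES).to_f / residues.length
  nucleotide_fraction >= threshold ? :nucleotide : :protein
end

puts guess_sequence_type(">seq1\nACGTACGTNNACGT\n")       # nucleotide
puts guess_sequence_type(">seq2\nMKVLAAGIVGLLLAQPSFA\n")  # protein
```

A fraction-based threshold tolerates the occasional ambiguity code in nucleotide data, while protein sequences, which use many more letters, fall well below it.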

Aroon Chande has put together a script to automatically create BLASTDBs and restart SequenceServer when a FASTA file is added to the database directory.

NCBI provides publicly available sequences as pre-formatted BLAST databases, which can be downloaded with a script distributed with BLAST. Since these databases are huge, they are split across several files (volumes) and linked together with an alias file. SequenceServer works seamlessly with such multi-part databases. We also have a faster alternative for downloading BLAST databases from NCBI: ncbi-blast-dbs.

# Install ncbi-blast-dbs
sudo gem install ncbi-blast-dbs

# View available BLAST databases.

# Download one or more databases.
ncbi-blast-dbs nt nr

Further, SequenceServer understands NCBI sequence ids and, in the HTML report, automatically links each hit sequence to the corresponding NCBI page.

BLAST can output scientific names, common names, BLAST names, and kingdoms for each hit in tabular output. For this to work, databases should be created with the -taxid option of makeblastdb, and NCBI "taxdb" must be locatable on your machine by BLAST. This can be helpful when BLAST-ing against several species, using the NR database for example. From SequenceServer 1.0.4 onwards, it is possible to get this taxonomy data in the full tabular report download option.

To download NCBI taxdb, run:

sequenceserver --download-taxdb

If you are using the NR database, that's all you need to do. If you are using your own database, you will have to tell SequenceServer the taxid of the sequences contained in the FASTA file. First remove the existing BLAST databases. Then run:

sequenceserver -m

Enter taxid when prompted. You can get the taxid by searching for the species name at NCBI Taxonomy browser. For example,

FASTA file: /Users/priyam/biodb/protein/Solenopsis_invicta/SI2.2.3.fa
FASTA type: protein
Proceed? [y/n] (Default: y):
Enter a database title or will use 'SI 2.2.3 ':
Enter taxid (optional): 13686
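
For reference, the answers given at these prompts map naturally onto makeblastdb options. The sketch below only assembles such a command line; it is an illustration under stated assumptions, not the way SequenceServer itself invokes makeblastdb, and the makeblastdb_command helper is hypothetical.

```ruby
require 'shellwords'

# Assemble a makeblastdb command line from the prompted values.
# The -taxid option is added only when a taxid was entered.
def makeblastdb_command(fasta_path, type, title, taxid = nil)
  args = ['makeblastdb', '-dbtype', type, '-title', title,
          '-in', fasta_path, '-parse_seqids']
  args += ['-taxid', taxid.to_s] if taxid
  args.shelljoin # escapes spaces and other shell-sensitive characters
end

puts makeblastdb_command(
  '/Users/priyam/biodb/protein/Solenopsis_invicta/SI2.2.3.fa',
  'prot', 'SI 2.2.3', 13686
)
```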

With a few exceptions, all command-line BLAST+ parameters can be provided using the "Advanced params" textbox in the search form. Options that change input/output behaviour (e.g., -query, -db, -subject, -outfmt, -import_search_strategy) are not allowed.

For security, only letters, numbers, space, hyphen, underscore, and period are allowed in "Advanced params" textbox.
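
The two rules above can be expressed as a small check. This is a minimal sketch, assuming a simple whitelist regex and using only the example options named above as the disallowed list; the constant names and the valid_advanced_params? helper are hypothetical, and SequenceServer's real validation may differ.

```ruby
# Characters permitted in "Advanced params": letters, numbers, space,
# hyphen, underscore, and period.
ALLOWED_CHARACTERS = /\A[A-Za-z0-9 _.\-]*\z/

# Options that change input/output behaviour. Only the examples named in
# the text are listed here; the full list may be longer.
DISALLOWED_OPTIONS = %w[-query -db -subject -outfmt -import_search_strategy].freeze

def valid_advanced_params?(params)
  return false unless params.match?(ALLOWED_CHARACTERS)

  (params.split & DISALLOWED_OPTIONS).empty?
end

puts valid_advanced_params?('-evalue 1.0e-5 -max_target_seqs 10')  # true
puts valid_advanced_params?('-evalue 1e-5; rm -rf .')              # false (semicolon)
puts valid_advanced_params?('-outfmt 6')                           # false (disallowed option)
```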

For integrating SequenceServer with JBrowse, JBrowse's website has an excellent tutorial: How can I link BLAST results to JBrowse. The tutorial makes use of SequenceServer's plugin architecture, which is described briefly in the next section.

It is often desirable to link search hits to external resources such as NCBI, UniProt, or a genome browser. SequenceServer provides a powerful and flexible mechanism to do this. Simply edit lib/sequenceserver/links.rb in your SequenceServer installation directory to add a link generator function, based on the examples and documentation provided in that file. Alternatively, you can write your link generator functions in a separate file and load it through the :require: key in the config file (listed in the configuration table above).

You can access methods defined in the Hit class within a link generator. Alignment coordinates are not defined on a hit, but on its HSPs. Calling the hsps method inside a link generator returns an Array of HSP objects for that hit.
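
As an illustration of working with per-HSP coordinates, the sketch below derives the overall region of a hit covered by its HSPs, which is the kind of value a genome browser link needs. The DemoHit and DemoHSP structs and their attribute names are hypothetical stand-ins, not SequenceServer's real classes.

```ruby
# Hypothetical stand-ins for a hit and its HSPs; attribute names are
# illustrative only.
DemoHSP = Struct.new(:subject_start, :subject_end)
DemoHit = Struct.new(:id, :hsps)

# Return the smallest and largest subject coordinate covered by any HSP,
# regardless of the strand each HSP aligned to.
def hit_span(hit)
  coordinates = hit.hsps.flat_map { |hsp| [hsp.subject_start, hsp.subject_end] }
  coordinates.minmax
end

hit = DemoHit.new('scaffold_12', [DemoHSP.new(1500, 1800), DemoHSP.new(2400, 2100)])
first, last = hit_span(hit)
puts "#{hit.id}:#{first}..#{last}"  # scaffold_12:1500..2400
```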

BLAST does not report in its output which database a hit came from. You can call the whichdb method from your link generator to get a list of all databases that the hit could have come from. If your sequences have unique ids across _all_ FASTA files / BLAST databases, the only element in the list is the database that the hit came from. whichdb returns an Array of SequenceServer::Database objects from which you can get the database title and path. whichdb is slow. An alternative is to encode database information (a short name) in the sequence id, and use regex matching to decide which database a hit came from.
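
The regex alternative could look like the following sketch. It assumes a hypothetical naming convention in which every sequence id starts with a short database name followed by an underscore; the DATABASE_TITLES table, the example ids and the database_for helper are made up for illustration.

```ruby
# Hypothetical mapping from the short name encoded in each sequence id
# to a human readable database title.
DATABASE_TITLES = {
  'sinv' => 'Solenopsis invicta proteins',
  'amel' => 'Apis mellifera proteins'
}.freeze

# Extract the prefix before the first underscore and look it up.
# Returns nil when the id has no known prefix.
def database_for(sequence_id)
  prefix = sequence_id[/\A([A-Za-z0-9]+)_/, 1]
  DATABASE_TITLES[prefix]
end

puts database_for('sinv_SI2.2.3_00001')  # Solenopsis invicta proteins
p database_for('contig42')               # nil
```

Because this is a plain string operation on the id, it avoids the lookup cost of whichdb, at the price of having to rename sequences before formatting the databases.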

URL parameters should be encoded. Encoding replaces whitespace and other reserved characters in the string with the % encoding used in URLs.
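
As a generic illustration of percent-encoding, the sketch below uses ERB::Util.url_encode from Ruby's standard library. It does not show SequenceServer's own helper; the search_url function and the example.org address are placeholders.

```ruby
require 'erb'

# Percent-encode a value before placing it in a URL query string.
def search_url(base, query)
  "#{base}?query=#{ERB::Util.url_encode(query)}"
end

puts search_url('https://example.org/search', 'Solenopsis invicta')
# https://example.org/search?query=Solenopsis%20invicta
```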

If your IP is publicly accessible, your colleagues will be able to access your SequenceServer instance at http://your-ip:4567. You can find your IP in the Network or Sharing section of System Preferences. This usually requires being in the same subnetwork, or asking IT services to open your machine to the outside world. You may also want to ask IT services for a fixed IP.

If you already have a fixed, public IP but port 4567 is blocked by a firewall, you can try running SequenceServer on a different port: sequenceserver -p 8080. Administrator privilege is required to use port 80: sudo sequenceserver -p 80.

You can disable sharing by setting :host: key in config file to sequenceserver -s -H

Either use your own user account or create a dedicated local user account for SequenceServer: sudo useradd -s /sbin/nologin seqservuser.

Create file /etc/systemd/system/sequenceserver.service with the following content, changing ExecStart (and maybe User) to match your environment:

[Unit]
Description=SequenceServer server daemon
Documentation="file://sequenceserver --help"

[Service]
User=seqservuser
ExecStart=/path/to/bin/sequenceserver -c /path/to/sequenceserver.conf

[Install]
WantedBy=multi-user.target


Stop any SequenceServer instance you might be running and check the above works by running the following commands:

## let systemd know about changed files
sudo systemctl daemon-reload
## enable service for automatic start on boot
sudo systemctl enable sequenceserver.service
## start service immediately
sudo systemctl start sequenceserver.service

See the systemd website for more options and debugging if it fails.

Create file /etc/init/sequenceserver.conf with the following content, changing author and setuid lines to your name and username:

description "Upstart config for SequenceServer"
author "<full name>"

start on filesystem
stop on shutdown

setuid <username>

exec sequenceserver

Stop any SequenceServer instance you might be running and check the above works by running the following command:

sudo start sequenceserver

See the Upstart Cookbook for more options and debugging if it fails.

Create file ~/Library/LaunchAgents/sequenceserver.plist with the following content:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "">
<plist version="1.0">
    <true />
    <true />

Stop any SequenceServer instance you might be running and check the above works by running the following command:

launchctl load ~/Library/LaunchAgents/sequenceserver.plist

SequenceServer's built-in webserver can handle medium workloads. However, for large communities, or to integrate SequenceServer into an existing website, it may be desirable to run SequenceServer with Apache. Setting up with Apache also means SequenceServer will automatically be available when the server restarts.

To set up SequenceServer with Apache, first install Phusion Passenger™ by following the instructions at their website. Then configure Apache to load SequenceServer by following their guide on deploying a Ruby application, replacing /path-to-your-app with SequenceServer's installation directory. Finally, go to the directory where SequenceServer is installed and edit to indicate the absolute paths to SequenceServer's config file and DOTDIR, which are ~/.sequenceserver.conf and ~/.sequenceserver respectively by default:

# Remove this line.

# And add these two, changing the path.
SequenceServer::DOTDIR = "/home/foo/.sequenceserver"
SequenceServer.init :config_file => "/home/foo/.sequenceserver.conf"

For SequenceServer 1.0.7 and earlier, you will additionally need to delete Gemfile from SequenceServer's installation directory.

If you plan to deploy multiple SequenceServer instances, you should deploy each to a sub-uri.

If you deploy to a sub-uri, a trailing slash is required for the JS, CSS and icons to load properly. Ideally, just putting a trailing slash in the Apache config should be sufficient. See this thread for more solutions.

Further, because BLAST searches can take time, you may additionally want to set Timeout in your Apache config to a suitable value (e.g., 5 minutes) so that Apache doesn't close the connection before a BLAST search has finished.

In a reverse proxy setup, requests are forwarded from Nginx (or Apache) to SequenceServer's built-in server. The following config shows how to proxy requests from Nginx to SequenceServer from a sub-uri of your domain. Nginx will time out requests if it can't connect to SequenceServer within 8 seconds, or if it doesn't hear back from SequenceServer within 180 seconds (3 minutes) after forwarding the request; that is, BLAST requests that take more than 3 minutes will be timed out by Nginx. Please see the Nginx documentation for details of each directive.

location /sequenceserver/ {
    root /home/priyam/sequenceserver/public/dist;
    proxy_pass http://localhost:4567/;
    proxy_intercept_errors on;
    proxy_connect_timeout 8;
    proxy_read_timeout 180;
}

SequenceServer can be integrated with Nginx similar to Apache, using Phusion Passenger. And Apache can be used instead of Nginx to proxy connections as well. Whether to use reverse proxy or Phusion Passenger and Apache or Nginx is up to the user. A discussion of pros and cons of each is beyond the scope of this documentation.

If you are using SequenceServer with Apache or Nginx then you can easily password protect your data using HTTP basic authentication scheme. These tutorials from DigitalOcean detail the steps required for both Apache and Nginx.

If you are using SequenceServer without Apache or Nginx, you can still add password protection quite easily. Just add the following snippet at line number 57 in lib/sequenceserver/routes.rb, change the password ('admin') to something more secure, and restart SequenceServer.

use Rack::Auth::Basic, "Restricted Area" do |username, password|
  username == 'admin' && password == 'admin'
end

Given that SequenceServer simply runs NCBI BLAST+ commands in the shell, it's relatively easy to devise a scheme to run BLAST searches on another, more powerful computer or on a cluster. For example, by replacing the BLAST+ binaries with a "shim" like the one below, we can run BLAST searches on another computer using SSH.

#!/usr/bin/env sh

blast=`basename $0`
param=`echo "$@" | sed "s/\-db\ /\-db\ \'/" | sed "s/\ \-query\ /\'\ \-query\ /"`

ssh hostname /usr/local/bin/$blast $param

Additionally, the TMPDIR environment variable must be set to a directory that's shared between both machines, e.g., via SSHFS.

Using a job queuing system such as qsub may be a bit involved depending on the flexibility afforded by the system. Fortunately, we have a solution for qsub thanks to Andy Foster. Create the following script:

#!/usr/bin/env sh

jobid=`mktemp bl.XXXX`
rm $jobid


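# Arguments passed in from blast.rb: result file, error file, then the BLAST command.
rfile=$1
efile=$2
blast=`basename $3`
shift 3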

param=`echo "$@" | sed "s/\-db\ /\-db\ \'/" | sed "s/\ \-query\ /\'\ \-query\ /"`

qsub -sync y -b y -pe slowpara 4 -N $jobid -o $rfile -e $efile /usr/local/bin/$blast $param

And then modify line 67 of lib/sequenceserver/blast.rb to

system("/path/to/script #{rfile.path} #{efile.path} #{command}")

As above, the TMPDIR environment variable must be set to a directory that's shared between both machines, e.g., via a shared file system such as GPFS, an NFS mount, or SSHFS.

If you are making custom modifications to SequenceServer, the following tips may come in handy:

SequenceServer's development mode, activated as sequenceserver -D, enables verbose logging and loads unbuilt assets (JS and CSS). SequenceServer's interactive command-line mode, activated as sequenceserver -i, lets you access all server-side objects and methods, call them, and inspect their output in Ruby.

  1. View sequence link is disabled if the length of the hit exceeds 10,000 residues - ok if target sequences are proteins or contigs. We feel this mode of visualising sequences is not optimal for very long sequences (e.g., scaffolds).
  2. Download FASTA of all hits and Download FASTA of selected hits work only for 30 or fewer hits at a time. This is due to a technical limitation: the length of URLs should not exceed 2083 characters. This will be fixed in the next major release.
  3. During setup on some versions of OS X, an extra space is added at the end of autocompleted paths when SequenceServer prompts for paths to the BLAST+ executables or database directory. This appears to be due to a bug in Ruby readline library. Unfortunately it is beyond our scope to fix this slightly inconvenient bug, especially since working around it is straightforward (i.e. you just need to backspace it).
1. Can I use SequenceServer as an access-point for a community genome database?
Yes. SequenceServer is used as the data querying mechanism in over 30 community databases. You can use SequenceServer as it is along with supporting pages describing the data and related resources (e.g., HopBase), customise it extensively (e.g., Lotus Base), or integrate it with InterMine (e.g., PlanMine).
2. Does SequenceServer include a genome browser?
No, but any web based genome browser such as JBrowse, Biodalliance, or igv.js can be used. Also see: Integrating with JBrowse and Adding links to search hits.

BLAST is a heuristic, i.e., it is fast and approximate instead of being slow and perfect. It starts by looking for a minimal 100% match (e.g., 11 consecutive nucleotides with 100% identity between your query and the database sequence). If it finds none, it's over. If it does find a match, it extends that match in both directions: identical (or similar) bases add points; differences subtract points. If too many points are lost, it stops aligning. BLAST might not stop at the exact best place, so alignment ends might be wrong.

The bitscore is the total number of points for the aligning region. The bigger it is, the stronger the alignment. But the bitscore doesn't take into account sequence length or database size. The E-value does, so it is better to look at E-values than bitscores. The E-value represents the number of times the observed alignment would be expected to occur by chance (it is not a p-value!); it depends on the bitscore, the length of the query sequence, and the cumulative length of all sequences in the database. It is easier to talk about strong E-values (e.g., 1e-100 = 10^-100 = almost zero; impossible to obtain by chance) vs weak E-values (e.g., 0.1; for similarity that may be due to chance) than small vs large (which is always a bit confusing).
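
The dependence of the E-value on query and database size can be made concrete with the standard relation E = m * n * 2^(-S), where S is the bit score, m the query length and n the total database length. The sketch below is a simplification: BLAST itself uses corrected "effective" lengths, so its reported values will differ somewhat. The expected_hits helper and the example numbers are illustrative only.

```ruby
# Expected number of chance alignments scoring at least `bit_score`:
# E = m * n * 2^(-S), with m the query length, n the total database
# length, and S the bit score. Raw lengths are used here, whereas BLAST
# applies length corrections.
def expected_hits(bit_score, query_length, database_length)
  query_length * database_length * 2.0**(-bit_score)
end

# The same bit score is far less convincing against a larger database.
small_db = expected_hits(40, 300, 1_000_000)
large_db = expected_hits(40, 300, 10_000_000_000)
puts format('%.1e', small_db)
puts format('%.1e', large_db)
```

With the bit score and query fixed, the expected number of chance hits grows in direct proportion to the database size, which is why a bitscore alone does not tell you how surprising an alignment is.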

BLAST has been rewritten several times - most recently by NCBI as BLAST+. NCBI now use and recommend using BLAST+. The BLAST+ publication explains why BLAST+ is easier to use and faster than the old legacy BLAST. WU-BLAST is now commercial and called AB-BLAST. There is probably no good reason to use either alternative. Note that the output formats change slightly from one BLAST implementation to the next. NCBI's BLAST+ is actively developed and is the only one supported by SequenceServer.