BLASTing Illumina reads in FASTQ format

BLAST expects FASTA format, both for queries and for database creation, so it doesn’t directly understand FASTQ files. This is partly because BLAST was created long before the FASTQ format existed, and partly because FASTQ files are typically inappropriate for BLAST analysis.

FASTQ files are typically the output of Illumina or Nanopore sequencing. Because these runs usually sequence each region many times over, FASTQ files contain a lot of redundancy: many reads come from the same part of the genome or transcriptome, or from the same amplicon. When this is the case:

If you want to BLAST a FASTQ file, you’re probably not using the best approach.

You likely want to reduce the redundancy in your dataset first. The most biologically relevant way to do this is often to assemble your raw reads into a genome or transcriptome, and then BLAST the assembled sequences.

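If you do want to work with the raw reads, BLAST often isn’t the best tool for the job: tools built for raw reads, such as short-read aligners like BWA or Bowtie 2 for mapping reads to a reference, are usually much faster.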

But what if I really do need to run BLAST on FASTQ files?

Even with the caveat that BLASTing raw reads is often inappropriate, gaining biological insight sometimes does depend on it.

The easiest way to convert FASTQ to FASTA is with a tool such as seqtk. Its -A option makes seqtk seq write FASTA output, dropping the quality scores:

seqtk seq -A input.fq > output.fasta

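There are many other ways to do this. I recommend using a tried-and-tested tool rather than improvising your own converter with grep or Perl: for example, searching for lines that start with "@" also matches some quality lines, because "@" is a valid quality character.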
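Whichever tool you use, a quick record count can confirm that every read made it into the FASTA file before you start BLASTing. The following is a minimal sketch, not part of seqtk or BLAST: the function names are hypothetical, and it assumes standard four-line FASTQ records (header, sequence, "+" separator, quality, with no line-wrapped sequences). It only checks the conversion; the conversion itself is still left to seqtk.

```shell
# Hypothetical helper functions for sanity-checking a FASTQ -> FASTA conversion.
# Assumption: every FASTQ record is exactly 4 lines (no wrapped sequence/quality).

count_fastq_reads() {
  # Each 4-line FASTQ record is one read. Counting lines that start with '@'
  # would be wrong, because quality strings can also begin with '@'.
  echo $(( $(wc -l < "$1") / 4 ))
}

count_fasta_records() {
  # In FASTA, '>' only ever begins a header line, so counting those lines
  # counts records, whether or not the sequences are line-wrapped.
  grep -c '^>' "$1"
}

check_conversion() {
  # Usage: check_conversion reads.fq reads.fasta
  # Prints the read count on success; returns 1 if the counts differ.
  local fq_n fa_n
  fq_n=$(count_fastq_reads "$1")
  fa_n=$(count_fasta_records "$2")
  if [ "$fq_n" -eq "$fa_n" ]; then
    echo "OK: $fa_n reads"
  else
    echo "MISMATCH: $fq_n FASTQ reads vs $fa_n FASTA records" >&2
    return 1
  fi
}
```

After sourcing these functions, the check can run straight after the conversion:

seqtk seq -A input.fq > output.fasta && check_conversion input.fq output.fasta

The resulting FASTA file can then be passed to BLAST as a query. For example, against a nucleotide database built with makeblastdb (my_db is a placeholder name), with tabular output:

blastn -query output.fasta -db my_db -outfmt 6 -out hits.tsv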

By leveraging cloud computing and publication-ready graphics, SequenceServer Cloud makes it easy to perform sequence searches and to interpret the results.
