Citation Guidelines for BLAST Software
BLAST (Basic Local Alignment Search Tool) underpins a huge amount of biological research. As with any other tool or resource, it is important to cite BLAST and its related software and data sources in your publications.
Do also remember to indicate the version of any software you used, as well as the versions of the datasets you used. The results you obtain with the latest version may be different from those obtained with an older version.
For example, the top of a SequenceServer BLAST report indicates the exact versions used.
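When running BLAST+ from the command line, the version can also be captured in a script so that it ends up in your methods section. The following is a minimal sketch, not an official recipe: it assumes the first line of `blastn -version` output looks like `blastn: 2.12.0+`, and the function names, the database name, and the database release label are hypothetical placeholders chosen for illustration.

```python
import re
import shutil
import subprocess
from typing import Optional

# Assumed format of the first line of `<program> -version`, e.g. "blastn: 2.12.0+"
_VERSION_LINE = re.compile(r"^\s*(\w+):\s*(\d+(?:\.\d+)*\+?)\s*$", re.MULTILINE)


def parse_blast_version(version_output: str) -> Optional[str]:
    """Extract a version string such as '2.12.0+' from `-version` output."""
    match = _VERSION_LINE.search(version_output)
    return match.group(2) if match else None


def installed_blast_version(program: str = "blastn") -> Optional[str]:
    """Return the version of a locally installed BLAST+ program, or None if absent."""
    if shutil.which(program) is None:
        return None
    result = subprocess.run(
        [program, "-version"], capture_output=True, text=True, check=True
    )
    return parse_blast_version(result.stdout)


def methods_sentence(program: str, version: str, database: str, release: str) -> str:
    """Build a sentence that records software and dataset versions for a paper."""
    return (
        f"Searches were run with {program} (BLAST+ {version}; Camacho et al. 2009) "
        f"against {database} ({release})."
    )


if __name__ == "__main__":
    # Fall back to a placeholder when BLAST+ is not installed on this machine.
    version = installed_blast_version("blastn") or "version not detected"
    # "my_custom_db" and "release 2024-01" are placeholders, not real datasets.
    print(methods_sentence("blastn", version, "my_custom_db", "release 2024-01"))
```

Recording the version at the time the search is run, rather than reconstructing it later, avoids guessing which release produced a given set of results.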
Citing the NCBI BLAST algorithm
BLAST has evolved through dozens of publications over the years, culminating in the BLAST+ suite. While there are many important papers, if you’re using command-line BLAST or the NCBI website, it’s most important to cite the current BLAST+ suite:
- Camacho, C., Coulouris, G., Avagyan, V., Ma, N., Papadopoulos, J., Bealer, K., and Madden, T.L. 2009. BLAST+: architecture and applications. BMC Bioinformatics, 10, 421.
These authors have most recently maintained and improved the BLAST algorithm, including ensuring that it continues to perform well on ever-growing datasets.
Citing the SequenceServer BLAST graphical interface
Many use SequenceServer’s graphical interface to run NCBI BLAST searches on custom datasets, or just to get results faster. To cite SequenceServer, use the following reference:
- Priyam, A., Woodcroft, B.J., Rai, V., Moghul, I., Munagala, A., Ter, F., Chowdhary, H., Pieniak, I., Maynard, L.J., Gibbins, M.A., Moon, H., Davis-Richardson, A., Uludag, M., Watson-Haigh, N.S., Challis, R., Nakamura, H., Favreau, E., Gómez, E.A., Pluskal, T., Leonard, G., Rumpf, W., and Wurm, Y. 2019. SequenceServer: a modern graphical user interface for custom BLAST databases. Molecular Biology and Evolution, 36(12), pp.2922-2924.
Citing SequenceServer SRA BLAST
The SequenceServer SRA BLAST application is a powerful way to search SRA data. To cite the SequenceServer SRA BLAST app, use the following reference:
- Pragmatic Genomics (unpublished). SequenceServer SRA BLAST: highly effective and efficient mining of next generation sequencing reads from RNAseq, genomic, and metagenomic datasets.
Citing Data Sources
If you didn’t generate the data yourself as part of your paper, it’s also crucial to cite your data sources. Here are examples of how to cite common data sources:
- Benson, D.A., Cavanaugh, M., Clark, K., Karsch-Mizrachi, I., Lipman, D.J., Ostell, J. and Sayers, E.W. 2012. GenBank. Nucleic Acids Research, 41(D1), pp.D36-D42.
- The UniProt Consortium, 2021. UniProt: the universal protein knowledgebase in 2021. Nucleic Acids Research, 49(D1), D480-D489.
Why cite tools or data sources?
You may wonder why it’s necessary to cite tools and data sources. After all, they’re just resources, and aren’t biological knowledge. The truth is that tools and data sources are just as important as the research itself. Reasons for citing data sources and tools include:
- Recognition of Developers’ Efforts: Great science needs great software. Citing analysis tools acknowledges the hard work and expertise of the developers who created and maintain these tools. It’s a form of respect and recognition for their contribution to the scientific community. Many research software engineers are not given the same recognition as researchers, and this is a way to help change that.
- Supports Further Development: Citing tools helps track their usage, which helps those making the tools to secure funding or support for further development and maintenance. It also helps them to understand how their tools are being used, and thus how to improve them.
- Scientific Integrity: Citations maintain the integrity of the scientific process. They provide a transparent trail of the methods and techniques used in research, and thus also help peer reviewers understand what you did.
- Promotes Discovery and Innovation: By citing tools, researchers can inform others about useful resources, potentially leading to new discoveries and innovations in the field.
- Builds a Network of Knowledge: Citations link research together, creating a network of knowledge that can be easily navigated and built upon by future researchers.
Some historical BLAST references
BLAST has been around for a while, and overall it has been cited more than 100,000 times! You may see some publications also cite older papers. Here are some of the most important ones:
One of the foundational papers for BLAST, where the basic method and its utility in sequence comparison were first introduced:
- Altschul, S.F., Gish, W., Miller, W., Myers, E.W. and Lipman, D.J., 1990. Basic local alignment search tool. Journal of Molecular Biology, 215(3), pp.403-410.
A paper introducing significant improvements to BLAST, such as Gapped BLAST and PSI-BLAST, enhancing the tool’s accuracy and speed:
- Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W. and Lipman, D.J., 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research, 25(17), pp.3389-3402.
A paper discussing how sequence alignments should be scored, a crucial consideration for the BLAST algorithm:
- Altschul, S.F., Boguski, M.S., Gish, W. and Wootton, J.C., 1994. Issues in searching molecular sequence databases. Nature Genetics, 6(2), pp.119-129.