Check PCR primer specificity by BLASTing your primers
Five key points to keep in mind
Non-specific amplification occurs when regions other than the intended target are amplified. Avoiding it is a common challenge when designing primers, because non-specific amplification can lead to false positives in downstream analyses. For example, if you are trying to amplify a gene of interest but your primers also amplify a different gene, you could get a false positive result.
Similarly, you want to avoid even one primer binding to two locations in the genome. This is because mis-priming of one primer can lead to lower efficiency of your PCR reaction, and in extreme cases to false negatives.
And if you’re thinking about multiplexing, having primers that match multiple locations can lead to false positives and to false negatives.
Common primer design software such as Primer3 will check annealing and melting temperatures and avoid primer-dimers, but it does not check for non-specific amplification.
BLAST, however, is a powerful tool for comparing sequences and searching for similarities, and it can be used to check the specificity of PCR primers. But there are five key points to keep in mind.
- Use shorter word-size to increase sensitivity
- Search the entire genome: Avoid filtering
- Be specific in the scope of your search
- Check BLAST hit coordinates
- Change how BLAST tallies up scores
Let’s dive deeper into each of these points before summarizing the BLAST parameters you should use on the command line or in a graphical BLAST search tool like SequenceServer.
1. Use shorter word-size to increase sensitivity for primer BLAST
The word-size (-word_size) in BLAST is the minimum number of consecutive matching characters required to seed an alignment. By default, blastn searches with a word-size of 11 or even 28. This means that blastn will only detect sequence similarity if there are at least 11 (or even 28) nucleotides of perfect identity. That would be inappropriate for checking primer specificity, because primers are typically around 20 nucleotides long, and even partial matches can create undesirable mis-priming. So rather than using the default parameters, we need to specify -task blastn-short. This decreases the word-size to 7, which increases BLAST’s ability to detect partial matches, i.e., short alignments with mis-matches.
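To see why word-size matters for a 20-nucleotide primer, here is a minimal Python sketch. It is not how BLAST is implemented; it only measures the longest run of consecutive matches between a primer and a candidate binding site of the same length, and asks whether that run is long enough to seed an alignment. The primer and off-target sequences are made up for illustration.

```python
def longest_exact_run(primer: str, site: str) -> int:
    """Longest run of consecutive matching positions (ungapped, equal lengths)."""
    if len(primer) != len(site):
        raise ValueError("primer and site must have the same length")
    best = run = 0
    for a, b in zip(primer.upper(), site.upper()):
        run = run + 1 if a == b else 0
        best = max(best, run)
    return best


def can_seed(primer: str, site: str, word_size: int) -> bool:
    """True if an exact match of at least word_size bases exists to seed a hit."""
    return longest_exact_run(primer, site) >= word_size


# Hypothetical sequences: the off-target site differs at position 11 only.
primer = "ACGTTGCAAGCTTGACCTGA"
off_target = "ACGTTGCAAGATTGACCTGA"

print(longest_exact_run(primer, off_target))        # longest exact stretch
print(can_seed(primer, off_target, word_size=11))   # default-like word-size
print(can_seed(primer, off_target, word_size=7))    # blastn-short word-size
```

With a single mismatch in the middle of the primer, the longest exact stretch is 10 nucleotides. A word-size of 11 would therefore miss this site entirely, while a word-size of 7 would still find it.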
2. Search the entire genome: Avoid BLAST filtering of repetitive regions
You also want to search the entire genome. But because of filtering, that doesn’t happen by default. Thus, you want to switch off filtering of low-confidence or low-complexity regions. This is because part of your primer may sometimes hit a repetitive or ambiguous sequence.
To switch off filtering, specify -dust no -soft_masking false. DUST is a filter that masks low-complexity sequence such as microsatellites or minisatellites, and soft-masking tells BLAST to skip masked regions when it seeds alignments.
3. Be specific by reducing the scope of the BLAST database you are searching with
BLAST is more sensitive if it is searching against the most appropriate database. This is because smaller databases lead to stronger (i.e., smaller) E-values.
So reduce the scope of your search. In most cases, this means searching against only your organism of interest, rather than a multi-genome database or something like “all of RefSeq” or “all of NCBI”. But if your DNA sample contains DNA from multiple organisms, as happens in symbiosis or in microbiomes, then you should search against the correct combination of relevant genomes.
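The link between database size and E-value can be illustrated with the textbook approximation E = m × n × 2^(−S′), where m is the query length, n is the database length and S′ is the bit score. The sketch below is a simplification: real BLAST uses effective lengths, so reported values will differ. The bit score and database sizes are hypothetical numbers chosen for illustration.

```python
def expected_hits(bit_score: float, query_len: int, db_len: int) -> float:
    """Approximate E-value, E = m * n * 2**(-S'), ignoring effective-length corrections."""
    return query_len * db_len * 2.0 ** (-bit_score)


# Hypothetical: the same 20-nt primer hit (bit score 40) in two search spaces.
single_genome = expected_hits(bit_score=40.0, query_len=20, db_len=300_000_000)
large_collection = expected_hits(bit_score=40.0, query_len=20, db_len=300_000_000_000)

print(f"single genome:    {single_genome:.2e}")
print(f"large collection: {large_collection:.2e}")
```

Under this approximation the E-value grows in direct proportion to the size of the database. The identical alignment that looks convincing against a single genome can therefore look like noise against a database a thousand times larger.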
4. Check coordinates of primer BLAST hits
Ideally, you get a single hit per primer. But if you get multiple, check their coordinates. For this, it can be helpful to download the table of all hits into Excel. Make sure you also check the orientation of each hit relative to the query, and which part of the primer is aligned. This is because BLAST doesn’t necessarily align from the first to the last nucleotide (i.e., the hit may cover only the middle part of the primer that matches perfectly).
You should also check that the distance between the outermost coordinates of the two primer hits matches the size of the expected PCR product. The primers in a pair need to be oriented correctly (facing each other) and not too far apart (e.g., ideally under 1000 nucleotides). If they span too much of the genome, your PCR will likely fail.
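If you prefer a script to a spreadsheet, the coordinate check can be sketched in Python. The example below assumes the hits were saved in BLAST’s standard 12-column tabular format (-outfmt 6), where a hit on the minus strand has a subject start larger than its subject end. The primer names, chromosome names and coordinates are invented, and the product size is only approximate when a hit does not cover the 5′ end of the primer.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Hit:
    primer: str
    subject: str
    sstart: int
    send: int

    @property
    def strand(self) -> str:
        # In tabular blastn output, minus-strand hits have sstart > send.
        return "+" if self.sstart <= self.send else "-"


def parse_outfmt6(text: str) -> list:
    """Read qseqid, sseqid, sstart and send from 12-column tabular output."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        hits.append(Hit(fields[0], fields[1], int(fields[8]), int(fields[9])))
    return hits


def candidate_products(hits, max_size=1000):
    """Pairs of hits on one subject that face each other within max_size."""
    products = []
    for a, b in combinations(hits, 2):
        if a.subject != b.subject or a.strand == b.strand:
            continue
        plus, minus = (a, b) if a.strand == "+" else (b, a)
        # Facing each other: the plus-strand hit starts upstream of the
        # minus-strand hit, whose sstart is its right-most coordinate.
        size = minus.sstart - plus.sstart + 1
        if 0 < size <= max_size:
            products.append((plus.primer, minus.primer, plus.subject, size))
    return products


# Hypothetical hits table for a forward and a reverse primer.
EXAMPLE = (
    "fwd\tchr1\t100.000\t20\t0\t0\t1\t20\t1000\t1019\t0.005\t40.1\n"
    "rev\tchr1\t100.000\t20\t0\t0\t1\t20\t1450\t1431\t0.005\t40.1\n"
    "fwd\tchr2\t93.750\t16\t1\t0\t3\t18\t52000\t52015\t1.2\t24.3\n"
)

for product in candidate_products(parse_outfmt6(EXAMPLE)):
    print(product)
```

In this made-up table, only the two hits on chr1 face each other, giving a predicted product of 451 nucleotides. The weaker hit on chr2 has no partner on the opposite strand, so it would not be expected to yield a product on its own.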
For eukaryotes we need to be careful about the specific location when designing primers for genomic versus genic regions.
Genomic amplification from Genomic DNA
If we are interested in amplifying a genomic region from genomic DNA (gDNA), then things are relatively simple. As mentioned, we mostly need to:
- Check the distance between the primers in a pair.
- Select an appropriate polymerase.
- Avoid repeats. Primers that target repetitive genomic sequences will be able to bind in many different locations. This would lead to the amplification of many regions of the genome. Additionally, if your primers flank a repetitive region you may not successfully amplify the desired sequence. This is because repetitive sequences induce DNA polymerase slippage and stalling, which can lead to length and sequence variation.
Genic amplification from Genomic DNA
If you’re amplifying genic regions using gDNA from a eukaryote, you need to be careful about the exon-intron structure of the gene.
- Avoid exon-exon junctions. If your primer overlaps a splice junction, it will struggle to anneal to gDNA. This is because the two exons are separated by an intron in gDNA, so only part of the primer matches the end of one exon and the rest matches the start of the next.
- Avoid introns. Unless an intron is very small, you likely want to avoid spanning multiple exons with intervening introns. Some introns span >100,000 nucleotides… Taq polymerase won’t be able to handle that during PCR!
- Target single exons. Design the location of your primers to amplify the largest exon of your gene.
Genic amplification from complementary DNA (cDNA)
Genic (i.e., from a gene) amplification from cDNA has many advantages compared to using gDNA. For this, you need to extract RNA and reverse-transcribe it into cDNA prior to PCR amplification (i.e., RT-PCR or quantitative RT-PCR). The resulting cDNA corresponds to mRNA transcripts, which contain only exonic regions.
- Span exon-exon junctions for isoform detection and to reduce amplification of contaminating gDNA. Such primers should amplify only cDNA, and not any gDNA that might not have been fully removed during RNA extraction. Having multiple sets of such exon-exon-spanning primers can help you identify exon-skipping events, and which splice-form is expressed in which tissue.
- Design primers towards the very ends of the gene. Since most genes are small enough, you can amplify the whole coding sequence (CDS) with one PCR. This can be very useful if you want to express the gene in downstream approaches. This can also be useful for Rapid Amplification of cDNA Ends (RACE) and for designing specific in situ hybridisation probes.
5. Change how BLAST tallies up scores
For primer alignment searches, we can also modify the way BLAST calculates scores. Mis-matches in primer binding can severely reduce annealing. Therefore we can be fairly strict in the way BLAST scores mis-matches. We set the reward for matches (-reward 1) and the penalty for mis-matches (-penalty -3) so that mis-matches are penalised relatively heavily. The same goes for how opening a gap (-gapopen 5) and extending a gap (-gapextend 2) are penalised. Adjusting these parameters customizes the search to look for strict primer alignments.
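As a rough illustration of what these numbers do, the Python sketch below scores a full-length, ungapped pairing of a primer with a binding site. It is a simplification: BLAST itself reports the best-scoring local alignment and may trim the ends rather than include a mis-match. The sequences are hypothetical.

```python
def ungapped_score(primer: str, site: str, reward: int = 1, penalty: int = -3) -> int:
    """Raw score of a full-length ungapped pairing: reward per match, penalty per mis-match."""
    if len(primer) != len(site):
        raise ValueError("primer and site must have the same length")
    return sum(
        reward if a == b else penalty
        for a, b in zip(primer.upper(), site.upper())
    )


# Hypothetical 20-nt primer and an off-target site with one mis-match.
primer = "ACGTTGCAAGCTTGACCTGA"
one_mismatch = "ACGTTGCAAGATTGACCTGA"

print(ungapped_score(primer, primer))                    # perfect match
print(ungapped_score(primer, one_mismatch))              # reward 1, penalty -3
print(ungapped_score(primer, one_mismatch, penalty=-1))  # a more lenient penalty
```

With a reward of 1 and a penalty of -3, a single mis-match lowers the score of a 20-nucleotide primer from 20 to 16, i.e., it costs as much as four matches. With a penalty of -1, the same site would still score 18 and would be much harder to tell apart from the perfect hit.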
Overall, use the following options for BLASTing primer sequences:
To pull everything together, use the following BLASTN options for checking PCR primers:
-task blastn-short -dust no -soft_masking false -penalty -3 -reward 1 -gapopen 5 -gapextend 2
On the command line, you’d add those arguments to the end of the BLASTN command.
In SequenceServer, you can just select those options from the “Advanced Parameters” drop-down menu.
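If you script your searches, the full command can be assembled in Python as in the sketch below. The file names primers.fasta and primer_hits.tsv and the database name my_genome are placeholders, and the tabular output format (-outfmt 6) is an assumption made here for convenience rather than a requirement. The script only prints the command, and running it requires BLAST+ to be installed and a database built with makeblastdb.

```python
import shlex

# The primer-specific options recommended above.
PRIMER_OPTIONS = [
    "-task", "blastn-short",
    "-dust", "no",
    "-soft_masking", "false",
    "-penalty", "-3",
    "-reward", "1",
    "-gapopen", "5",
    "-gapextend", "2",
]


def build_blastn_command(query_fasta: str, database: str, out_path: str) -> list:
    """Return the blastn argument list with the primer options appended at the end."""
    return [
        "blastn",
        "-query", query_fasta,
        "-db", database,
        "-out", out_path,
        "-outfmt", "6",  # 12-column tabular output (an assumption of this sketch)
        *PRIMER_OPTIONS,
    ]


# Placeholder file and database names.
cmd = build_blastn_command("primers.fasta", "my_genome", "primer_hits.tsv")
print(shlex.join(cmd))

# To actually run the search:
# import subprocess
# subprocess.run(cmd, check=True)
```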
Going one step further by BLASTing both primers together
Above, we discussed the BLAST parameters that work great for short oligonucleotide sequences like primers (but also Illumina adapters or CRISPR-Cas9 gRNAs).
Some off-target hybridization of your primers to your genome may occur. But that really will only be problematic if both primers hybridize to the same off-target sequence. What are the odds? Well if you’re looking at recent gene duplicates, it is actually quite likely!
So you can go one step further by BLASTing both primers together. For that, concatenate the two primers, separated by a few “N” nucleotides (e.g., “NNN”), then hit BLAST. In the resulting overview for our example, a single genomic segment (two black rectangles connected by a thin line) aligns to the concatenated primer. Each primer also has a weaker alignment to a different segment (in pink). Scrolling further down shows exactly where and how the primers align to the genome, as well as the weaker secondary alignments.
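The concatenation step described above can be sketched in a few lines of Python. The primer sequences are hypothetical, and the default spacer of three N nucleotides is simply one reading of “a few”, so adjust it as needed. Both primers are joined exactly as written, since BLAST aligns each half on whichever strand it matches.

```python
def concatenate_primers(forward: str, reverse: str, spacer_length: int = 3) -> str:
    """Join two primers with a run of N nucleotides in between."""
    if spacer_length < 1:
        raise ValueError("spacer_length must be at least 1")
    for seq in (forward, reverse):
        if not seq or not seq.isalpha():
            raise ValueError("primers must be non-empty nucleotide strings")
    return forward.upper() + "N" * spacer_length + reverse.upper()


def to_fasta(name: str, sequence: str) -> str:
    """Format a single sequence as a FASTA record."""
    return f">{name}\n{sequence}\n"


# Hypothetical primer pair.
forward = "ACGTTGCAAGCTTGACCTGA"
reverse = "TTGACCGGTAACTGCATGCA"

query = concatenate_primers(forward, reverse)
print(to_fasta("primer_pair_1", query), end="")
```

The resulting FASTA record can be pasted into SequenceServer or passed to blastn as the query, using the same primer-specific options as for individual primers.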
SequenceServer Cloud makes it easy to run sequence searches and to interpret the results. For this, it leverages cloud computing, publication-ready graphics that facilitate interpretation, and a powerful graphical interface for configuring BLAST databases.