Adding sequence tags to PCR primers

There are several ways to incorporate sequence tags into PCR primers. In some cases, you can simply add each sequence tag to the 5’ end of the forward and reverse primers. However, this can introduce unintended secondary structure, either within a primer (hairpins) or between primers (complementarity), particularly when the sequence tags are longer than four to five nucleotides.
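To make this concern concrete, here is a deliberately crude Python sketch. It measures the longest stretch of one sequence whose reverse complement appears in another sequence (or in itself). This is only a proxy for base pairing: primer3 uses thermodynamic models, and the function names and the example tag below are hypothetical, not part of this program.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def longest_complementary_run(a: str, b: str) -> int:
    """Length of the longest substring of `a` that could pair with `b`.

    A substring of `a` pairs (antiparallel, Watson-Crick) with `b` when it
    occurs in the reverse complement of `b`, so this is the longest common
    substring of `a` and reverse_complement(b). Crude proxy: no thermodynamics.
    """
    a = a.upper()
    rc_b = reverse_complement(b)
    best = 0
    prev = [0] * (len(rc_b) + 1)  # run lengths ending at the previous base of `a`
    for i in range(1, len(a) + 1):
        cur = [0] * (len(rc_b) + 1)
        for j in range(1, len(rc_b) + 1):
            if a[i - 1] == rc_b[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


primer = "GTTATGCATGAACGTAATGCTC"
tag = "GAGCATTACG"  # hypothetical tag: the reverse complement of the primer's last 10 bases
print(longest_complementary_run(primer, primer))              # untagged baseline
print(longest_complementary_run(tag + primer, tag + primer))  # at least 10
```

Because this hypothetical tag pairs with the primer's own 3’ end, simply prepending it creates a long self-complementary stretch that was not present in the untagged primer.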

We provide a program that integrates sequence tags into upper and lower PCR primers using a version of primer3 2.2.3 modified to evaluate longer primers.

Note

To enable this functionality, you must download, build, and install a copy of mod-primer3. For instructions on doing this, see Installing mod-primer3.

If requested, this program will also remove bases common to tags and primers, and add GTTT pigtails to the 5’ ends of the upper and lower primers to encourage +A addition by Taq polymerase, which may be useful in certain contexts.
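A minimal Python sketch of the pigtail and common-base handling follows. It assumes that "common bases" means the longest run at the 3’ end of one piece that is repeated at the 5’ start of the next piece, so the shared bases are written only once. That reading reproduces the tagged sequences shown in the query examples on this page, but it is our interpretation rather than the program's code; merge_overlap and build_tagged_primer are hypothetical names.

```python
def merge_overlap(upstream: str, downstream: str) -> str:
    """Join two sequences, writing the longest suffix of `upstream` that is
    also a prefix of `downstream` only once."""
    for k in range(min(len(upstream), len(downstream)), 0, -1):
        if upstream.endswith(downstream[:k]):
            return upstream + downstream[k:]
    return upstream + downstream


def build_tagged_primer(primer: str, tag: str, pigtail: str = "GTTT",
                        remove_common: bool = True) -> str:
    """Return pigtail + tag + primer (5' to 3'), optionally collapsing shared bases."""
    if not remove_common:
        return pigtail + tag + primer
    # Assumed behaviour: collapse pigtail/tag overlap, then tag/primer overlap.
    return merge_overlap(merge_overlap(pigtail, tag), primer)


left = "GTTATGCATGAACGTAATGCTC"
print(build_tagged_primer(left, "TTCTCCTTCA"))  # GTTTCTCCTTCAGTTATGCATGAACGTAATGCTC
print(build_tagged_primer(left, "CCGAACAGTG"))  # GTTTCCGAACAGTGTTATGCATGAACGTAATGCTC
```

In the first call, the pigtail's trailing TT is shared with the tag's leading TT; in the second, the tag's final G is shared with the primer's initial G.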

Note

As part of its processing, this program also computes a value, recorded in the cycles column of the output/database, that reports the number of flows required to sequence a particular tag on the 454 platform. This functionality is also available separately; see Determining the number of flows per sequence tag.
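The flow count can be sketched in Python. We assume the standard repeating 454 flow order TACG and count flows from the first base of the tag, ignoring the sequencing key and any pigtail; the program may count differently, so treat count_flows (a hypothetical name) as an illustration of the idea rather than its implementation. Each flow offers one nucleotide, and a homopolymer run is incorporated in a single flow.

```python
FLOW_ORDER = "TACG"  # assumed 454 nucleotide flow order


def count_flows(seq: str, flow_order: str = FLOW_ORDER) -> int:
    """Number of flows needed to read `seq`, starting at the first flow."""
    seq = seq.upper()
    if not seq:
        return 0
    if set(seq) - set(flow_order):
        raise ValueError(f"unexpected bases in {seq!r}")
    flow = 0  # index into the repeating flow cycle
    for base in seq:
        # Advance until the current flow delivers this base. Repeated bases
        # (homopolymers) are read in the same flow, so they do not advance.
        while flow_order[flow % len(flow_order)] != base:
            flow += 1
    return flow + 1  # convert the last used flow index into a count


print(count_flows("CCATATGAAC"))  # 19
```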

Finally, this program stores all tagged primers in an SQLite database (and a CSV output file), which makes it easy to sort the tagged primers and to select sets of primers meeting certain criteria, including random sets of primers for testing purposes.

You can invoke add_tags_to_primers.py using a command similar to the following:

python add_tags_to_primers.py --input 10_nt_ed_5_tags.txt \
    --left-primer=GTTATGCATGAACGTAATGCTC \
    --right-primer=CGCGCATGGTGGATTCACAATCC \
    --output trnH_tagged_with_10_nt_ed_5_tags.csv \
    --sort=pair_hairpin_either,pair_penalty,cycles \
    --remove-common --keep-database

Options

--input=FILE
 The input file to check. The format must follow guidelines in Input file formatting.
--section=<string>
 The section of tags to check. Use this when an input file contains many sections of tags and you only want to check one. If the input file contains multiple sections and you do not pass this option, the program checks all sections, which may be slow.
--left-primer=<string>
 The left primer sequence to tag.
--right-primer=<string>
 The right primer sequence to tag.
--output=<string>
 The path and name of the output file. If you do not pass --keep-database, the results are written to this path in CSV format. If you do pass --keep-database, this becomes the path and name of your database, to which the program appends .sqlite.
--pigtail
 Add a “pigtail” to each tagged primer sequence.
--pigtail-sequence=<string>
 The pigtail sequence to add. Defaults to GTTT [Brownstein:1996].
--sort=<string>
 Comma-separated list of columns on which to sort the contents of the --output file. Format the string as follows:

--sort=pair_hairpin_either,pair_penalty,cycles

Valid column names include one or more of the following:

id, unmodified, tag, cycles, left_tag_common, left_tag, left_sequence,
left_tm, left_gc, left_self_end, left_self_any, left_hairpin,
left_end_stability, left_penalty, left_problems, right_tag_common,
right_tag, right_sequence, right_tm, right_gc, right_self_end,
right_self_any, right_hairpin, right_end_stability, right_penalty,
right_problems, pair_compl_end, pair_compl_any, pair_hairpin_either,
pair_penalty
--remove-common
 Remove bases common to the pigtail and the tag.
--keep-database
 Keep the sqlite database produced in the current directory. Useful for sorting and selecting large groups of tagged primers.
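Sorting on several columns, as --sort does, means ordering by the first column and breaking ties with the next. The Python sketch below shows that behaviour on rows represented as dictionaries. It assumes ascending order and compares values numerically whenever they parse as numbers, which may differ in detail from how the program sorts; sort_rows is a hypothetical helper, and the example rows are invented.

```python
def sort_key_value(text: str):
    """Compare numbers numerically and anything else as text."""
    try:
        return (0, float(text))
    except ValueError:
        return (1, text)


def sort_rows(rows, sort_spec: str):
    """Sort rows (dicts) ascending on comma-separated column names; earlier
    columns take priority and later ones only break ties."""
    columns = [name.strip() for name in sort_spec.split(",") if name.strip()]
    return sorted(rows, key=lambda row: tuple(sort_key_value(str(row[c])) for c in columns))


# Invented example rows, using column names from the list above.
rows = [
    {"id": "a", "pair_hairpin_either": "1", "pair_penalty": "12.0", "cycles": "20"},
    {"id": "b", "pair_hairpin_either": "0", "pair_penalty": "45.3", "cycles": "19"},
    {"id": "c", "pair_hairpin_either": "0", "pair_penalty": "41.6", "cycles": "22"},
    {"id": "d", "pair_hairpin_either": "0", "pair_penalty": "41.6", "cycles": "18"},
]
ordered = sort_rows(rows, "pair_hairpin_either,pair_penalty,cycles")
print([row["id"] for row in ordered])  # ['d', 'c', 'b', 'a']
```

Hairpin-free pairs (0) sort ahead of pairs with hairpins (1), lower penalties come first, and cycles only matters for the tie between c and d.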

Output

If you pass --output=my_output_file.txt, the results of the run are saved in a CSV-formatted text file, which you can open with most spreadsheet and database programs.
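To work with the CSV file outside a spreadsheet, a short Python sketch using the standard csv module is enough. It assumes the file has a header row with the same column names as the database (for example pair_hairpin_either and pair_penalty) and that pair_hairpin_either is written as 0 or 1. Check your own file before relying on either assumption. hairpin_free_rows is a hypothetical helper.

```python
import csv
from typing import Dict, List, TextIO


def hairpin_free_rows(handle: TextIO) -> List[Dict[str, str]]:
    """Return CSV rows whose pair_hairpin_either is 0, lowest pair_penalty first."""
    rows = [
        row for row in csv.DictReader(handle)
        if row["pair_hairpin_either"].strip() == "0"  # assumed 0/1 encoding
    ]
    return sorted(rows, key=lambda row: float(row["pair_penalty"]))
```

For example, open the output file with open(path, newline="") and pass the handle to hairpin_free_rows; each returned row is a dictionary keyed by column name.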

Querying the database

Here follows a (very) brief introduction to sqlite and constructing queries of the output data. For more information, see sqlite.org.

Note

Below, I follow the common SQL convention of capitalizing keywords (e.g., SELECT, ORDER BY, LIMIT, ASC). SQL keywords are case-insensitive, so this is not required for a valid query.

Start the sqlite3 shell from your command line, passing it the name of your database:

[~] sqlite3 my_very_first_database.sqlite
SQLite version 3.7.3
Enter ".help" for instructions
Enter SQL statements terminated with a ";"

Look at help, then set some helpful output parameters. Feel free to play around with options here:

sqlite> .help
sqlite> .mode column
sqlite> .headers on
/* see what tables we have */
sqlite> .tables
primers
/* show the columns in `primers` table */
sqlite> .schema primers
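The same inspection can be done from Python with the standard sqlite3 module, which is convenient when scripting around the database. The sketch lists each table and its columns by querying sqlite_master and PRAGMA table_info, which is roughly what .tables and .schema show; describe_database is a hypothetical helper.

```python
import sqlite3
from typing import Dict, List


def describe_database(conn: sqlite3.Connection) -> Dict[str, List[str]]:
    """Map each table name to its column names."""
    tables = [name for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    # PRAGMA table_info returns one row per column; index 1 holds the column name.
    return {
        table: [info[1] for info in conn.execute(f'PRAGMA table_info("{table}")')]
        for table in tables
    }
```

For example, after conn = sqlite3.connect("my_very_first_database.sqlite"), describe_database(conn)["primers"] should list the columns of the primers table. Note that sqlite3.connect creates an empty database if the file does not exist, so check the path first.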

Now that we know what columns are in the primers table, we can query data from the database. For instance, get the first 5 primers in the table:

sqlite> SELECT id, tag, left_sequence, right_sequence,
   ...> pair_penalty AS pp FROM primers LIMIT 5;

id          tag         left_sequence           right_sequence        pp
----------  ----------  ----------------------  --------------------  ----------
1                       GTTATGCATGAACGTAATGCTC  CGCATGGTGGATTCACAATC  6.777033
2           TTCTCCTTCA  GTTTCTCCTTCAGTTATGCATG  GTTTCTCCTTCACGCATGGT  41.657069
3           ACCTTACCTT  GTTTACCTTACCTTGTTATGCA  GTTTACCTTACCTTCGCATG  45.328737
4           CATTCCTCTA  GTTTCATTCCTCTAGTTATGCA  GTTTCATTCCTCTACGCATG  45.076019
5           TGTCATTCCT  GTTTGTCATTCCTGTTATGCAT  GTTTGTCATTCCTCGCATGG  44.361076

The primer with id = 1 has no tag because it is the untagged primer pair, which we include for comparison with the metrics computed for each tagged primer.

Warning

sqlite will often truncate primer sequences in .mode column because of the default column widths (see .width). Note that, above, the values in left_sequence and right_sequence are not the complete primer sequences; they have been truncated. One way to avoid this problem is to run .mode csv before you copy and paste any primer sequences for ordering. Another is to switch to CSV mode and write the query results to a file. See below for examples.

Now, let’s get some more primer sequences...

In the first example, we select two primer sequences (to keep the output short). Before selecting them, we:

  1. exclude primers with hairpins (the pair_hairpin_either = 0 condition). Primers with potential hairpins are assigned a value of 1 (TRUE) in this column; primers without hairpins are assigned 0 (FALSE).
  2. sort on pair_penalty (the ORDER BY pair_penalty ASC portion of the query). A higher pair_penalty indicates a worse primer pair, so we sort in ascending order and take the primers with the lowest pair_penalty values.

So, select the 2 primer pairs that have no hairpins and the lowest total penalties (i.e., ordered from best to worst):

sqlite> SELECT id, tag, left_sequence, right_sequence FROM primers WHERE
   ...> pair_hairpin_either = 0 ORDER BY pair_penalty ASC LIMIT 2;

id          tag         left_sequence                         right_sequence
----------  ----------  ------------------------------------  ---------------------------------
35          CCATATGAAC  GTTTCCATATGAACGTTATGCATGAACGTAATGCTC  GTTTCCATATGAACGCATGGTGGATTCACAATC
36          CGGAACTTAT  GTTTCGGAACTTATGTTATGCATGAACGTAATGCTC  GTTTCGGAACTTATCGCATGGTGGATTCACAAT
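From Python, the same query can be run with the sqlite3 module. Because Python receives the full strings, column-width truncation is not an issue. In this sketch, the number of pairs is passed as a bound parameter (SQLite accepts a parameter in LIMIT); best_hairpin_free is a hypothetical helper.

```python
import sqlite3
from typing import List, Tuple

BEST_QUERY = (
    "SELECT id, tag, left_sequence, right_sequence FROM primers "
    "WHERE pair_hairpin_either = 0 "
    "ORDER BY pair_penalty ASC LIMIT ?"
)


def best_hairpin_free(conn: sqlite3.Connection, limit: int = 2) -> List[Tuple]:
    """Return hairpin-free primer pairs with the lowest pair_penalty, best first."""
    return conn.execute(BEST_QUERY, (limit,)).fetchall()
```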

Next, we select a random set of primers without hairpins for testing. After testing, you can remove the ORDER BY RANDOM() LIMIT 5 portion of the query to select all primers without hairpins (e.g., for ordering):

/* select a random set of 5 primers having no hairpins */

sqlite> SELECT id, tag, left_sequence, right_sequence FROM primers WHERE
   ...> pair_hairpin_either = 0 ORDER BY RANDOM() LIMIT 5;

id          tag         left_sequence                         right_sequence
----------  ----------  ------------------------------------  ---------------------------------
35          CCATATGAAC  GTTTCCATATGAACGTTATGCATGAACGTAATGCTC  GTTTCCATATGAACGCATGGTGGATTCACAATC
147         CCGGTGGAAT  GTTTCCGGTGGAATGTTATGCATGAACGTAATGCTC  GTTTCCGGTGGAATCGCATGGTGGATTCACAAT
146         CCGAACAGTG  GTTTCCGAACAGTGTTATGCATGAACGTAATGCTC   GTTTCCGAACAGTGCGCATGGTGGATTCACAAT
151         GGAAGACCTC  GTTTGGAAGACCTCGTTATGCATGAACGTAATGCTC  GTTTGGAAGACCTCGCATGGTGGATTCACAATC
36          CGGAACTTAT  GTTTCGGAACTTATGTTATGCATGAACGTAATGCTC  GTTTCGGAACTTATCGCATGGTGGATTCACAAT

/* Before we order these primers for testing, ensure we have no truncation issues.
Set mode to CSV, and re-run query before copying and pasting */

sqlite> .mode csv
sqlite> SELECT id, tag, left_sequence, right_sequence FROM primers WHERE
   ...> pair_hairpin_either = 0 ORDER BY RANDOM() LIMIT 5;

id,tag,left_sequence,right_sequence
133,GCCTTCAGGA,GTTTGCCTTCAGGAGTTATGCATGAACGTAATGCTC,GTTTGCCTTCAGGACGCATGGTGGATTCACAATC
36,CGGAACTTAT,GTTTCGGAACTTATGTTATGCATGAACGTAATGCTC,GTTTCGGAACTTATCGCATGGTGGATTCACAATC
147,CCGGTGGAAT,GTTTCCGGTGGAATGTTATGCATGAACGTAATGCTC,GTTTCCGGTGGAATCGCATGGTGGATTCACAATC
130,CGTCAAGAAG,GTTTCGTCAAGAAGTTATGCATGAACGTAATGCTC,GTTTCGTCAAGAAGCGCATGGTGGATTCACAATC
146,CCGAACAGTG,GTTTCCGAACAGTGTTATGCATGAACGTAATGCTC,GTTTCCGAACAGTGCGCATGGTGGATTCACAATC

/* Or, save these query results to a file */

sqlite> .mode csv
sqlite> .output my_first_output_file.csv
sqlite> SELECT id, tag, left_sequence, right_sequence FROM primers WHERE
   ...> pair_hairpin_either = 0 ORDER BY RANDOM() LIMIT 5;
sqlite> .quit
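The export step can also be scripted in Python, which avoids .mode and .width entirely. This sketch runs the random hairpin-free query and uses the csv module to write a header followed by the rows; export_random_hairpin_free is a hypothetical helper.

```python
import csv
import sqlite3
from typing import TextIO

RANDOM_QUERY = (
    "SELECT id, tag, left_sequence, right_sequence FROM primers "
    "WHERE pair_hairpin_either = 0 ORDER BY RANDOM() LIMIT ?"
)


def export_random_hairpin_free(conn: sqlite3.Connection, handle: TextIO, n: int = 5) -> int:
    """Write up to n random hairpin-free primer pairs as CSV; return rows written."""
    cursor = conn.execute(RANDOM_QUERY, (n,))
    writer = csv.writer(handle)
    writer.writerow([column[0] for column in cursor.description])  # header row
    rows = cursor.fetchall()
    writer.writerows(rows)
    return len(rows)
```

For example, pass a handle from open("my_first_output_file.csv", "w", newline="") to write the same file as the .output session above, with full-length sequences.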
