.. _using_primers:

***********************************
Adding sequence tags to PCR primers
***********************************

There are several ways to incorporate sequence tags into PCR primers. In
certain cases, you may just add each sequence tag to the 5' end of the forward
and reverse primers. However, this can introduce unintended secondary
structure within each primer (hairpins) or between the primers of a pair
(complementarity), particularly when integrating sequence tags longer than
four to five nucleotides.

We provide a program that integrates sequence tags into upper and lower PCR
primers using a `modified version of primer3 2.2.3 `_ (modified to evaluate
longer primers).

.. note:: To enable this functionality, you must download, build, and install
   a copy of ``mod-primer3``. For instructions on doing this, see
   :ref:`mod-primer_installation`.

If requested, this program will also remove common bases between tags and
primers and add ``GTTT`` pigtails to the 5' ends of the upper and lower
primers to encourage ``+A`` addition by *Taq* polymerase, which may be useful
in certain contexts.

.. note:: As part of its processing, this program also computes a value,
   recorded in the ``values`` column of the output/database, which reports the
   number of flows required to sequence a particular tag using the 454
   platform. We also provide this functionality separately in :ref:`flows`.

Finally, this program stores all tagged primers in an `sqlite `_ database (and
a CSV output file). You can easily sort the output primers in this database
and select sets of primers meeting certain criteria, including random sets of
primers for testing purposes.

You can invoke ``add_tags_to_primers.py`` using a command similar to the
following:

.. code-block:: bash

    python add_tags_to_primers.py --input 10_nt_ed_5_tags.txt \
        --left-primer=GTTATGCATGAACGTAATGCTC --right-primer=CGCGCATGGTGGATTCACAATCC \
        --output trnH_tagged_with_10_nt_ed_5_tags.csv \
        --sort=pair_hairpin_either,pair_penalty,cycles --remove-common --keep-database

Options
*******

--input=FILE
    The input file to check. The format must follow the guidelines in
    :ref:`formatting`.

--section=
    The section of tags to check. Used when an input file contains many
    sections of tags and you only want to check one. When the input file
    contains multiple sections and you do not pass this option, the program
    will check all sections, which may be slow.

--left-primer=
    The left primer sequence to tag.

--right-primer=
    The right primer sequence to tag.

--output=
    The path and name of an output file. If you do not pass
    ``--keep-database``, this will be the path to the output file in CSV
    format. If you do pass ``--keep-database``, this will become the name of
    and path to your database, to which the program will append ``.sqlite``.

--pigtail
    Add a "pigtail" to each tagged primer sequence.

--pigtail-sequence=
    The pigtail sequence to add. Defaults to ``GTTT`` [Brownstein:1996]_.

--sort=
    Comma-separated list of columns on which to sort the contents of
    ``--output=``, if passed as an option. The string should be formatted as
    above::

        --sort=pair_hairpin_either,pair_penalty,cycles

    and valid options include one or more of the following::

        id, unmodified, tag, cycles, left_tag_common, left_tag,
        left_sequence, left_tm, left_gc, left_self_end, left_self_any,
        left_hairpin, left_end_stability, left_penalty, left_problems,
        right_tag_common, right_tag, right_sequence, right_tm, right_gc,
        right_self_end, right_self_any, right_hairpin, right_end_stability,
        right_penalty, right_problems, pair_compl_end, pair_compl_any,
        pair_hairpin_either, pair_penalty

--remove-common
    Remove common bases between pigtail and tag.

--keep-database
    Keep the sqlite database produced in the current directory. Useful for
    sorting and selecting large groups of tagged primers.

Output
******

If you pass ``--output=my_output_file.txt``, the result of the run will be
saved in a CSV-formatted text file. You can open this text file with many
spreadsheet and database programs.

Querying the database
*********************

Here follows a (very) brief introduction to `sqlite `_ and constructing
queries of the output data. For more information, see `sqlite.org `_.

.. note:: Below, I have used the convention, in `Structured Query Language
   (SQL) `_, of capitalizing statements (e.g. SELECT, ORDER BY, LIMIT, ASC,
   etc.). This is **not** required to construct a valid query.

Start sqlite from your command-line interface:

.. code-block:: bash

    [~] sqlite my_very_first_database.sqlite
    SQLite version 3.7.3
    Enter ".help" for instructions
    Enter SQL statements terminated with a ";"

Look at the help, then set some helpful output parameters. Feel free to play
around with the options here:

.. code-block:: sql

    sqlite> .help
    sqlite> .mode column
    sqlite> .headers on

    /* see what tables we have */
    sqlite> .tables
    primers

    /* show the columns in `primers` table */
    sqlite> .schema primers

Now that we know what columns are in the ``primers`` table, we can query data
from the database. For instance, get the first 5 primers in the table:

.. code-block:: sql

    sqlite> SELECT id, tag, left_sequence, right_sequence,
       ...> pair_penalty AS pp FROM primers LIMIT 5;
    id          tag         left_sequence           right_sequence        pp
    ----------  ----------  ----------------------  --------------------  ----------
    1                       GTTATGCATGAACGTAATGCTC  CGCATGGTGGATTCACAATC  6.777033
    2           TTCTCCTTCA  GTTTCTCCTTCAGTTATGCATG  GTTTCTCCTTCACGCATGGT  41.657069
    3           ACCTTACCTT  GTTTACCTTACCTTGTTATGCA  GTTTACCTTACCTTCGCATG  45.328737
    4           CATTCCTCTA  GTTTCATTCCTCTAGTTATGCA  GTTTCATTCCTCTACGCATG  45.076019
    5           TGTCATTCCT  GTTTGTCATTCCTGTTATGCAT  GTTTGTCATTCCTCGCATGG  44.361076

You may notice that the primer having ``id = 1`` has no tag. This is the
*untagged* primer sequence, which we include so that you can compare its
metrics with those derived for each tagged primer.

.. warning:: sqlite will often truncate primer sequences in ``.mode column``
   because of the default column width settings (``.width``). You should
   notice, above, that the values in ``left_sequence`` and ``right_sequence``
   are **not** the entire primer sequences - *they have been truncated*. One
   way to fix this problem is to make sure you run ``.mode csv`` before you
   copy and paste **any** primer sequences for ordering. Another way is to
   write the query results to a file after switching to CSV mode. See below
   for examples.

Now, let's get some more primer sequences. In the first example, we are going
to grab two primer sequences (for the sake of minimal output). However, before
we grab those two, we are going to:

1. ignore those primers with hairpins (``pair_hairpin_either = 0``). Primers
   with potential hairpins are assigned a value of ``1`` (TRUE) in this
   column. Primers without hairpins are assigned a ``0`` (FALSE).

2. sort on ``pair_penalty`` (this is the ``ORDER BY pair_penalty ASC`` portion
   of the query). ``pair_penalty`` is a bad thing, and as the value for this
   column gets higher, the primers are "worse". So, we want primers with the
   lowest ``pair_penalty`` possible - thus we sort on this column and grab
   those primers with minimal values for ``pair_penalty``.

So, select 2 primer sequences from the table that have no hairpins and the
lowest total penalties (i.e., sorted from best to worst):

.. code-block:: sql

    sqlite> SELECT id, tag, left_sequence, right_sequence FROM primers WHERE
       ...> pair_hairpin_either = 0 ORDER BY pair_penalty ASC LIMIT 2;
    id          tag         left_sequence                         right_sequence
    ----------  ----------  ------------------------------------  ---------------------------------
    35          CCATATGAAC  GTTTCCATATGAACGTTATGCATGAACGTAATGCTC  GTTTCCATATGAACGCATGGTGGATTCACAATC
    36          CGGAACTTAT  GTTTCGGAACTTATGTTATGCATGAACGTAATGCTC  GTTTCGGAACTTATCGCATGGTGGATTCACAAT

Now, we're just going to grab some random primers that do not have hairpins
for testing. After testing, we may remove the ``ORDER BY RANDOM() LIMIT 5``
portion of the query to grab all those primers with no hairpins (e.g. for
ordering):

.. code-block:: sql

    /* select a random set of 5 primers having no hairpins */
    sqlite> SELECT id, tag, left_sequence, right_sequence FROM primers WHERE
       ...> pair_hairpin_either = 0 ORDER BY RANDOM() LIMIT 5;
    id          tag         left_sequence                         right_sequence
    ----------  ----------  ------------------------------------  ---------------------------------
    35          CCATATGAAC  GTTTCCATATGAACGTTATGCATGAACGTAATGCTC  GTTTCCATATGAACGCATGGTGGATTCACAATC
    147         CCGGTGGAAT  GTTTCCGGTGGAATGTTATGCATGAACGTAATGCTC  GTTTCCGGTGGAATCGCATGGTGGATTCACAAT
    146         CCGAACAGTG  GTTTCCGAACAGTGTTATGCATGAACGTAATGCTC   GTTTCCGAACAGTGCGCATGGTGGATTCACAAT
    151         GGAAGACCTC  GTTTGGAAGACCTCGTTATGCATGAACGTAATGCTC  GTTTGGAAGACCTCGCATGGTGGATTCACAATC
    36          CGGAACTTAT  GTTTCGGAACTTATGTTATGCATGAACGTAATGCTC  GTTTCGGAACTTATCGCATGGTGGATTCACAAT

    /* Before we order these primers for testing, ensure we have no
       truncation issues. Set mode to CSV, and re-run query before copying
       and pasting */
    sqlite> .mode csv
    sqlite> SELECT id, tag, left_sequence, right_sequence FROM primers WHERE
       ...> pair_hairpin_either = 0 ORDER BY RANDOM() LIMIT 5;
    id,tag,left_sequence,right_sequence
    133,GCCTTCAGGA,GTTTGCCTTCAGGAGTTATGCATGAACGTAATGCTC,GTTTGCCTTCAGGACGCATGGTGGATTCACAATC
    36,CGGAACTTAT,GTTTCGGAACTTATGTTATGCATGAACGTAATGCTC,GTTTCGGAACTTATCGCATGGTGGATTCACAATC
    147,CCGGTGGAAT,GTTTCCGGTGGAATGTTATGCATGAACGTAATGCTC,GTTTCCGGTGGAATCGCATGGTGGATTCACAATC
    130,CGTCAAGAAG,GTTTCGTCAAGAAGTTATGCATGAACGTAATGCTC,GTTTCGTCAAGAAGCGCATGGTGGATTCACAATC
    146,CCGAACAGTG,GTTTCCGAACAGTGTTATGCATGAACGTAATGCTC,GTTTCCGAACAGTGCGCATGGTGGATTCACAATC

    /* Or, save these query results to a file */
    sqlite> .mode csv
    sqlite> .output my_first_output_file.csv
    sqlite> SELECT id, tag, left_sequence, right_sequence FROM primers WHERE
       ...> pair_hairpin_either = 0 ORDER BY RANDOM() LIMIT 5;
    sqlite> .quit
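
The same kind of query can also be run from a script, which avoids the
truncation problem because whole strings are returned rather than fixed-width
columns. The following is only a minimal sketch using Python's standard
``sqlite3`` and ``csv`` modules. It assumes the ``primers`` table and the
column names shown above. The function names and the in-memory demo rows are
invented for illustration and are not part of ``add_tags_to_primers.py``.

.. code-block:: python

    import csv
    import sqlite3
    import sys

    COLUMNS = ["id", "tag", "left_sequence", "right_sequence"]


    def best_hairpin_free_primers(conn, limit=2):
        """Return hairpin-free primer pairs, lowest pair_penalty first."""
        query = (
            "SELECT id, tag, left_sequence, right_sequence "
            "FROM primers WHERE pair_hairpin_either = 0 "
            "ORDER BY pair_penalty ASC LIMIT ?"
        )
        return conn.execute(query, (limit,)).fetchall()


    def write_csv(rows, handle):
        """Write a header line plus one untruncated CSV line per primer pair."""
        writer = csv.writer(handle)
        writer.writerow(COLUMNS)
        writer.writerows(rows)


    def make_demo_db():
        """Build a tiny in-memory stand-in for the real database.

        The rows are made up (hypothetical tags and sequences) and only the
        columns used by the query are created.
        """
        conn = sqlite3.connect(":memory:")
        conn.execute(
            "CREATE TABLE primers (id INTEGER PRIMARY KEY, tag TEXT, "
            "left_sequence TEXT, right_sequence TEXT, "
            "pair_hairpin_either INTEGER, pair_penalty REAL)"
        )
        conn.executemany(
            "INSERT INTO primers VALUES (?, ?, ?, ?, ?, ?)",
            [
                (1, "AAAA", "AAAAGTTATG", "AAAACGCATG", 0, 12.5),
                (2, "CCCC", "CCCCGTTATG", "CCCCGCATGG", 1, 3.0),  # has a hairpin
                (3, "GGTT", "GGTTGTTATG", "GGTTCGCATG", 0, 7.25),
                (4, "TTGG", "TTGGTTATGC", "TTGGCGCATG", 0, 20.0),
            ],
        )
        return conn


    if __name__ == "__main__":
        # For real data, replace make_demo_db() with
        # sqlite3.connect("path/to/your_database.sqlite").
        write_csv(best_hairpin_free_primers(make_demo_db()), sys.stdout)

Passing the ``LIMIT`` value as a bound parameter keeps the query text fixed.
To select a random test set instead, the ``ORDER BY`` clause could be changed
to ``ORDER BY RANDOM()``, as in the interactive session.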