Sunday, 2 December 2012

Library preparation for RNA-seq

It is often said that RNA-seq is overtaking microarrays as the gold standard for transcriptome-wide gene expression analysis, but what most people don't understand is that this "gold standard" is far from standard. There is an ever-increasing number of methods available for applying high-throughput sequencing to gene expression analysis. Thus "RNA-Seq" is actually a very generic term describing a range of techniques which aim to use sequencing to profile transcripts. Let's go through some of the more common applications and work our way to the more niche ones.


mRNA-Seq

This method aims to profile the pool of transcripts which encode proteins. The idea is to enrich the RNA mixture for mRNAs, thereby depleting rRNAs and other abundant ncRNAs. Usually, this is achieved by polyA enrichment using hybridization to magnetic beads decorated with oligo-dT strands. After the enrichment, the resulting RNA is fragmented by heat exposure in the presence of divalent cations and then subjected to reverse transcription using random hexamer primers, followed by synthesis of the second cDNA strand. This process is then followed by the standard protocol for DNA library prep that I mentioned in a previous post (end repair, A-tailing, adaptor ligation, size selection and amplification). The benefit of using this method is that it reduces rRNA by about 90%; the drawback is that it also depletes ncRNAs which don't have polyA tails. It is also a fairly long procedure at 2 days, and it requires about 1 µg of input RNA.
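To get a feel for what "about 90%" rRNA reduction means for the sequenced library, here is a rough back-of-envelope sketch. The starting rRNA fractions are my own illustrative assumptions (they are not measurements from this post), and the calculation assumes that all non-rRNA molecules survive the enrichment, which is a simplification.

```python
def rrna_fraction_after_depletion(initial_rrna_fraction, depletion_efficiency):
    """Fraction of the remaining RNA pool that is still rRNA.

    Assumes (simplification) that non-rRNA molecules are fully retained.
    """
    rrna_left = initial_rrna_fraction * (1.0 - depletion_efficiency)
    other_rna = 1.0 - initial_rrna_fraction
    return rrna_left / (rrna_left + other_rna)


# Assumed starting rRNA fractions, for illustration only.
for start in (0.80, 0.90, 0.95):
    remaining = rrna_fraction_after_depletion(start, 0.90)
    print(f"start {start:.0%} rRNA -> {remaining:.0%} of the pool is rRNA after 90% depletion")
```

The point of the sketch is that removing 90% of a very abundant species can still leave it as a sizeable share of the reads, so it is worth checking the rRNA fraction in your own data.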

Total RNA-Seq

As the name suggests, the aim is to profile the entire pool of RNAs in a sample (excluding small RNAs), which is done simply by omitting the polyA enrichment step of the mRNA-Seq method. All other steps from the reverse transcription onwards are the same. It is about half a day quicker than the mRNA-Seq method and only requires about 100 ng of input RNA. This is the method of choice for profiling prokaryote transcriptomes. The random priming method generally results in the depletion of small RNAs, so the name "Total RNA-Seq" is kinda wrong.

rRNA depletion methods

These methods start with a total RNA library prep and then incorporate a step which allows specific depletion of highly abundant sequences such as ribosomal RNAs. Kits using this type of approach include RiboMinus and Ribo-Zero. Duplex-specific nuclease (DSN) normalisation was also used to perform this depletion, but seems to have fallen out of favour in recent times. rRNA depletion methods are ideal for situations when you have less than the 1 µg required for the polyA enrichment method. These methods also allow the deep analysis of longer ncRNAs which might not have polyA tails.

Directional/Strand Specific RNA-Seq

Capturing strand direction is important for differentiating the expression of nearby and overlapping transcripts, which occur at many loci. A few strand-specific methods have been developed, including those which perform 3' and 5' ligation of RNA oligos followed by reverse transcription and PCR to generate the double-stranded cDNA library in a way similar to the small RNA method below (link). Newer methods include one in which the ligated RNA product is reverse transcribed on the flowcell (link). Another method, called di-tagging, uses a 5' tagged random hexamer sequence for first strand synthesis, followed by integration of a specific 3' sequence using a so-called terminal tagging oligo, which allows for subsequent PCR amplification from very low amounts of starting material (link). The Illumina TruSeq method utilises a 5' tagged random hexamer for reverse transcription followed by standard second strand synthesis and library prep. Strand specificity is provided by the 5' tag incorporated in the first strand synthesis. These methods are slowly becoming mainstream as kits have been released by Illumina and NEB.
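To illustrate why strand information matters at overlapping loci, here is a minimal toy sketch of read-to-gene assignment with and without strand information. The gene names, coordinates and reads are all made up for illustration, and the sketch assumes the read strand has already been converted to the strand of the originating transcript (how that conversion works depends on the library protocol).

```python
from collections import Counter

# Hypothetical annotation: two genes overlapping on opposite strands.
genes = [
    ("geneA", 1000, 2000, "+"),
    ("geneB", 1500, 2500, "-"),
]

# Hypothetical aligned reads: (start, end, strand of originating transcript).
reads = [
    (1600, 1636, "+"),
    (1700, 1736, "-"),
    (1100, 1136, "+"),
    (2200, 2236, "-"),
]


def assign(reads, genes, stranded):
    """Count reads per gene; reads overlapping more than one gene are ambiguous."""
    tally = Counter()
    for start, end, strand in reads:
        hits = [
            name
            for name, g_start, g_end, g_strand in genes
            if start < g_end and end > g_start
            and (not stranded or strand == g_strand)
        ]
        if len(hits) == 1:
            tally[hits[0]] += 1
        elif hits:
            tally["ambiguous"] += 1
        else:
            tally["no_feature"] += 1
    return tally


print("unstranded:", dict(assign(reads, genes, stranded=False)))
print("stranded:  ", dict(assign(reads, genes, stranded=True)))
```

In this toy case the two reads falling in the overlap cannot be assigned without strand information, whereas the stranded assignment resolves all four reads.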


CAGE-Seq

Cap analysis gene expression (CAGE) is a tool which allows the capture of 5' ends of transcripts for library prep and sequencing (link, link). Capturing the 5' cap allows for high resolution profiling of transcription start sites and significantly streamlines the bioinformatics analysis of differential gene expression. The RNA is reverse transcribed and the single-stranded cDNA is ligated to a specially designed biotinylated double-stranded adapter which allows subsequent synthesis of the second strand. The cap-proximal 5' fragments are cleaved from the rest of the cDNA by a type IIS restriction enzyme, commonly MmeI. The resulting DNA fragments can be prepared for sequencing using standard adapter ligation and amplification, with extra care taken in the size selection step to isolate the molecules containing the 32 nt cap fragment. CAGE-Seq has its uses, especially in resolving the question of alternative transcriptional start sites, but otherwise has not been widely adopted.

Single molecule methods/Ultra low input

These methods generally begin with minute amounts of RNA, for instance from a FACS/MACS run, from laser-capture microdissection or even from single cells. Some of these methods include the Ovation RNA Amplification System V2 from NuGEN and the Clontech SMARTer Ultra Low RNA Kit. The di-tagging method mentioned above is also suitable in the 100-500 pg range. A note of warning here: any method which involves amplification of RNA with many rounds of PCR or rolling-circle amplification could allow the misincorporation of bases at higher levels. (This happened in our lab with a kit that will remain nameless and resulted in an error rate of 2% and subsequent mis-alignment of our 36 nt reads all over the genome. We thought initially that somehow DNA had contaminated our starting material, but it was all due to the increased error rates. When we finally found the problem, we eliminated all reads with any mismatches to the genome (over 85%, unfortunately) and like magic, the remaining reads began to make perfect sense.) Thus, I would suggest that when you begin with ultra-low input analysis, you increase read lengths and thoroughly compare error rates to more standard mRNA-Seq procedures.
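As a rough sanity check on numbers like these, the expected share of error-free reads can be estimated from the per-base error rate and the read length. The sketch below assumes errors are independent and uniformly distributed along the read, which is a simplification; real errors tend to cluster, and true sequence variants also produce mismatches.

```python
def fraction_error_free(per_base_error, read_length):
    """Expected fraction of reads with zero errors, assuming independent errors."""
    return (1.0 - per_base_error) ** read_length


# 2% error rate and 36 nt reads are the figures quoted above; the other
# combinations are illustrative assumptions for comparison.
for error_rate in (0.001, 0.01, 0.02):
    for length in (36, 100):
        perfect = fraction_error_free(error_rate, length)
        print(f"error {error_rate:.1%}, {length} nt: {perfect:.1%} of reads error-free")
```

Under this simple model, a 2% error rate leaves a bit under half of 36 nt reads error-free, so discarding every read with a mismatch is costly even before other sources of mismatches are considered. It also shows why longer reads should be aligned with some mismatches allowed rather than filtered for perfect matches: the error-free fraction drops quickly with length, while a longer read can still be placed uniquely despite a few errors.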


RIP-Seq

A niche method which begins with the immunoprecipitation of RNA-binding proteins followed by the method described for total RNA-Seq or directional RNA-Seq. We have adapted this technique in our lab for analysis of RNA which is bound to chromatin, which allows us to look at certain types of chromatin, for instance those regions which have "repressive" or "active" histone marks. Applied specifically to AGO proteins, this allows the high resolution mapping of mRNAs being targeted by AGO at any point in time in a method called CLIP-Seq or HITS-CLIP (link). When applied specifically to ribosomes in a method called Ribo-Seq, it reveals the rate of translation of each transcript at an exquisite level of resolution (link). IP of adenosine bases methylated at the N6 position followed by sequencing allowed the resolution of a transcriptome-wide map of this novel modification (link).

Small RNA-Seq

Starting with total RNA (optionally microRNA-enriched with mirVana), the pool of RNAs is subjected to 3' RNA adapter ligation followed by 5' adapter ligation, reverse transcription, PCR amplification and size selection for the fragment size range of interest. The adapters are modified to enhance the incorporation of microRNAs over other types of RNA. The barcode can be incorporated in the adapter sequence or later in the PCR primer. Early-release kits had big issues with barcode-specific bias, which was tracked down to the sequence specificity of RNA ligases. These biases are largely nullified in newer kits by placing the barcode at the distal end of the universal adapter sequence. Nevertheless, I would highly recommend that you perform a quality control test by analysing one biological sample with all available barcodes. We have been using the Illumina TruSeq small RNA kits (QC was fine) and will soon be testing the NEB small RNA kit. These small RNA kits capture not only microRNAs but also other sequences such as piRNAs, Y-RNAs, tRNAs and, to a lesser extent, mRNAs which could originate from nascent transcripts or degradation products.
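The barcode QC test suggested above could be analysed along the following lines. This is only a sketch: the counts are invented, the barcode labels are hypothetical, and the 2-fold cut-off is an arbitrary assumption rather than an established threshold. It normalises each barcoded library to fractions and flags any small RNA whose relative abundance differs by more than 2-fold between barcodes.

```python
import math

# Hypothetical read counts: one biological sample prepared with three barcodes.
counts = {
    "BC01": {"miR-21": 5000, "miR-16": 1200, "let-7a": 3000},
    "BC02": {"miR-21": 5200, "miR-16": 1100, "let-7a": 2900},
    "BC03": {"miR-21": 4900, "miR-16": 300, "let-7a": 3100},
}


def to_fractions(sample):
    """Convert raw counts to fractions of the library total."""
    total = sum(sample.values())
    return {name: n / total for name, n in sample.items()}


def max_log2_fold_range(counts, pseudo=1e-6):
    """Largest log2 fold difference across barcodes for each small RNA."""
    fractions = {bc: to_fractions(sample) for bc, sample in counts.items()}
    names = sorted(set().union(*(sample.keys() for sample in counts.values())))
    spread = {}
    for name in names:
        # Small pseudo-value avoids division by zero for undetected species.
        values = [fractions[bc].get(name, 0.0) + pseudo for bc in fractions]
        spread[name] = math.log2(max(values) / min(values))
    return spread


spread = max_log2_fold_range(counts)
flagged = [name for name, lfc in spread.items() if lfc > 1.0]  # assumed cut-off
print("possible barcode bias:", flagged)
```

With real data you would want replicates as well, since some spread between libraries is expected from sampling noise alone.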

Bisulfite RNA-Seq

Some cytosines in RNA are methylated, commonly in rRNAs, tRNAs and miRNAs. The bisulfite sequencing method used to analyse methylated cytosine in DNA has been adapted to RNA. This has been used to interrogate specific transcripts (e.g., tRNAs and rRNAs; link) and is applicable on a genome-wide basis.

In summary, these methods each have their strengths and weaknesses. There is no method which will capture the complete complexity of RNAs found in the cell, but I hope this post will help you on the way to selecting the right tools for the job at hand.

Further reading:
Perspective from Illumina