Sunday, 21 October 2012

Paper of the week - explaining the stability of ncRNAs in the cell

3' polyadenylation is a key mechanism whereby mRNAs are stabilised and made ready for protein translation. One of the recent mysteries of molecular biology is how long non-coding RNAs (lncRNAs) achieve stability given that these molecules lack a polyadenylation signal. Wilusz et al. published a paper in Genes & Development proposing that MALAT1 is protected from 3' to 5' exonuclease activity by an RNA triple helix structure. The researchers used molecular modelling to show that the 3' terminus is neatly tied into a triple helix and thus likely protected from degradation. This was confirmed by mutagenesis: altering bases in these regions led to a reduction in transcript stability.

MALAT1 is transcribed to form a ~6.7 kb lncRNA that is abundant in the nucleus, and the locus also produces a small tRNA-like transcript that is processed into a mature 61 nt hairpin localised to the cytosol. Processing of both transcripts depends on RNase P, a ribozyme best known for its role in tRNA processing. So it seems that MALAT1 combines features of mRNA processing (such as transcription by RNA Pol II) with features of the tRNA pathway (RNase P, RNase Z and the CCA-adding enzyme).

Surprisingly, the authors show that when they placed a triple helix motif downstream of a coding sequence, translation of the coding sequence was enhanced, as was the transcript's stability against exonucleases. The authors also found that microRNAs repressed transcripts carrying a classic poly(A) tail and transcripts carrying a MALAT1-like triple helix tail to a similar extent, around 3-fold at the protein level.

As exemplified in data presented by ENCODE researchers, our genome is rich with non-coding RNA genes about which we know very little. MALAT1 is one of the most abundant of such RNAs in the cell and its over-expression has been linked to invasiveness of cancers. Its localisation to nuclear speckles indicates a role in RNA splicing, and Ribo-seq experiments have previously shown MALAT1 RNA to be partially "protected" by ribosomes. We think that MALAT1 and other lncRNAs are just as important in the cell, and just as relevant to disease states, as genes which encode regulatory proteins like p53 and NF-κB. Time will tell.

See the original paper:

Wilusz JE, Jnbaptiste CK, Lu LY, Kuhn CD, Joshua-Tor L, Sharp PA. A triple helix stabilizes the 3' ends of long noncoding RNAs that lack poly(A) tails. Genes Dev. 2012 Oct 16. [Epub ahead of print]
Koch Institute for Integrative Cancer Research, Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA.
The MALAT1 (metastasis-associated lung adenocarcinoma transcript 1) locus is misregulated in many human cancers and produces an abundant long nuclear-retained noncoding RNA. Despite being transcribed by RNA polymerase II, the 3' end of MALAT1 is produced not by canonical cleavage/polyadenylation but instead by recognition and cleavage of a tRNA-like structure by RNase P. Mature MALAT1 thus lacks a poly(A) tail yet is expressed at a level higher than many protein-coding genes in vivo. Here we show that the 3' ends of MALAT1 and the MEN β long noncoding RNAs are protected from 3'-5' exonucleases by highly conserved triple helical structures. Surprisingly, when these structures are placed downstream from an ORF, the transcript is efficiently translated in vivo despite the lack of a poly(A) tail. The triple helix therefore also functions as a translational enhancer, and mutations in this region separate this translation activity from simple effects on RNA stability or transport. We further found that a transcript ending in a triple helix is efficiently repressed by microRNAs in vivo, arguing against a major role for the poly(A) tail in microRNA-mediated silencing. These results provide new insights into how transcripts that lack poly(A) tails are stabilized and regulated and suggest that RNA triple-helical structures likely have key regulatory functions in vivo.

EDIT: Just a follow up to this story, there is a related paper published in PNAS, here is the abstract from PubMed:

Brown JA, Valenstein ML, Yario TA, Tycowski KT, Steitz JA. Formation of triple-helical structures by the 3'-end sequences of MALAT1 and MENβ noncoding RNAs. Proc Natl Acad Sci U S A. 2012 Nov 5. [Epub ahead of print]
Department of Molecular Biophysics and Biochemistry, Howard Hughes Medical Institute, Yale University School of Medicine, New Haven, CT 06536.
Stability of the long noncoding polyadenylated nuclear (PAN) RNA from Kaposi's sarcoma-associated herpesvirus is conferred by an expression and nuclear retention element (ENE). The ENE protects PAN RNA from a rapid deadenylation-dependent decay pathway via formation of a triple helix between the U-rich internal loop of the ENE and the 3'-poly(A) tail. Because viruses borrow molecular mechanisms from their hosts, we searched highly abundant human long-noncoding RNAs and identified putative ENE-like structures in metastasis-associated lung adenocarcinoma transcript 1 (MALAT1) and multiple endocrine neoplasia-β (MENβ) RNAs. Unlike the PAN ENE, the U-rich internal loops of both predicted cellular ENEs are interrupted by G and C nucleotides and reside upstream of genomically encoded A-rich tracts. We confirmed the ability of MALAT1 and MENβ sequences containing the predicted ENE and A-rich tract to increase the levels of an intronless β-globin reporter RNA. UV thermal denaturation profiles at different pH values support formation of a triple-helical structure composed of multiple U•A-U base triples and a single C•G-C base triple. Additional analyses of the MALAT1 ENE revealed that robust stabilization activity requires an intact triple helix, strong stems at the duplex-triplex junctions, a G-C base pair flanking the triplex to mediate potential A-minor interactions, and the 3'-terminal A of the A-rich tract to form a blunt-ended triplex lacking unpaired nucleotides at the duplex-triplex junction. These examples of triple-helical, ENE-like structures in cellular noncoding RNAs are unique.

Saturday, 20 October 2012

DNA library preparation for next-generation sequencing

Library preparation is the process of modifying DNA into a form that is compatible with high-throughput sequencing, and it has become a key molecular biology technique. While there is an amazing variety of library preparation methods available, I thought I'd start the blog with a description of the classic method:
- Shearing/fragmentation
- End Repair
- DNA clean-up
- A tailing
- Adaptor ligation
- Size selection
- Amplification
- Quality control

The DNA needs to be in a size range that is compatible with the sequencing platform. The most commonly used platforms require DNA constructs in the range of 300-500 bp, although this depends on the specific platform and the application. Fragmentation can be done by mechanical disruption through sonication, as we do in our lab, but can also be done with a nebuliser or by enzymatic fragmentation. Our view is that sonication and nebulisation introduce less sequence-specific bias than the fragmentase approach, giving a more even coverage distribution across the genome. Nebulisers are simple and quick to use, but limit you to one sample at a time. Sonicators come in a range of configurations, from those using standard 1.5 mL tubes to those which can handle 96-well plates. Fragmentase may be a suitable option if you're working with a small genome or you don't have access to a sonicator. All of these methods require quite a bit of time to optimise: sonicator power, time, presence of salts and liquid volume can all play a part in dictating the range of DNA fragment sizes. For fragmentase, the concentrations of both DNA and enzyme play a part, as do the length and temperature of incubation. After fragmentase treatment, you will need to clean up the sample, normally with a spin column, whereas this usually isn't required after sonication or nebulisation unless you want to concentrate the sample into a smaller volume.

After fragmentation, you will want to check whether it was successful. You can either run a microlitre on a microchip electrophoresis system or run an agarose gel. Microchip systems use much less material, have better size resolution and are much preferred. At this stage, you will also need to check that you have enough DNA for your library preparation. Most genomic DNA preparation kits suggest using 1 microgram of fragmented DNA, but in our experience you can use much less than this (around 10 ng) with only a slight reduction in sequence coverage and diversity.

End Repair
During fragmentation, the DNA is broken, leaving a mixture of blunt ends, 3' overhangs and 5' overhangs. The end repair step uses T4 DNA polymerase, whose 3' to 5' exonuclease activity removes 3' overhangs and whose polymerase activity (along with Klenow fragment) fills in 5' overhangs. The end repair cocktail also contains T4 polynucleotide kinase (PNK), which phosphorylates the 5' ends and ensures the 3' ends carry a hydroxyl group. Ideally, choose a library prep kit which has all three enzymes pre-mixed into a cocktail to save time. We incubate the tubes on a thermo-mixer block set at 20°C to reduce any effects of a fluctuating lab temperature.

After the reaction, you'll need to isolate the DNA and remove the enzymes and buffers. The standard way to do this has been on a spin column such as the Qiagen QIAquick. Newer protocols, like the one recommended by Illumina, instead use magnetic AMPure (SPRI) beads to isolate the DNA. Our lab has stuck with the spin column method because it is relatively quick (~10 minutes) compared to about 45 minutes for the bead procedure. Spin columns may work out more expensive at about $5 per prep compared to about $1 per bead prep, but once you factor in the time involved, the column prep is cheaper at low sample numbers performed manually, whereas on automated workstations handling up to 96 samples simultaneously, the bead prep is more economical.
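To illustrate that break-even point, here is a rough back-of-envelope sketch (not from any kit documentation) using the per-prep figures above; the labour rate is an assumption you should replace with your own.

```python
import math

LABOUR_PER_HOUR = 40.0  # assumed technician cost in $/h (illustrative)

def column_cost(n_samples):
    # ~$5 consumables and ~10 min hands-on time per sample
    return n_samples * (5.0 + (10 / 60) * LABOUR_PER_HOUR)

def bead_cost(n_samples, batch_size=96):
    # ~$1 consumables per sample; ~45 min per batch of up to 96
    batches = math.ceil(n_samples / batch_size)
    return n_samples * 1.0 + batches * (45 / 60) * LABOUR_PER_HOUR

for n in (1, 8, 96):
    print(n, round(column_cost(n), 2), round(bead_cost(n), 2))
```

With these assumed numbers, columns win for a handful of samples and beads win well before a full 96-well batch.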

A-tailing
This step adds a single adenosine residue to the 3' ends of the blunt-ended DNA. This reduces the chance of fragments ligating to each other and increases the rate of adapter ligation, as the adapters contain a single overhanging "T" base. This enzymatic step is performed by Klenow fragment (exo-) and requires dATP. We incubate this in a PCR machine at 37°C for 30 minutes. After A-tailing, you'll need to do another DNA clean-up.

Adapter ligation
This is the stage where the DNA fragments are ligated to the sequencing adapters with T4 DNA ligase. It is important to use the correct amount of adapter for the amount of DNA present, as excess self-ligated adapter can cause headaches if it is carried through to later stages of the prep. Follow the recommendations of the kit manufacturer, and if you are working with smaller DNA amounts, use a smaller amount of adapter (you may need a titration experiment to optimise); we have had success using a 1/10 dilution on 10 ng DNA inputs. The ligation reaction is incubated on a thermo-mixer at 20°C for 15 minutes, and the DNA is cleaned up again.

It is essential that the DNA sample doesn't contain any residual ethanol, as that can spell trouble for the following step. We let our columns dry for at least 10 minutes at room temperature to achieve this.

Size selection
This step is required to further eliminate self-ligated product and to obtain a final library fragment size range compatible with the sequencing instrument. It is done in a range of ways in different labs depending on their throughput. In our lab, we stick to 2% agarose gels to get the desired size range (200-300 bp) and eliminate self-ligated product. Any residual ethanol is highly problematic here, as you can literally watch your sample jump out of the well, ruining the prep. Obviously, this method is highly labour-intensive and as such allows a technician to process only about 16 libraries per day. E-Gels, which are pre-cast mini-gels, are another option and are said to be quicker to prepare and run. If you are using E-Gels in your lab, I would love to hear some feedback on whether they are helpful or not. The DNA is then excised from the gel and purified with a column clean-up.

For larger labs, there are gel-free size selection methods suited to automated library preparation, which take advantage of the size-selective binding of SPRI beads at particular concentrations of PEG 8000 and NaCl. By altering these concentrations, you can fine-tune the size selection for your application and, best of all, this approach is amenable to automation. The size range might not be as tight or accurate as agarose gel excision, however.
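A double-sided SPRI selection can be sketched as below. This is a minimal illustration, not a validated protocol: the function name is hypothetical, and the bead:sample ratios shown are placeholders, since the mapping from ratio to size cutoff must be calibrated empirically for your bead lot and buffer.

```python
def double_sided_spri(sample_ul, upper_ratio=0.6, lower_ratio=0.9):
    """Return bead volumes (uL) for the two binding steps.

    First add beads at upper_ratio x sample volume: fragments above
    the upper size cutoff bind and are discarded with the beads.
    Then add more beads to the supernatant to reach lower_ratio x
    total: fragments above the lower cutoff bind and are kept.
    """
    first_add = sample_ul * upper_ratio
    second_add = sample_ul * lower_ratio - first_add
    return first_add, second_add

# e.g. a 100 uL end-repaired sample with illustrative 0.6x/0.9x ratios
print(double_sided_spri(100))  # (60.0, 30.0)
```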

Amplification
Following size selection, it is common practice to use PCR to increase the overall amount of library and to incorporate the sequencing primer annealing site (and barcode if required). There are amplification-free methods available (perhaps the subject of another post), but these are still uncommon. The number of cycles required depends on the amount of starting material: when beginning with microgram amounts, you may only need 4-8 cycles, but for nanogram amounts, you may need up to 12 cycles. Phusion polymerase is the most commonly used PCR enzyme, but there are others out there with apparently better coverage of GC/AT-biased genomes (such as KAPA). After PCR, you'll need to perform yet another DNA clean-up, this time in a dedicated "post-PCR" area.
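As a rough guide to cycle number, you can estimate the fold-amplification needed and assume a per-cycle efficiency. This sketch is mine, not from any kit manual; real libraries rarely double perfectly each cycle, so the efficiency value is a guess to be tuned against your own yields.

```python
import math

def cycles_needed(input_ng, target_ng, efficiency=0.8):
    """Estimate PCR cycles to amplify input_ng up to target_ng,
    assuming each cycle multiplies the library by (1 + efficiency)."""
    fold = target_ng / input_ng
    return math.ceil(math.log(fold) / math.log(1 + efficiency))

print(cycles_needed(10, 1000))   # 100-fold gain from 10 ng: ~8 cycles
print(cycles_needed(1000, 2000)) # from a microgram input: ~2 cycles
```

Note that more cycles also mean more PCR duplicates and bias, so use the fewest cycles that give enough material.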

Quality control
To determine whether your libraries have actually worked, you'll need to run some QC checks, and this can be done in a variety of ways. The simplest is to quantify the concentration of the sample with a NanoDrop UV spectrophotometer or Qubit fluorometer. While relatively easy, this won't tell you whether significant adapter-only product is present in the library. To find that out, you'll need to run the sample on an agarose gel or a microchip electrophoresis system. Again, the benefits of the microchip method are sensitivity and size resolution. We use the Shimadzu MultiNA; there are others, like the Agilent Bioanalyzer, which is used by other labs including those at the Broad Institute. These microchip systems also come in handy for RNA and epigenetics analysis and so have become invaluable in our lab. Illumina recommends running samples on the Bioanalyzer to verify the absence of self-ligated adapter product, as well as running a qPCR to accurately determine the concentration of the library.

In our lab, we have found the best way to get even cluster densities on the flowcell is to use the MultiNA results to determine the volume of diluent needed to bring the library to 10 nM, and then, on the day of sequencing, to re-run the samples to quantify the concentration again and make small adjustments to the volume of DNA added to the sequencing reaction. The extra few minutes to re-run the samples are well worth the consistency in data yields.
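The dilution arithmetic behind that 10 nM target can be sketched as follows. This is a minimal illustration with hypothetical function names, assuming an average molecular weight of ~660 g/mol per double-stranded base pair; plug in your own measured concentration and the mean fragment size from the electropherogram.

```python
def ng_per_ul_to_nM(conc_ng_ul, mean_frag_bp):
    """Convert a dsDNA library concentration to nM, assuming
    ~660 g/mol per base pair."""
    return conc_ng_ul * 1e6 / (mean_frag_bp * 660.0)

def diluent_volume(conc_nM, sample_ul, target_nM=10.0):
    """Volume of diluent (uL) to add to reach target_nM."""
    return sample_ul * (conc_nM / target_nM - 1)

# e.g. a 300 bp library measured at 2.0 ng/uL
c = ng_per_ul_to_nM(2.0, 300)
print(round(c, 1))                      # ~10.1 nM
print(round(diluent_volume(c, 20), 1))  # uL of buffer to add to 20 uL
```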

Things I haven't covered
Barcoding, robotics, microchip prep methods, amplification-free methods and applications other than genomic DNA will all be covered in future posts.

Selecting a method for your lab
Selecting a library prep method can be difficult, especially given all the variations available. The most important consideration is the throughput you expect, and secondly the budget. If you can't afford the capital outlay for a sonicator and a microchip electrophoresis system, your results could suffer as a consequence, so these are really quite important. When choosing a sample prep kit, I'd recommend trying the one provided by the sequencing instrument manufacturer (Illumina/Roche/Life Tech), but also consider NEB, which we have found to be just as effective at a much better price. I will be reviewing new kits as they come onto the market in Australia.

If you have any feedback or thoughts on library prep kits and methods, I'd love to hear from you.

Further reading
Broad Institute Sample prep
NEB-Next Library Prep
Illumina genomic DNA sample preps
E-Gels from Life Tech
USC Epigenome Centre Library Prep

Tuesday, 16 October 2012

Hello world!

Welcome to Genome Spot.

Genome Spot is aimed at providing practical tips in applications of genomics and bioinformatics. From time to time, we'll also be discussing general news and recent articles.

About the author: Mark Ziemann is a researcher at Baker IDI Melbourne currently working in the field of epigenomics with a focus on human disease.