Showing posts from January, 2016

Is paired end RNA-seq better than single-end for gene-wise gene expression analysis?

Something I've wondered about is whether, for RNA-seq, it's worth forking out the extra cost of sequencing both ends rather than just one.

To test this, I went back to a paired-end data set in GEO (GSE55123, 2x 36 bp), cleaned the data with Skewer, then mapped the reads with STAR in either paired-end mode or single-end mode (using just read 1).

I then used featureCounts to count the number of tags aligned to each gene, and excluded genes with fewer than 10 reads per sample on average. Then I ran edgeR at Degust to identify differentially expressed genes (DEGs, FDR<0.05) and used a shell script to quantify the overlap in DEGs between the two modes. Finally, I ranked the genes by p-value, from most up-regulated to most down-regulated, and compared each gene's position in the PE and SE rankings.
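For anyone wanting to try the rank comparison, here's a minimal Python sketch. It is not the script used for this post: the signed score (sign of the log fold change times -log10 of the p-value) is one common way to order genes from most up-regulated to most down-regulated, and the gene names, values and the helper names `signed_score`, `rank_genes` and `spearman` are all made up for illustration.

```python
import math

def signed_score(log_fc, p_value):
    # Assumed scoring: -log10(p) carrying the sign of the fold change.
    # p_value must be > 0, otherwise log10 raises an error.
    return math.copysign(-math.log10(p_value), log_fc)

def rank_genes(results):
    """results maps gene -> (log_fc, p_value); rank 1 is the most up-regulated."""
    ordered = sorted(results, key=lambda g: signed_score(*results[g]), reverse=True)
    return {gene: i + 1 for i, gene in enumerate(ordered)}

def spearman(rank_a, rank_b):
    """Spearman correlation over the genes present in both rankings."""
    shared = set(rank_a) & set(rank_b)
    n = len(shared)
    if n < 2:
        raise ValueError("need at least two shared genes")
    # Re-rank within the shared genes so positions run 0..n-1 with no gaps.
    pos_a = {g: i for i, g in enumerate(sorted(shared, key=rank_a.get))}
    pos_b = {g: i for i, g in enumerate(sorted(shared, key=rank_b.get))}
    d2 = sum((pos_a[g] - pos_b[g]) ** 2 for g in shared)
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical edgeR-style results: gene -> (logFC, p-value)
pe_results = {"g1": (2.0, 1e-6), "g2": (-1.5, 1e-4), "g3": (0.3, 0.2), "g4": (-0.2, 0.5)}
se_results = {"g1": (1.8, 1e-5), "g2": (-1.4, 1e-3), "g3": (-0.1, 0.6), "g4": (0.2, 0.4)}

pe_ranks = rank_genes(pe_results)
se_ranks = rank_genes(se_results)
rho = spearman(pe_ranks, se_ranks)
print(pe_ranks, se_ranks, round(rho, 3))
```

A correlation near 1 would mean the PE and SE rankings largely agree; genes whose positions differ a lot are the ones worth a closer look.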

Here's the result of the overlap analysis. You can see that the PE data detected more genes but identified fewer DEGs than SE.

Detected in PE: 15919
Detected in SE: 15275
Detected in both: 14750
Detected in either: 16444
Detected …
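The overlap counts above were produced with a shell script; a rough Python equivalent using sets might look like the sketch below. The gene IDs and the helper `overlap_counts` are hypothetical. The last part simply checks that the four reported totals are consistent with each other (either = PE + SE - both).

```python
def overlap_counts(pe_genes, se_genes):
    """Summarise how two gene lists overlap."""
    pe, se = set(pe_genes), set(se_genes)
    return {
        "pe": len(pe),
        "se": len(se),
        "both": len(pe & se),      # intersection
        "either": len(pe | se),    # union
        "pe_only": len(pe - se),
        "se_only": len(se - pe),
    }

# Toy example with made-up gene IDs
toy = overlap_counts(["geneA", "geneB", "geneC"], ["geneB", "geneC", "geneD"])
print(toy)

# Consistency check on the totals reported in this post
reported = {"pe": 15919, "se": 15275, "both": 14750, "either": 16444}
consistent = reported["pe"] + reported["se"] - reported["both"] == reported["either"]
pe_only = reported["pe"] - reported["both"]
se_only = reported["se"] - reported["both"]
print(consistent, pe_only, se_only)
```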

2015 Wrap-Up

It's that time of year again when we can reflect on the year that was, hit the reset button and focus on the trends that will dominate 2016.

Sequencing hardware

This time last year, we welcomed the NextSeq500, NeoPrep and updates to the HiSeq2500. Throughout the year there were further announcements from Illumina on the HiSeq4000, and the technology appears to be improving incrementally from here on. Indeed, while cluster numbers have improved, there have been only modest improvements in read length, and pricing in Australian dollars has not decreased as substantially as we might have predicted. I'm really excited about the developments in 3rd gen technology coming from Oxford Nanopore and Pacific Biosciences, in that the read lengths and accuracy are starting to improve. The growing user base is definitely spurring improved basecalling and error correction algorithms, which will in turn further increase the user base. Metagenomics has really been the main beneficiary of 3rd gen long read seq and that …