Tuesday, 16 December 2014

User friendly RNA-seq differential expression analysis with Degust

There is a need to make bioinformatics tools more user friendly and accessible to a wider audience. We have seen that Galaxy, GEO2R, Genevestigator and GenePattern have each developed a huge following in the molecular biology community, and this trend will continue with the introduction of new RNA-seq analysis tools. Previously, I posted about differential gene expression analysis of RNA-seq performed by the DEB online tool. In this post, I introduce Degust, an online app to analyse gene expression count data and determine which genes are differentially expressed. Degust was written by David R. Powell (@d_r_powell) and was supported by the Victorian Bioinformatics Consortium, Monash University and VLSCI's Life Sciences Computation Centre.

In this test, I'll be using the azacitidine mRNA-seq data set that I have previously analysed. To make the count matrix, I used featureCounts.

The first step in the process is to upload your RNA-seq count data. Tab-separated and comma-separated formats are both accepted. Once uploaded, you're given a configuration screen to specify the format of the data and the sample groups. Make sure you specify which column contains the gene names/accession numbers. Hit the "view" button and you'll get the smear plot. You can use the mouse to highlight genes. At the bottom there is a table of the most statistically significant genes, and the search function allows you to quickly find your favourite genes.
Degust interface with smear plot.
You also get an MDS plot to visualise the similarity between samples. What I really like is the graph in the lower right corner showing the relative magnitude of the first 6 MDS dimensions.
MDS plot.
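If you want to try something similar outside the browser, here is a minimal base R sketch of classical MDS on log-transformed counts, including the relative magnitude of each dimension. This is not Degust's own code. The simulated count matrix, the sample names and the log2 counts-per-million transform are all assumptions made for illustration.

```r
# Classical MDS on a small simulated count matrix (illustrative data only).
set.seed(1)
n_genes <- 500
counts <- matrix(rnbinom(n_genes * 6, mu = 100, size = 20), ncol = 6,
                 dimnames = list(paste0("gene", seq_len(n_genes)),
                                 c("ctl1", "ctl2", "ctl3", "aza1", "aza2", "aza3")))
# Raise the first 50 genes in the last three samples so the two groups differ.
counts[1:50, 4:6] <- counts[1:50, 4:6] * 4

# Log2 counts per million, with a small offset to avoid log(0).
logcpm <- log2(t(t(counts + 0.5) / (colSums(counts) + 1)) * 1e6)

# Euclidean distances between samples, then classical MDS in two dimensions.
d <- dist(t(logcpm))
mds <- cmdscale(d, k = 2, eig = TRUE)

# Relative magnitude of each dimension: positive eigenvalues as a fraction of their sum.
eig <- mds$eig[mds$eig > 1e-8]
rel_magnitude <- eig / sum(eig)

# Sample positions on the first two dimensions.
plot(mds$points, type = "n", xlab = "Dimension 1", ylab = "Dimension 2")
text(mds$points, labels = colnames(counts))

# Bar chart of the relative magnitude of each dimension.
barplot(rel_magnitude, names.arg = seq_along(rel_magnitude),
        xlab = "MDS dimension", ylab = "Relative magnitude")
```

With six samples there are at most five informative dimensions, so the bar chart here is shorter than the one Degust draws.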
There is also a parallel coordinates plot (which is probably more interesting with 3 or more sample groups).
Parallel coordinates plot.
So now I'm going to compare the results of Degust with those from DEB using both DESeq and edgeR, as well as the recommended edgeR script obtained from the user manual. I used an awk command to filter the spreadsheets to identify differentially expressed genes (DEGs). I then used the Venny online tool to calculate the overlaps between the DEG lists.
Venn diagram of differentially expressed genes. * denotes use of the DEB online tool.
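The same filter-and-overlap steps can be done in R instead of awk and Venny. The sketch below uses two made-up result tables. The FDR cutoff of 0.05 is an assumption (substitute whatever threshold you filter on), and the gene names, values and the helper `call_degs` are hypothetical.

```r
# Toy result tables standing in for the output of two tools (made-up values).
edger_res <- data.frame(gene = c("A", "B", "C", "D", "E"),
                        FDR = c(0.001, 0.02, 0.04, 0.30, 0.80),
                        stringsAsFactors = FALSE)
degust_res <- data.frame(gene = c("A", "B", "C", "D", "E"),
                         FDR = c(0.004, 0.03, 0.09, 0.01, 0.90),
                         stringsAsFactors = FALSE)

fdr_cutoff <- 0.05  # assumed threshold, not taken from the post

# Hypothetical helper: genes passing the FDR cutoff in one result table.
call_degs <- function(res, cutoff) res$gene[res$FDR < cutoff]

deg_lists <- list(edgeR = call_degs(edger_res, fdr_cutoff),
                  Degust = call_degs(degust_res, fdr_cutoff))

# Overlaps between the two DEG lists, as a two-set Venn diagram would show them.
shared <- intersect(deg_lists$edgeR, deg_lists$Degust)
only_edger <- setdiff(deg_lists$edgeR, deg_lists$Degust)
only_degust <- setdiff(deg_lists$Degust, deg_lists$edgeR)

print(lengths(deg_lists))
print(list(shared = shared, only_edgeR = only_edger, only_Degust = only_degust))
```

For more than two lists, the same `intersect` and `setdiff` calls can be chained, though a Venn tool is easier to read at that point.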
You can see that edgeR from the DEB server identified the most DEGs, followed closely by my edgeR script (included below), then Degust and lastly DESeq. My edgeR results differed from those of the DEB edgeR version, probably because I'm using edgeR_3.4.2 and limma_3.18.13 while DEB likely uses an older version. Degust was more similar to the edgeR analyses than to DESeq, which appears conservative in comparison. Looking at the Degust R code, you can see that it uses voom to transform the count data so that it is suitable for limma's statistical analysis, as described in a recent Genome Biology paper.
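For comparison with the edgeR script at the end of this post, here is a minimal sketch of a voom and limma analysis for a two-group design. It is not Degust's actual code, just the standard sequence of limma calls as I understand it. The counts are simulated, and the sketch assumes the edgeR and limma packages are installed.

```r
library(limma)
library(edgeR)

# Simulated counts: 1000 genes, 6 samples (illustrative data only).
set.seed(42)
n_genes <- 1000
counts <- matrix(rnbinom(n_genes * 6, mu = 200, size = 10), ncol = 6,
                 dimnames = list(paste0("gene", seq_len(n_genes)), paste0("s", 1:6)))
# Raise the first 20 genes in the second group so there is something to detect.
counts[1:20, 4:6] <- counts[1:20, 4:6] * 8

# Two groups of three samples; the design has an intercept and a group effect.
group <- factor(c(1, 1, 1, 2, 2, 2))
design <- model.matrix(~ group)

# Normalisation factors, as in the edgeR workflow.
y <- DGEList(counts = counts, group = group)
y <- calcNormFactors(y)

# voom converts counts to log2 CPM and estimates precision weights for limma.
v <- voom(y, design)

# Linear model fit and moderated t-statistics.
fit <- lmFit(v, design)
fit <- eBayes(fit)

# All genes, ranked by p-value for the group 2 versus group 1 coefficient.
res <- topTable(fit, coef = 2, number = Inf, sort.by = "P")
print(head(res))
```

The `adj.P.Val` column of `res` plays the same role as the `FDR` column in the edgeR output.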

In summary, Degust is a valid tool for RNA-seq analysis for simple comparisons (i.e. unpaired comparisons of two sample groups) that is faster and more user friendly than DEB. Its attractive, intuitive and responsive interface suggests that it will be a popular tool for expression analysis.

It would be great if it could also deal with more complicated experimental setups such as sample pairing, ANOVAs and GLMs; and I see downstream pathway analysis such as GSEA as a natural extension to Degust.

R code

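# Differential expression with edgeR's classic exact test
library("edgeR")
# Count matrix: genes in rows (named by the "Symbol" column), samples in columns
x <- read.delim("aza_mRNA.mx", row.names = "Symbol")
# Two groups of three samples each
group <- factor(c(1, 1, 1, 2, 2, 2))
y <- DGEList(counts = x, group = group)
# Normalisation factors (TMM by default)
y <- calcNormFactors(y)
# Common dispersion first, then gene-wise (tagwise) dispersions
y <- estimateCommonDisp(y)
y <- estimateTagwiseDisp(y)
# Exact test comparing group 2 with group 1
et <- exactTest(y)
# Write the top results (up to 20000 genes) as a tab-separated file
write.table(topTags(et, n = 20000), file = "aza_mRNA_edger.xls", quote = FALSE, sep = "\t")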