In this series of posts, we're going to work step by step through analysing RNA-seq data. I found a nice data set on GEO containing RNA-seq and bisulfite sequencing data from AML3 cells treated with the drug azacitidine (GSE55125). This drug is known to block DNA methylation, so it will be interesting to see how this affects gene expression and whether we can learn anything extra about the mechanisms of this potential anticancer drug.
Many thanks to the data contributors at the Beatson Institute for Cancer Research, University of Glasgow.
Step 1: Download from GEO and convert to fastq
Step 2: Quality control of RNA-seq data
Step 3: Align paired-end RNA-seq with TopHat
Step 4: Count aligned reads and create count matrix
Step 5: Differential analysis of RNA-seq
Step 6: Draw a heatmap of gene expression
Step 7: MDS plot
Step 8: Pathway analysis with GSEA
Step 9: Integration of ENCODE transcription factor binding data