To test this, I went back to a paired end data set present in GEO (GSE55123, 2x 36bp), cleaned the data with Skewer, then mapped the reads with STAR in either paired-end mode or single-end mode (using just read 1).
I then used featureCounts to quantify number of tags aligned to each gene. I excluded genes with fewer than 10 reads per sample on average. Then I ran edgeR at Degust to identify differentially expressed genes (DEGs@FDR<0.05). I used a shell script to quantify the overlap in DEGs. Then I ranked them based on the p-value from most up-regulated to most down-regulated and compared their positions in the rank.
Here's the result of the overlap analysis. You can see that PE fastq detected more genes but identified fewer DGEs than SE.
Detected in PE:15919
Detected in SE:15275
Detected in both:14750
Detected in either:16444
Detected in PE only:1169
Detected in SE only:525
DGE in PE:4651
DGE in SE:5755
DGE in both:4356
DGE in either:6050
DGE in PE only:295
DGE in SE only:1399
DGE in PE and SE: 4356
PE DGEs not detected in SE: 138
PE DGEs detected but not significant in SE: 157
But I was curious to see whether the plot looked different if I included genes that were below the detection threshold in either SE or PE. The result is more consistent with the overlap analysis.
In conclusion, these results show that going for the cheaper single end option loses about 10% of detected genes. It was quite surprising to see that single end data gave 23% more DGEs. I'm not sure whether these SE specific DGEs are actually real - we can't really know. Nevertheless, 93.6% of DGEs identified in PE were also identified in SE, so there aren't many DGEs that are missed in SE data. As the data set used here is relatively short length (36bp), as compared to current HiSeq standard (50bp), the choice to go single end will likely lose you only 5-10% of genes. This proportion of genes lost would be lower still at 100bp length. Unless you're interested in performing splicing analysis, my recommendation would be for 100bp SE RNA-seq with 40M reads per sample. I'd be interested in your own experiences with SE and PE RNA-seq.