RNA-seq series of posts and subsampled it using samtools and shuf into files of 1M, 2M, 5M and 10M reads, alongside the full BAM file containing 25.4M reads.
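The subsampling step can be sketched roughly as follows. This is a minimal sketch, not the exact commands used for this post: the helper name `subsample_sam` and the file names in the usage comment are hypothetical, and it assumes the alignments have first been dumped to SAM text with the header included. It samples alignment lines independently, so mates of paired-end reads are not kept together.

```shell
# Hypothetical helper: keep the SAM header and draw N random alignment lines.
# Usage: subsample_sam in.sam N > out.sam
subsample_sam() {
    sam="$1"
    n="$2"
    # Header lines start with '@' and are kept in full.
    grep '^@' "$sam"
    # Alignment records: shuffle them and keep n lines.
    grep -v '^@' "$sam" | shuf -n "$n"
}

# Assumed usage with samtools (file names are placeholders):
#   samtools view -h test.bam > test.sam
#   subsample_sam test.sam 1000000 | samtools view -b -o test.1M.bam -
```

The shuffled output is no longer coordinate-sorted, which matters for any downstream tool that expects a sorted or indexed BAM file.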
I then used the benchmarking script described in the previous post to record execution time, CPU usage and peak memory while counting reads into a gene-wise matrix. I ran featureCounts in single-thread mode as well as in parallel mode (a maximum of 8 cores).
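For readers without that script at hand, the same three metrics can be collected with GNU time. The following is a minimal sketch, not the script from the previous post: the helper name `bench` and the log file name are hypothetical, and it assumes GNU time is installed at `/usr/bin/time` (the shell's built-in `time` does not report memory).

```shell
# Hypothetical helper: run a command and append wall time, CPU usage and
# peak memory to a tab-separated log. Assumes GNU time at /usr/bin/time.
# Usage: bench LOGFILE COMMAND [ARGS...]
bench() {
    log="$1"
    shift
    # %e = elapsed wall-clock seconds, %P = CPU percentage,
    # %M = maximum resident set size in KB, %C = the command line.
    /usr/bin/time -a -o "$log" -f '%e\t%P\t%M\t%C' "$@"
}

# Example (log file name is a placeholder):
#   bench bench.tsv featureCounts -Q 10 -M -s 0 -T 8 \
#       -a Homo_sapiens.GRCh38.76.gtf -o test.bam.featureCount.cnt test.bam
```

Each run appends one line to the log, so the timings for all tools and file sizes end up in a single table that is easy to plot.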
Execution time for read counting by featureCounts, BedTools Multicov and HTSeq-count for BAM files of varying sizes.
I then split the overall time into the time spent on (1) reading in the annotation file and (2) processing the reads.
This breakdown shows that featureCounts is considerably faster at both tasks than BedTools Multicov and HTSeq-count.
Exact commands used:
htseq-count -q -f bam --stranded=no --minaqual=10 --type=gene --mode=union test.bam Homo_sapiens.GRCh38.76.gtf > test.bam.htseq.count
bedtools multicov -D -q 10 -bams test.bam -bed Homo_sapiens.GRCh38_exon.bed > test.bam.bedtools.cnt
featureCounts -Q 10 -M -s 0 -T 1 -a Homo_sapiens.GRCh38.76.gtf -o test.bam.featureCount.cnt test.bam
featureCounts -Q 10 -M -s 0 -T 8 -a Homo_sapiens.GRCh38.76.gtf -o test.bam.featureCount.cnt test.bam