One of the most basic parallelisation tricks is to split a file into smaller fragments, process those fragments in parallel, and then concatenate or merge the results. To split a file, you can use a tool such as sed to selectively output ranges of lines. Another option is the split command, which divides the file into pieces of a specified number of lines.
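As a rough sketch of the sed approach, the snippet below cuts a file into three fragments by line range. The file names (demo.txt, demo.frag.1 and so on) and the 4-line fragment size are made up for illustration, and the toy input is generated on the spot so the example runs on its own.

```shell
# Build a toy 12-line input (hypothetical file name) so the sketch is self-contained.
seq 1 12 > demo.txt

# -n suppresses sed's default output, so only the lines selected with 'p' are printed.
sed -n '1,4p'  demo.txt > demo.frag.1   # lines 1 to 4
sed -n '5,8p'  demo.txt > demo.frag.2   # lines 5 to 8
sed -n '9,$p'  demo.txt > demo.frag.3   # line 9 to the end of the file
```

Each sed call rereads the file from the top, so for many fragments split is usually the more convenient choice.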
Here's an example of splitting a fastq file (called sequence.fq) into 1 million line fragments:
split -l 1000000 sequence.fq sequence.fq.frag.
The output files are named with the given prefix followed by a two-letter suffix: sequence.fq.frag.aa, sequence.fq.frag.ab, and so on.
Be careful: fastq files store each sequence as 4 lines, unlike formats such as sam that use one line per record, so make sure the number of lines per fragment is a multiple of 4. Otherwise a record will be cut in two across neighbouring fragments. Now you're free to run batches of smaller jobs over many processors.
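Putting the pieces together, here is a minimal sketch of the whole split, process, merge cycle on a toy fastq file. The file names, the 8-line fragment size, and the per-fragment "job" (simply counting records) are all placeholders; a real pipeline would substitute its own command and might hand the fragments to a job scheduler rather than to background shell processes.

```shell
# Toy fastq with 6 records (24 lines); all file names here are hypothetical.
for i in 1 2 3 4 5 6; do
    printf '@read%s\nACGT\n+\nIIII\n' "$i"
done > toy.fq

lines_per_frag=8   # 2 records per fragment
# Refuse to continue if the fragment size would cut a 4-line record in two.
if [ $((lines_per_frag % 4)) -ne 0 ]; then
    echo "lines_per_frag must be a multiple of 4" >&2
    exit 1
fi

# Produces toy.fq.frag.aa, toy.fq.frag.ab, toy.fq.frag.ac
split -l "$lines_per_frag" toy.fq toy.fq.frag.

# Stand-in for a real job: count the records in each fragment, one background process each.
for frag in toy.fq.frag.??; do
    ( echo $(( $(wc -l < "$frag") / 4 )) > "$frag.count" ) &
done
wait   # block until every background job has finished

# Merge the per-fragment results; the glob expands in sorted order (aa, ab, ac).
cat toy.fq.frag.??.count > toy.counts
```

Because the suffixes sort alphabetically, concatenating with a glob preserves the original fragment order.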