Quality filtering can be done a few ways: by discarding entire reads that have poor base quality scores, by converting poor-quality base calls to "N", or by hard-trimming reads to a fixed length before the Q scores start to decline rapidly. I'd much rather use a quality-based trimming tool that starts at the 3' end of the read and removes bases below a certain Q threshold. This can be done using fastq_quality_trimmer (part of the FASTX-Toolkit) on the command line or in Galaxy. You set the threshold you want to use, in this case Q30, as well as the minimum length of sequence to keep, which we have set at 37 nt.
fastq_quality_trimmer -t 30 -l 37 -i dataset.txt -o dataset_Q30.txt
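To make the idea concrete, here is a rough sketch of 3' quality trimming written in awk. It is only an illustration of the behaviour described above, not the FASTX-Toolkit's actual implementation; the function name trim3_sketch and its argument order are made up for this example, and it assumes a plain four-lines-per-record FASTQ file.

```shell
# Hypothetical helper for illustration only (not part of the FASTX-Toolkit).
# Usage: trim3_sketch THRESHOLD MIN_LENGTH PHRED_OFFSET FILE.fastq
trim3_sketch() {
    awk -v t="$1" -v minlen="$2" -v off="$3" '
        # Build a character -> ASCII code lookup table (awk has no ord()).
        BEGIN { for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i }
        NR % 4 == 1 { id = $0 }      # header line
        NR % 4 == 2 { seq = $0 }     # bases
        NR % 4 == 3 { plus = $0 }    # separator line
        NR % 4 == 0 {                # quality line
            keep = length($0)
            # Walk in from the 3prime end while the base is below the threshold.
            while (keep > 0 && ord[substr($0, keep, 1)] - off < t) keep--
            # Discard reads that end up shorter than the minimum length.
            if (keep >= minlen) {
                print id
                print substr(seq, 1, keep)
                print plus
                print substr($0, 1, keep)
            }
        }
    ' "$4"
}
```

For Phred+33 data, "trim3_sketch 30 37 33 dataset.txt" applies the same Q30 threshold and 37 nt minimum length as the command above. For real work I would still use fastq_quality_trimmer itself.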
Now to remove the adapter sequence. There are a few software options (Trimmomatic and cutadapt, among others), but we again chose the FASTX-Toolkit for this example. Keep in mind that the adapter below is for the Illumina TruSeq genomic DNA kit and will be different for other sequencing platforms. The "-l 37" parameter is the length of the shortest read to keep, so any read shorter than this after clipping is discarded; you can tailor this to your specific needs.
fastx_clipper -a GATCGGAAGAGCACACGTCTGAACTCCAGTCACATC -l 37 -i dataset_Q30.txt -o dataset_Q30_clip.txt
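One thing I need to mention is that the commands above will work really well for FASTQ files with the older Illumina quality encoding (Phred+64), which is what the FASTX-Toolkit assumes by default, but might not work for Sanger-encoded FASTQ (Phred+33), which uses different quality score characters. This incompatibility has tripped up a lot of people in the forums and can be solved by adding -Q33 as an option to each command.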
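If you are not sure which encoding a file uses, a quick heuristic is to look at the lowest quality character in the first thousand or so reads. The sketch below is my own hypothetical helper (guess_phred_offset is not a FASTX-Toolkit command), and it assumes a four-lines-per-record FASTQ file. The cut-off of 59 is a rule of thumb: the 64-based encodings do not use characters below ASCII 59, so anything lower points to Phred+33. A file containing only high-quality bases can still be misclassified, so treat the answer as a guess.

```shell
# Hypothetical helper for illustration only (not part of the FASTX-Toolkit).
# Prints 33, 64, or "unknown" based on the lowest quality character seen.
guess_phred_offset() {
    # Only the first 1000 records (4000 lines) are inspected.
    head -n 4000 "$1" | awk '
        # Build a character -> ASCII code lookup table (awk has no ord()).
        BEGIN { for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i; min = 127 }
        NR % 4 == 0 {                # quality line of each record
            n = length($0)
            for (j = 1; j <= n; j++) {
                c = ord[substr($0, j, 1)]
                if (c != "" && c < min) min = c
            }
        }
        END {
            if (min == 127)   print "unknown"   # no quality characters found
            else if (min < 59) print 33         # too low for a 64-based encoding
            else               print 64         # consistent with Phred+64
        }
    '
}
```

If "guess_phred_offset dataset.txt" prints 33, add -Q33 to both the fastq_quality_trimmer and fastx_clipper commands above.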
Now that the sequence data has been trimmed of low-quality bases and adapter contamination, we can start the analysis!