Wednesday, 28 May 2014

Reverse complement a sequence

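Starting with a tab-separated file

Use rev and sed to generate the reverse complement of a tab-separated sequence file that holds the sequence in its second column. rev reverses each line, and sed's y command swaps each base for its complement.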

cut -f2 file | rev | sed 'y/CGAT/GCTA/'
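As a quick check of the command above, here is a small sketch run on made-up input. The names seq1 and seq2 and their sequences are hypothetical, and the data is piped in on stdin instead of being read from a file.

```shell
# Hypothetical two-column input (name <TAB> sequence), fed in on stdin.
# cut keeps the sequence column, rev reverses it, sed complements it.
revcomp=$(printf 'seq1\tAACG\nseq2\tGATTACA\n' | cut -f2 | rev | sed 'y/CGAT/GCTA/')

printf '%s\n' "$revcomp"
# CGTT
# TGTAATC
```

The order of rev and sed does not matter here, since one reorders characters and the other substitutes them one by one.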

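To write the reverse complement in lower case instead, use sed 'y/CGAT/gcta/'. In both forms only upper-case A, C, G and T are translated; any other character passes through unchanged.
If you need the sequence name, paste the result back onto the original file.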

cut -f2 file | rev | sed 'y/CGAT/GCTA/' | paste file -
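The sketch below shows what the pasted result looks like, again on hypothetical data. It assumes the name sits in column 1, and it uses a mktemp file (not part of the original command) so that paste has a real file to read alongside stdin.

```shell
# Hypothetical input file: name <TAB> sequence
input=$(mktemp)
printf 'seq1\tAACG\nseq2\tGATTACA\n' > "$input"

# paste appends the reverse complement as a third column
with_names=$(cut -f2 "$input" | rev | sed 'y/CGAT/GCTA/' | paste "$input" -)
rm -f "$input"

printf '%s\n' "$with_names"
# seq1	AACG	CGTT
# seq2	GATTACA	TGTAATC
```

Piping the result through cut -f1,3 would keep only the name and the reverse complement.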

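Starting with a FASTA file

For a FASTA file, first use paste to join each header line with its sequence line, which gives the same tabulated layout as above. This only works if every record is exactly two lines, a header followed by a single unwrapped sequence line. Avoid tabs and special characters in the sequence name.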

paste - - < file > tmp
cut -f2 tmp | rev | sed 'y/CGAT/GCTA/' \
| paste tmp - | cut -f1,3 | tr '\t' '\n'
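For illustration, the same steps on a tiny made-up FASTA file might look like the sketch below. The records are hypothetical, each with one sequence line, and a mktemp file stands in for tmp so that nothing is left behind in the working directory.

```shell
# Hypothetical FASTA input, one sequence line per record
fasta=$(mktemp)
printf '>seq1\nAACG\n>seq2\nGATTACA\n' > "$fasta"

# Tabulate: header <TAB> sequence
tmp=$(mktemp)
paste - - < "$fasta" > "$tmp"

# Add the reverse complement as column 3, keep header and column 3,
# then turn the tab back into a newline to restore the FASTA layout
fasta_rc=$(cut -f2 "$tmp" | rev | sed 'y/CGAT/GCTA/' \
  | paste "$tmp" - | cut -f1,3 | tr '\t' '\n')
rm -f "$fasta" "$tmp"

printf '%s\n' "$fasta_rc"
# >seq1
# CGTT
# >seq2
# TGTAATC
```

The final tr is also why tabs in the sequence name are a problem: any tab in a header would shift the columns and be turned into a line break.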