Monday, 1 September 2014

Tabular sequence to fasta format

Here is a one-line command to turn a list of sequences (one per line) into a FASTA file, using the line number of each sequence as its header.

$ cat seq.txt 
CAACACCAGTCGATGGGCTGT
CAACACCAGTCGATGGGCTGTC
CAACACCAGTCGATGGGCCGT
TAGCTTATCAGACTGATGTTGA
TAGCTTATCAGACTGATGTTGAC

We use nl to number the lines, then sed to replace the leading whitespace that nl adds with the ">" header marker, then tr to turn the tab between each number and its sequence into a line break.

$ nl seq.txt | sed 's/^[ \t]*/>/' | tr '\t' '\n'
>1
CAACACCAGTCGATGGGCTGT
>2
CAACACCAGTCGATGGGCTGTC
>3
CAACACCAGTCGATGGGCCGT
>4
TAGCTTATCAGACTGATGTTGA
>5
TAGCTTATCAGACTGATGTTGAC
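
As a possible alternative, the same numbering can be sketched in a single awk call, which avoids depending on the padding and tab that nl puts in front of each line. This is only a sketch and not the method used above: the output name seq.fa is an assumption made for illustration, and the first command just recreates the example input so the snippet runs on its own.

```shell
# Recreate the example input (same sequences as seq.txt above).
printf '%s\n' \
  CAACACCAGTCGATGGGCTGT \
  CAACACCAGTCGATGGGCTGTC \
  CAACACCAGTCGATGGGCCGT \
  TAGCTTATCAGACTGATGTTGA \
  TAGCTTATCAGACTGATGTTGAC > seq.txt

# "NF" is true only for non-blank lines, so blank lines are skipped.
# n counts the lines that are kept and becomes the FASTA header.
# seq.fa is a hypothetical output file name.
awk 'NF { printf ">%d\n%s\n", ++n, $0 }' seq.txt > seq.fa
```

Running cat seq.fa afterwards should show each sequence preceded by its ">number" header, in the same layout as the nl | sed | tr pipeline.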