This is an archived static version of the original phylobabble.org discussion site.

The SNAD Sequence Renaming tool

BrianFoley

The Sequence Name Annotation Designer (SNAD) tool is very useful for renaming sequences, such as those obtained from BLAST output. Note that characters such as “:” and “(” are not allowed in unquoted sequence names in a Nexus/Newick tree file, where they delimit branch lengths and clades. SNAD can rename, for example:

>gi|260677811|gb|GU046734.1|:1-1695 Influenza A virus (A/mallard/Bavaria/35/2006(H5)) segment 4 hemagglutinin (HA) gene, complete cds
ATGGAGAAAATAGTGCTTCTTCTTGCAATAGTCAGTCTTGTTAAAAGTGACCAGATTTGCATTGGTTACCATGCAAACAA

>gi|301322577|gb|HM849027.1|:1-1695 Influenza A virus (A/mallard/PT/28006/2007(H5N3)) segment 4 hemagglutinin (HA) gene, complete cds
ATGGAGAAAATAGTGCTTCTTCTTGCAATAGTCAGTCTTGTTAAAAGTGACCAGATTTGCATTGGTTACCATGCAAACAA

>gi|148532753|gb|EF597262.1|:1-1684 Influenza A virus (A/mallard/Italy/1980/1993(H5N2)) hemagglutinin (HA) gene, partial cds
ATGGAGAAAATAGTGCTTCTTTTTGCAATAGTCAGTCTTGTCAAAAGTGACCAGATTTGCATTGGTTACCATGCAAACAA

etc…

to

> A/H5N1/mallard/Bavaria/35/2006_GU046734
ATGGAGAAAATAGTGCTTCTTCTTGCAATAGTCAGTCTTGTTAAAAGTGACCAGATTTGCATTGGTTACCATGCAAACAA

> A/H5N3/mallard/Portugal/28006/2007_HM849027
ATGGAGAAAATAGTGCTTCTTCTTGCAATAGTCAGTCTTGTTAAAAGTGACCAGATTTGCATTGGTTACCATGCAAACAA

> A/H5N2/mallard/Italy/1980/1993_EF597262
ATGGAGAAAATAGTGCTTCTTTTTGCAATAGTCAGTCTTGTCAAAAGTGACCAGATTTGCATTGGTTACCATGCAAACAA

etc…
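For anyone who wants to see the general idea in code, here is a minimal Python sketch of this kind of renaming. It is not how SNAD works internally. It only rearranges what is already in the header, so it will not fill in a missing subtype (H5 to H5N1) or expand "PT" to "Portugal" the way the SNAD output above does. The regular expression assumes the NCBI-style header layout shown in the examples, and the names `HEADER_RE`, `rename_header`, and `rename_fasta` are made up for illustration.

```python
import re

# Assumed header layout (taken from the examples in this thread):
# >gi|<gi>|gb|<accession>.<version>|<range> Influenza A virus (<strain>(<subtype>)) ...
HEADER_RE = re.compile(
    r"^>gi\|\d+\|gb\|(?P<acc>[A-Z]+\d+)(?:\.\d+)?\|\S*\s+"
    r"Influenza A virus \((?P<strain>.+?)\((?P<subtype>H\d+(?:N\d+)?)\)\)"
)

# Characters that may not appear in an unquoted Newick label:
# whitespace, parentheses, square brackets, single quotes, colons,
# semicolons, and commas.
UNSAFE_RE = re.compile(r"[\s()\[\]':;,]")


def rename_header(header):
    """Return a tree-safe FASTA header, or the header unchanged if unrecognized."""
    m = HEADER_RE.match(header)
    if m is None:
        return header
    # Split "A/mallard/Bavaria/35/2006" into the virus type and the rest,
    # then slot the subtype in between and append the accession number.
    virus_type, rest = m.group("strain").split("/", 1)
    label = f"{virus_type}/{m.group('subtype')}/{rest}_{m.group('acc')}"
    return ">" + UNSAFE_RE.sub("_", label)


def rename_fasta(lines):
    """Yield FASTA lines with header lines renamed and sequence lines untouched."""
    for line in lines:
        line = line.rstrip("\n")
        yield rename_header(line) if line.startswith(">") else line


example = [
    ">gi|148532753|gb|EF597262.1|:1-1684 Influenza A virus "
    "(A/mallard/Italy/1980/1993(H5N2)) hemagglutinin (HA) gene, partial cds",
    "ATGGAGAAAATAGTGCTTCTTTTTGCAATAGTCAGTCTTGTCAAAAGTGACCAGATTTGCATTGGTTACCATGCAAACAA",
]
for out_line in rename_fasta(example):
    print(out_line)
# First line printed: >A/H5N2/mallard/Italy/1980/1993_EF597262
```

Headers that do not match the assumed layout are passed through unchanged, so they would still need checking before building a tree.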

jeetsukumaran

Again, apologies for tooting my own horn … DendroPy’s interop.genbank module provides this functionality (for GenBank data, at any rate), with several options to customize the labeling for compatibility with most phylogenetic software.