This is an archived static version of the original phylobabble.org discussion site.

Feedback on new web-based very large tree viewer

rdmpage

I’m working on a web-based phylogeny viewer that uses an approach similar to Google Maps to display large images. There is a live demo here.

I’ll blog about the project and put the code on GitHub shortly, but I’m looking for any immediate thoughts people might have. My goal here is to have a simple, easy-to-use viewer that enables you to quickly browse through a large tree. For example, I envisage adding this to my map of DNA barcodes.

Let me know what you think. Would you find this useful?

ematsen

This is totally cool!

One thing that I’ve always wanted is a phylogenetics equivalent of http://getcloudapp.com/ or similar service. The idea is that you can upload a file and get an obfuscated URL which points to a reasonably nice visualization of the file (e.g. http://cl.ly/code/3y2q3O2O3O13/). A free service where one could upload a tree and get back something that could be shared with collaborators would be a huge win.

Think that might be possible with your viewer?

We have a nice little alignment-to-html converter that would be a nice complement.

rdmpage

@ematsen Glad that you like it. Yes, I plan to add an upload feature (ran out of time last night). Generating the tiles can take a while for a big tree, so I’d like an interface that shows progress of that process.

Regarding obfuscated URL, I’d use md5 hash of tree, so it will be obfuscated anyway, but do you mean that the URL is private, in the sense that only you and those you’d share it with would have access to it?

Another feature that might be useful is to include a location in the link, so you can say to someone “look at this part of the tree” (equivalent to sending someone a link to a place in Google Maps).
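A minimal sketch of the two ideas above (an MD5-based tree identifier plus a location carried in the link), assuming a made-up base URL and made-up fragment parameters `x`, `y`, `z`; none of these names come from the viewer itself.

```python
import hashlib
from urllib.parse import urlencode

# Hypothetical base URL, used only for illustration.
BASE_URL = "https://example.org/tree"


def tree_id(tree_text: str) -> str:
    """MD5 hex digest of the tree text (surrounding whitespace ignored)."""
    return hashlib.md5(tree_text.strip().encode("utf-8")).hexdigest()


def share_link(tree_text: str, x=None, y=None, zoom=None) -> str:
    """Build a link to a tree, optionally pointing at a location in it.

    The x/y/z fragment layout is an assumption, loosely modelled on
    map-style permalinks.
    """
    url = f"{BASE_URL}/{tree_id(tree_text)}"
    if None not in (x, y, zoom):
        url += "#" + urlencode({"x": x, "y": y, "z": zoom})
    return url


print(share_link("(a,b);"))
print(share_link("(a,b);", x=10, y=20, zoom=3))
```

A hash-based link is hard to guess but not private: identical trees map to the same identifier, and anyone who holds the tree file can recompute it.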

rdmpage

OK, you can now upload a NEXUS format tree (or give a URL for one). All very crude but at least you can experiment: Deep tree viewer

Code is on GitHub: rdmpage/deep-tree (Big web-based phylogeny viewer)

pscUtah

This is superb. Looks like it will be wonderful for a graduate student that I’m advising. We’ll fool around with it and give you some feedback.

rdmpage

Thanks. It’s very much a work in progress, and very fussy about the tree format. For example, people stick all sorts of symbols in Newick files that my simple parser just barfs over. I hope to fix this ASAP. If you have a tree that fails, just send me an email rdmpage@gmail.com or message me here and I’ll take a look.
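For illustration, here is a rough sketch of a Newick tokenizer that tolerates quoted labels, bracketed comments and odd characters in unquoted labels. It is not the viewer's parser; the names `TOKEN` and `tokenize` are invented for this example, and characters it cannot match (such as an unclosed `[`) are silently skipped.

```python
import re

# One alternative per token type; order matters (comments and quotes first).
TOKEN = re.compile(r"""
    \[[^\]]*\]            # bracketed comment, dropped by tokenize()
  | '(?:[^']|'')*'        # quoted label; a doubled quote is an escaped quote
  | [(),:;]               # structural punctuation
  | [^\s()\[\],:;']+      # unquoted label or number
""", re.VERBOSE)


def tokenize(newick: str):
    """Split a Newick string into (kind, value) pairs.

    kind is "punct" or "label"; numbers (branch lengths) come back as
    labels and are left for the caller to interpret.
    """
    tokens = []
    for match in TOKEN.finditer(newick):
        text = match.group(0)
        if text.startswith("["):
            continue  # skip comments
        if text.startswith("'"):
            # Strip the outer quotes and unescape doubled quotes.
            tokens.append(("label", text[1:-1].replace("''", "'")))
        elif text in "(),:;":
            tokens.append(("punct", text))
        else:
            tokens.append(("label", text))
    return tokens


print(tokenize("('Homo sapiens':1.0,[a note]taxon_B:2);"))
```

Tagging each token with its kind keeps a quoted label such as `'('` from being mistaken for structure once the quotes are removed.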

ematsen

So cool. Re parsing, could you use some external library to normalize the string representation? I’m thinking one of the phylo libraries in Bioperl/Biopython, or DendroPy, or the fantastic command line Newick utilities.

rdmpage

I’m a roll-your-own kind of guy :wink: My current tree parser is a quick and dirty hack to avoid writing a decent one; I just need a quiet afternoon to fix it. I also want to add a few things to the NEXUS parser, like support for “MRCA” commands to label internal nodes.

josephwb

Cool. Trying it with the OpenTree tree now (87 MB). We got all sorts of crazy characters, but taxon names are newick-valid (I know this is Nexus, but hopefully the naming conventions are compatible).

rdmpage

@josephwb How did you get on? I don’t see an 87 MB file on the server (I suspect that’s going to be too big for the web server to upload). Where does the OpenTree live? Is there a URL I can grab it from?

rdmpage

@josephwb Please tell me it’s not the tree with labels like “6461_K∆RHEDE” …

josephwb

@rdmpage Nothing happened. I guess the web server couldn’t handle it. I can put it somewhere and send you the link.

There doesn’t appear to be any “6461_K*” in the tree, so hopefully that is not mine.

josephwb

@rdmpage Tree is up here in both uncompressed (82 MB) and compressed (tgz; 21 MB) versions.

rdmpage

@josephwb Thanks, I’ll take a look. The dimensions may be a bit challenging for the tile maker script…

rdmpage

Unit branch lengths :frowning:

rdmpage

@josephwb OK, I’ve loaded just the mammals to see what would happen. Not pretty, as you can see: Deep tree viewer

Compare this to the mammal super tree from http://dx.doi.org/10.1038/nature05634, which you can see here (Deep tree viewer) and which looks much nicer.

I’ll need to investigate why this is (will try it with a tree with the same number of leaves). Obviously the trees are different (unit branch lengths versus chronogram), but the OpenTree looks very “chained”, which may reduce the number of nicely distinct subtrees. It may also be simply due to the interaction between the number of leaves and the zoom function (i.e., at every level I double the size of the image). Perhaps the structure of the tree is poorly served by my method for collapsing nodes. But part of me thinks the OpenTree looks a little “pathological”…

ematsen

Sure, but a NEXUS parser?

Ah, the enthusiasm of youth.

josephwb

Regarding the “pathological” appearance: I should have mentioned two things. First, there are a whole lot of polytomies; this should not be unexpected: given that only a tiny fraction of taxa have ever appeared in trees, most of the OpenTree is taxonomy. Second, there is also something that we have been calling “knuckles” or “knees”: when monotypic taxa are involved, the entire taxonomic structure is present in the tree. An example is the Hoatzin (you’ll only get bird examples from me), the lone species in an entire Order. The tree preserves the 1) Species, 2) Genus, 3) Family, and 4) Order. Here it is in the tree string:

(((((Opisthocomus_hoazin_ott928360:1.0)Opisthocomus_ott70726:1.0)Opisthocomidae_ott928357:1.0)Opisthocomiformes_ott928359:1.0…

I know that some software (e.g. APE in R) cannot handle knuckles. Is this something you can handle?
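One way software can cope with knuckles is to merge each single-child node into its child before drawing. The sketch below assumes a made-up `Node` class and a hand-built toy tree (labels shortened, with a hypothetical sister clade); it is not how OpenTree, APE, or the viewer handles the problem.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str = ""
    length: float = 0.0
    children: List["Node"] = field(default_factory=list)


def suppress_knuckles(node: Node) -> Node:
    """Merge every single-child node into its child, summing branch lengths.

    Modifies the tree in place and returns the (possibly new) subtree root.
    Recursive, so a very deep tree would need an iterative version.
    """
    while len(node.children) == 1:
        child = node.children[0]
        child.length += node.length
        node = child
    node.children = [suppress_knuckles(c) for c in node.children]
    return node


# Toy version of the Hoatzin chain: Order > Family > Genus > Species.
hoatzin = Node("Opisthocomiformes", 1.0, [
    Node("Opisthocomidae", 1.0, [
        Node("Opisthocomus", 1.0, [
            Node("Opisthocomus_hoazin", 1.0)])])])
# "sister", "x" and "y" are placeholders, not real taxa.
root = Node("root", 0.0, [
    hoatzin,
    Node("sister", 1.0, [Node("x", 1.0), Node("y", 1.0)]),
])

flat = suppress_knuckles(root)
print(flat.children[0].label, flat.children[0].length)
```

The cost is that the labels of the higher ranks (Genus, Family, Order) are lost, which is exactly the information the knuckles were kept to preserve.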

rdmpage

Ah the joys of Gregg’s paradox http://iphylo.blogspot.co.uk/2009/10/wikipedia-and-gregg-paradox.html. I’ll investigate what the code does in this situation.

rdmpage

@josephwb I may have to take back the “pathological” slur. There’s a fairly crippling feature of the algorithm that I use to choose which nodes to collapse that means it will make a mess of highly unbalanced trees. I do a preorder traversal that asks whether the immediate children of a node are more than a specified distance apart (a function of the font size). If they aren’t, I draw the subtree rooted at that node as a polygon. If the base of the tree looks like ((a,b,c),(d,e,f)) this works great. But if it’s (a,(b,(c,(d,(e,f))))) not so much.
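The rule described above can be sketched roughly as follows. The layout is an assumption made for this example (leaves evenly spaced, an internal node placed midway between its first and last child, "distance apart" read as the gap between those two children), and `annotate`, `draw_plan` and `min_gap` are invented names, not the viewer's code.

```python
def annotate(tree, spacing, counter=None):
    """Give every node a y position and a leaf count.

    A tree is either a leaf name (str) or a tuple of subtrees.
    """
    if counter is None:
        counter = [0]
    if isinstance(tree, str):
        y = counter[0] * spacing
        counter[0] += 1
        return {"name": tree, "y": y, "leaves": 1, "children": []}
    kids = [annotate(child, spacing, counter) for child in tree]
    return {
        "name": None,
        "y": (kids[0]["y"] + kids[-1]["y"]) / 2,
        "leaves": sum(k["leaves"] for k in kids),
        "children": kids,
    }


def draw_plan(node, min_gap, out=None):
    """Preorder walk: open a node only if its children are far enough apart."""
    if out is None:
        out = []
    kids = node["children"]
    if not kids:
        out.append(("leaf", node["name"]))
    elif kids[-1]["y"] - kids[0]["y"] > min_gap:
        for kid in kids:
            draw_plan(kid, min_gap, out)
    else:
        # Children too close together: draw the whole subtree as one polygon.
        out.append(("polygon", node["leaves"]))
    return out


balanced = (("a", "b", "c"), ("d", "e", "f"))
ladder = ("a", ("b", ("c", ("d", ("e", "f")))))

print(draw_plan(annotate(balanced, spacing=10), min_gap=25))
print(draw_plan(annotate(ladder, spacing=10), min_gap=25))
```

With these numbers the balanced tree splits into two three-leaf polygons, while the ladder collapses into a single six-leaf polygon at the root: each internal node sits close to its nearest leaf sibling, so the test fails at the very first step.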

josephwb

Regarding Gregg’s Paradox (and thanks for pointing that out: I didn’t know it had a name), the reason we keep all the levels in is that while a taxon may have only one extant species (or whatever), it is possible that there are many taxa that are simply extinct. Given the messiness of fossil taxonomies, we are currently filtering fossil taxa out. However, the goal is to ultimately have everything in there, so (most) problems that involve preserving the taxonomic hierarchy should go away.

taxonbytes

Here is how the monotypy issue is represented in Euler/X. Different ranks – same referential extension. Ranks are serviceable for some things (human learning, for instance, though also human erring in “learning”), less so in other contexts. In some contexts asserting that all elements in a monotypic lineage are “the same” is probably acceptable.

http://taxonbytes.org/alignment-of-two-classifications-of-the-monotremata/