This is an archived static version of the original phylobabble.org discussion site.

`ape` gods: how should I go about doing phylogenetic diversity calculations on a tree?

ematsen

Gods of ape and other tree packages in R (I’m looking at you, @klaus_schliep, @tanja_stadler, @phylorich, etc, etc) –

@mathmomike and I are noodling about with some new phylogenetic diversity ideas and I’d like to code them up. I’m hoping that there are some really lovely ways for me to code up a new PD by just specifying some functions to use for a recursion. Can you give me some pointers?

[I suppose I should say more widely that I wouldn’t be opposed to coding these things up in Python, OCaml, etc, but it seems to me that there are lots of wonderful R packages for this sort of thing and I could try to merge my code in there if this goes anywhere.]

Thank you in advance. Below is my mental image of you, delivering me lots of trees.

phylorich

It’s very early days, but I’m working on a new package “forest” to make that easy, based on things I learnt writing diversitree (it’ll be the guts of diversitree2).

Basically, I’ve wrapped up the treetree library to make it suitable for use with phylogenies. It’s all C++, and has nice things like iterator pairs for post and pre order traversal that make implementing Felsenstein’s pruning algorithm very easy.

For example, here’s all the code needed to compute likelihoods under BM, which gets converted into something useable by simply declaring it a type.

The package is nominally for R but it’s basically pure C++ that could be wrapped up with something else too.

However, everything in the package can and will change and it’s still totally experimental. And it’s being done in my spare time so progress is slow. I’m currently sidetracked yak shaving a nicer ODE solver.

ematsen

Oh, dear. The whole idea here was to avoid yak-shaving, not join in with someone shaving a much bigger yak.

It is a crying shame that there is no yak-shaving emoticon. I think shaved ice + water buffalo is as close as I can get: :shaved_ice: :water_buffalo:.

koadman

I’ve hacked around inside phylogenetic diversity analyzer (PDA) before and found their code to be reasonably understandable. http://www.cibiv.at/software/pda/ Hard to say whether it’s set up well for your purpose though, and it’s C++ so not on your list of preferred tongues. They also seem to have pulled down the source code from their site, or at least I can’t find it anymore, but I have a GPL copy of v0.5.2 and am happy to share if there’s interest.

rutgeraldo

There’s a number of tree shape indices in Bio::Phylo already (doi:10.1186/1471-2105-12-63, doi:10.1186/1748-7188-7-6) so if you’re not averse to coding in Perl that might be an option. It has a lot of functionality for recursive traversal using the visitor methods, e.g. with $tree->visit_depth_first you can pass in code blocks that are executed on the nodes in pre- and/or post-order traversal.

GrahamJones

I have never done the sort of thing you’re trying to, but I probably will want to one day. I just looked at the source for balance(). (Just load the ape package and type ‘balance’.) It looks to me like the internal function foo() is recursively visiting nodes. Perhaps you can imitate this.
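
The "pass in some functions and let a recursion do the rest" idea can be sketched in a few lines. The following is only an illustration in Python, not ape's `balance()` code: the dictionary tree encoding, `fold_postorder`, and `rooted_pd` are all hypothetical names made up for this example. It computes the rooted PD of a tip subset, meaning the total branch length connecting the chosen tips to the root.

```python
from typing import Callable, Dict, List, Set, Tuple, TypeVar

T = TypeVar("T")

# Hypothetical encoding: node -> list of (child, branch length to that child).
# Leaves map to an empty list.
Tree = Dict[str, List[Tuple[str, float]]]

# Toy tree ((A:1,B:2)X:3,(C:4,D:5)Y:6)R
TREE: Tree = {
    "R": [("X", 3.0), ("Y", 6.0)],
    "X": [("A", 1.0), ("B", 2.0)],
    "Y": [("C", 4.0), ("D", 5.0)],
    "A": [], "B": [], "C": [], "D": [],
}


def fold_postorder(
    tree: Tree,
    root: str,
    leaf_fn: Callable[[str], T],
    combine_fn: Callable[[str, List[Tuple[T, float]]], T],
) -> T:
    """Evaluate children before parents, then combine the child results."""
    def visit(node: str) -> T:
        children = tree[node]
        if not children:
            return leaf_fn(node)
        results = [(visit(child), length) for child, length in children]
        return combine_fn(node, results)
    return visit(root)


def rooted_pd(tree: Tree, root: str, subset: Set[str]) -> float:
    """Total branch length on the paths from the root to the tips in `subset`."""
    def leaf_fn(node: str) -> Tuple[bool, float]:
        # State is (does this subtree contain a selected tip?, PD below this node).
        return (node in subset, 0.0)

    def combine_fn(node: str, results: List[Tuple[Tuple[bool, float], float]]) -> Tuple[bool, float]:
        found, total = False, 0.0
        for (child_found, child_pd), length in results:
            if child_found:
                # A branch counts only if it leads to at least one selected tip.
                found = True
                total += child_pd + length
        return (found, total)

    return fold_postorder(tree, root, leaf_fn, combine_fn)[1]
```

Swapping in a different `leaf_fn` and `combine_fn` gives a different index without touching the traversal, which is the appeal of this style for functions that are not simple sums of branch lengths.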

ematsen

Thank you very much, everyone, for your help.

@klaus_schliep helped me out via email. Here’s what he showed me: http://rpubs.com/ematsen/ape-traversal-sample

I think that’s quite elegant. Thanks, Klaus!

david_bryant

Hey Erick,

Usually there is code to get the additive metric from the tree. Then to get the phylogenetic diversity of a subset a_1,…,a_k you can use

1/2 [d(a_1,a_2) + d(a_2,a_3) + \cdots + d(a_{k-1},a_k) + d(a_k,a_1)]

where the a_i are ordered as in the NEWICK ordering. Easier than faffing around with recursive algorithms.

-D.
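
For concreteness, here is a small Python sketch of the circular-ordering formula above. It assumes the pairwise path-length distances are already available; the `DIST` table, the toy tree ((A:1,B:2):3,(C:4,D:5):6), and the function name `circular_pd` are made up for illustration. Note that this version gives the length of the minimal subtree connecting the subset, so it does not include any path up to the root.

```python
from typing import Dict, Sequence, Set

# Hypothetical path-length distances for the toy tree ((A:1,B:2):3,(C:4,D:5):6).
PAIRS = {
    ("A", "B"): 3.0, ("A", "C"): 14.0, ("A", "D"): 15.0,
    ("B", "C"): 15.0, ("B", "D"): 16.0, ("C", "D"): 9.0,
}
DIST: Dict[str, Dict[str, float]] = {t: {t: 0.0} for t in "ABCD"}
for (a, b), d in PAIRS.items():
    DIST[a][b] = d
    DIST[b][a] = d

# Left-to-right order of the tips in the Newick string.
NEWICK_ORDER = ["A", "B", "C", "D"]


def circular_pd(
    dist: Dict[str, Dict[str, float]],
    newick_order: Sequence[str],
    subset: Set[str],
) -> float:
    """Half the length of the tour through the subset, visited in Newick order."""
    ordered = [tip for tip in newick_order if tip in subset]
    k = len(ordered)
    if k < 2:
        return 0.0
    # Sum d(a_i, a_{i+1}) cyclically, so the last tip wraps around to the first.
    tour = sum(dist[ordered[i]][ordered[(i + 1) % k]] for i in range(k))
    return tour / 2.0
```

On the toy tree, the subset {A, B, C} gives 1/2 (3 + 15 + 14) = 16, which matches adding up the branches 1 + 2 + 3 + 6 + 4 by hand.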

ematsen

Thanks, David. I should say that I’m interested not only in functions such as PD that have the necessary linearity, but also in more general functions.

ematsen

Ah, and I should also say that Klaus pointed me to this PDF describing the way trees are stored in APE after 2006 (and unlike how they are described in the first edition of Paradis’ book).
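
As a rough picture of that storage scheme: a `phylo` object in ape keeps a two-column `edge` matrix of (parent, child) node numbers with a parallel `edge.length` vector, numbers the tips 1..n, and gives the root the number n+1. The Python sketch below only mirrors that layout with plain lists; the variable names, the toy tree, and `rooted_pd_from_edges` are hypothetical, and the loop assumes the edges are listed in preorder (every edge appears before the edges beneath it).

```python
from typing import List, Set, Tuple

# Toy tree ((A:1,B:2):3,(C:4,D:5):6) in an ape-like layout.
tip_label: List[str] = ["A", "B", "C", "D"]   # tips are nodes 1..4
n_node: int = 3                                # internal nodes 5..7, root is 5
edge: List[Tuple[int, int]] = [                # (parent, child), preorder
    (5, 6), (6, 1), (6, 2),
    (5, 7), (7, 3), (7, 4),
]
edge_length: List[float] = [3.0, 1.0, 2.0, 6.0, 4.0, 5.0]


def rooted_pd_from_edges(
    edge: List[Tuple[int, int]],
    edge_length: List[float],
    tip_label: List[str],
    subset: Set[str],
) -> float:
    """Sum the lengths of edges that lie above at least one selected tip.

    Assumes preorder edge listing, so walking the list backwards visits
    every edge below a node before the edge leading into that node.
    """
    # Tip i (1-based) carries the label tip_label[i - 1].
    marked = {i + 1 for i, label in enumerate(tip_label) if label in subset}
    total = 0.0
    for (parent, child), length in zip(reversed(edge), reversed(edge_length)):
        if child in marked:
            total += length
            marked.add(parent)  # the parent now has a selected tip below it
    return total
```

No recursion is needed here: a single backwards pass over the edge table does the post-order work, which is one reason a flat edge table is convenient for this kind of calculation.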