Archive 04/01/2022.

# A sane parsimony implementation

ematsen

Hello folks–

Can anyone suggest a sane-to-use and reasonably fast implementation of parsimony tree estimation for DNA sequences that returns ancestral sequences as well? We have been using PHYLIP’s dnapars but getting all the information out of its output files is pretty hairy.

@NickJMatzke has (partially?) written http://phylo.wikidot.com/tntr as an R front-end to the TNT software, but this also looks pretty wild!

Yes, parsimony .

Erick

Alexis_RAxML

There is one parsimony implementation in LLPLL and also my open-source implementation in parsimonator which is pretty fast: https://github.com/stamatak/Parsimonator-1.0.2 it does however not output ancestral states.

Sounds like you want to implement parsimony-biased moves for Bayesian MCMC

Alexis

ematsen

Thanks, @Alexis_RAxML!

You’re actually thinking lower-level than we’d like. I’d like a binary that takes in sequences and returns as many maximally parsimonious trees as it can find, along with the ancestral states.

We could hook up one of your implementations to a front-end and tree rearrangements, but this is more work than we wish to take on right now.

dnapars does what we want, but its interface, formats, and failure modes have been frustrating for some folks in the lab.

Alexis_RAxML

Ah, you mean output as many best-known equally parsimonious trees as possible of course, since finding the maximum parsimony tree is NP-hard (I know that you know this, I just couldn’t resist) This would require changing the parsimonator code quite a bit, I am afraid.

klaus_schliep

Hi Eric, the good old Paup* and TNT may be interesting to look in. There is also extensions like the parsimony ratchet for Paup*.
XMP (White and Holland) offers a fast branch and bound search which would return all best MP trees, but only for limited number of taxa. The paper also gives a lot of nice inside into many computational aspects you may be interested. If you are interested in finding many trees of minimal parsimony score you should also look into a different direction. A problem with the methods I mentioned above is that they usually will return binary trees.
So you want to assign edge length on your trees (MPR, ACCTRAN, DELTRAN, …) and prune edges which have length 0. This gives can give additional trees with a more parsimonious parametrization (less edges) than their binary counterpart. Who would report a non-parsimonious maximum parsimony tree? The other advantage is that you can generate candidate binary trees. If your pruned tree has for example has two edges of degree 4 and two edges of degree 5 than you can expand this to 3 * 3 * 15 * 15 = 2025 binary trees, which may also share the same minimal parsimony score. You are like to miss some of these trees if you use heuristic search. It is however unlikely that implementations of the fitch algorithm will be able to handle multifurcating trees, the sankoff algorithm should have no problem with this. Cheers, Klaus

ematsen

Thanks, Klaus!

You clearly know these things a lot better than we do! I had to look at some of Felsenstein’s slides to know about Fitch vs. Sankoff.

We have tried playing around with PAUP*, but this is pretty much a deal-breaker:

It seems ironic that we’re struggling to get something so basic!