I recently needed to reconstruct a reasonably large Ebola virus tree, and I wanted a maximum likelihood (ML) tree with good branch length estimates and bootstrap values for every node in the ML tree. We wrote this pipeline, which bats back and forth between phyml (to get an initial NJ tree and optimise the branch lengths) and raxml (to find the ML topology and bootstrap).
I thought I would post it here for discussion and to see if anyone would do things differently. On a fixed topology, phyml did a better job of optimising branch lengths, but this may be because I couldn’t find the appropriate settings in raxml.

    #phyml to find initial tree using BioNJ
    phyml --quiet -i $1.phy -q -t e -a e -o lr -b 0
    mv $1.phy_phyml_tree.txt initial_tree.newick

    #raxml to find ml topology
    raxml -f d -T 6 -j -s $1.fasta -n topology -m GTRGAMMA -t initial_tree.newick

    #phyml to optimise branch lengths on ml topology
    phyml --quiet -i $1.phy -q -t e -a e -o lr -u RAxML_result.topology -b 0
    mv $1.phy_phyml_tree.txt $1.ml.tree

    #use raxml to produce bootstrap trees - the seed needs to be set here in $2
    raxml -f a -p $2 -s $1.fasta -x 12345 -#200 -m GTRGAMMA -n bootstrap

    #use raxml to project bootstrap values on to ML tree
    raxml -f b -t $1.ml.tree -z RAxML_bootstrap.bootstrap -m GTRGAMMA -n final
    mv RAxML_result.final $1.ml_bootstrap.tree
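
The pipeline takes two arguments: $1 is the alignment basename (it expects both $1.phy and $1.fasta to exist) and $2 is the seed passed to raxml. A long run that fails halfway because one of those is missing is annoying, so a small pre-flight check at the top of the script might help. This is only a sketch: the function name check_inputs and the script name pipeline.sh are made up for illustration and are not part of the original pipeline.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of the original pipeline).
# $1 = alignment basename; the pipeline reads $1.phy and $1.fasta
# $2 = integer seed handed to raxml
check_inputs() {
    base=$1
    seed=$2

    # Both arguments are required.
    if [ -z "$base" ] || [ -z "$seed" ]; then
        echo "usage: pipeline.sh <alignment_basename> <seed>" >&2
        return 1
    fi

    # phyml reads the PHYLIP file and raxml reads the FASTA file,
    # so both must exist and be non-empty.
    for ext in phy fasta; do
        if [ ! -s "$base.$ext" ]; then
            echo "missing or empty input: $base.$ext" >&2
            return 1
        fi
    done

    # The seed must consist of digits only.
    case $seed in
        *[!0-9]*)
            echo "seed must be an integer: $seed" >&2
            return 1
            ;;
    esac

    return 0
}
```

Calling check_inputs "$1" "$2" || exit 1 before the first phyml command would then stop the run early, for example when invoked as sh pipeline.sh ebola 42 without ebola.phy in the working directory.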