Here’s an example figure, but note that the “alignment” is the transpose of how we usually think of it.
Phylogenetic Reconstruction. We reconstructed phylogenies using two independent approaches. First, we calculated a distance matrix for each patient using an “equal or not” distance (31). This method increases the distance between two samples whenever their genotypes are unequal, regardless of the magnitude of the difference. We then used neighbor-joining (51) in R to infer the phylogenetic relationships between samples. In the rare cases of missing values, we imputed them using the nearest neighbor. We assessed the reliability of the resulting trees by bootstrapping with 1,000 replicates (52) and collapsed all interior branches with bootstrap support below 70% into polytomies. Second, we constructed the phylogenies by Bayesian inference, a methodology that relies on a fundamentally different set of principles than neighbor-joining. For this we used the software MrBayes (53) with the same model parameters that were previously used for the analysis of poly-G tract mutation profiles (21). The two approaches gave almost identical results in all cases, supporting the robustness of our approach. Bayesian phylogenies and posterior probability values for all clades are presented in SI Appendix, Fig. S10.
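To make the “equal or not” distance and the nearest-neighbor imputation concrete, here is a minimal Python sketch. It is not the authors' code. It assumes that each sample is a list with one integer genotype per marker (for example, a poly-G tract length) and that `None` marks a missing call. It also assumes that the “nearest neighbor” is the sample with the smallest equal-or-not distance over the markers both samples have called. The names `equal_or_not`, `impute_nearest_neighbor`, and `distance_matrix` are hypothetical.

```python
from typing import List, Optional

# Hypothetical encoding: one genotype per marker (e.g., a poly-G tract length);
# None marks a missing call.
Profile = List[Optional[int]]


def equal_or_not(a: Profile, b: Profile) -> int:
    """Count markers at which two samples carry unequal genotypes.

    A difference of 1 and a difference of 10 both add exactly 1.
    Markers missing in either sample are skipped.
    """
    return sum(x != y for x, y in zip(a, b) if x is not None and y is not None)


def impute_nearest_neighbor(profiles: List[Profile]) -> List[Profile]:
    """Fill each missing call from the closest sample that has that marker called.

    Assumption: "closest" means smallest equal-or-not distance over jointly
    called markers; ties go to the first such sample in input order.
    """
    imputed = [list(p) for p in profiles]  # copy so the input is not mutated
    for i, p in enumerate(profiles):
        for m, value in enumerate(p):
            if value is not None:
                continue
            donors = [j for j, q in enumerate(profiles) if j != i and q[m] is not None]
            if not donors:
                continue  # no other sample has this marker called; leave the gap
            nearest = min(donors, key=lambda j: equal_or_not(p, profiles[j]))
            imputed[i][m] = profiles[nearest][m]
    return imputed


def distance_matrix(profiles: List[Profile]) -> List[List[int]]:
    """Pairwise equal-or-not distances for one patient's samples."""
    return [[equal_or_not(a, b) for b in profiles] for a in profiles]
```

For four samples where sample 2 lacks a call at marker 1, sample 1 is its nearest neighbor (one mismatch versus two for the others). That sample therefore donates the value 12, and the matrix is computed on the completed profiles.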
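The distance-based pipeline (neighbor-joining, bootstrap support, and collapsing weakly supported branches) can be sketched in plain Python as follows. This is an illustrative stand-in for the R analysis, not a reproduction of it. It tracks only tree topology (no branch lengths) and represents each interior branch as a split: the set of samples on the side that does not contain sample 0. It assumes complete (already imputed) profiles and bootstraps by resampling markers with replacement. A split survives only if at least 70% of replicate trees contain it; dropping a split merges its branch into a polytomy. All function names and the `seed` parameter are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations


def equal_or_not_matrix(profiles):
    """Pairwise count of markers with unequal genotypes (complete profiles)."""
    return [[sum(x != y for x, y in zip(a, b)) for b in profiles] for a in profiles]


def nj_splits(dist):
    """Neighbor-joining topology as a set of splits (frozensets of sample indices)."""
    n_leaves = len(dist)
    active = [frozenset([i]) for i in range(n_leaves)]
    d = {a: {b: float(dist[i][j]) for j, b in enumerate(active)} for i, a in enumerate(active)}
    everyone = frozenset(range(n_leaves))
    splits = set()
    while len(active) > 3:  # the last three nodes meet at a single interior node
        n = len(active)
        r = {a: sum(d[a][b] for b in active) for a in active}
        # Q-criterion: join the pair minimizing (n - 2) * d(a, b) - r(a) - r(b).
        a, b = min(combinations(active, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        u = a | b
        d[u] = {u: 0.0}
        for k in active:
            if k not in (a, b):
                d[u][k] = d[k][u] = (d[a][k] + d[b][k] - d[a][b]) / 2
        active = [k for k in active if k not in (a, b)] + [u]
        splits.add(u if 0 not in u else everyone - u)  # orient away from sample 0
    return splits


def bootstrap_support(profiles, replicates=1000, seed=0):
    """Fraction of bootstrap trees (markers resampled with replacement) containing each split."""
    rng = random.Random(seed)
    n_markers = len(profiles[0])
    counts = Counter()
    for _ in range(replicates):
        cols = [rng.randrange(n_markers) for _ in range(n_markers)]
        resampled = [[p[c] for c in cols] for p in profiles]
        counts.update(nj_splits(equal_or_not_matrix(resampled)))
    return {s: c / replicates for s, c in counts.items()}


def collapsed_tree(profiles, replicates=1000, threshold=0.70, seed=0):
    """Splits of the full-data tree with support >= threshold; the rest become polytomies."""
    support = bootstrap_support(profiles, replicates, seed)
    full = nj_splits(equal_or_not_matrix(profiles))
    return {s: support.get(s, 0.0) for s in full if support.get(s, 0.0) >= threshold}
```

Representing the tree as a set of splits keeps the bootstrap comparison simple. A replicate supports a branch exactly when its split set contains the same frozenset, and collapsing a branch is just omitting that split.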