Archive 04/01/2022.

How to measure the phylogenetic signal vs. random noise in a data set?


A user in another phylogenetics discussion group asked today about analyzing more than 100 sequences, each more than 80,000 bases long, all from one gene. This led me to assume the sequences came from closely related organisms, because otherwise the introns could be too divergent to align even while the exons were still alignable. That made me wonder: if we have 100 very long sequences from a single species of mammal (for example, humans sampled around the world), what tests can be done to look for recombination, and how can we measure the phylogenetic signal-to-noise ratio in the data? The consistency index and retention index are two useful measurements, but I rarely see them reported for data sets, and most phylogenetic software packages do not compute them and display them with the results.
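
For anyone who wants to compute these two indices by hand, here is a minimal Python sketch. It uses the usual ensemble definitions, CI = M/S and RI = (G - S)/(G - M). S is the number of parsimony steps on the tree, M is the minimum possible number of steps, and G is the maximum possible number of steps, each summed over all sites. The function names (fitch_steps, ci_ri), the nested-tuple tree format, and the toy alignment are made up for illustration and do not come from any package. The sketch assumes a rooted binary tree and unordered (Fitch) parsimony, and it treats every symbol, including gaps and ambiguity codes, as an ordinary state.

```python
from collections import Counter


def fitch_steps(tree, states):
    """Count Fitch parsimony steps for one site on a rooted binary tree.

    tree   -- nested 2-tuples of taxon names (hypothetical format)
    states -- dict mapping taxon name -> character state at this site
    """
    steps = 0

    def visit(node):
        nonlocal steps
        if isinstance(node, str):          # leaf: its observed state
            return {states[node]}
        left, right = (visit(child) for child in node)
        common = left & right
        if common:                         # children agree: no change needed
            return common
        steps += 1                         # children disagree: one change
        return left | right

    visit(tree)
    return steps


def ci_ri(tree, alignment):
    """Return (ensemble CI, ensemble RI) for an alignment on a given tree."""
    taxa = list(alignment)
    n_sites = len(alignment[taxa[0]])
    m_sum = s_sum = g_sum = 0
    for i in range(n_sites):
        column = {t: alignment[t][i] for t in taxa}
        counts = Counter(column.values())
        m_sum += len(counts) - 1                   # minimum steps: states - 1
        g_sum += len(taxa) - max(counts.values())  # maximum steps (star tree)
        s_sum += fitch_steps(tree, column)         # observed steps on the tree
    ci = m_sum / s_sum if s_sum else float("nan")
    ri = (g_sum - s_sum) / (g_sum - m_sum) if g_sum != m_sum else float("nan")
    return ci, ri


# Toy data, invented for illustration only.
tree = (("A", "B"), ("C", "D"))
alignment = {"A": "AAG", "B": "AAT", "C": "CAG", "D": "CCT"}
ci, ri = ci_ri(tree, alignment)
print(f"CI = {ci:.2f}, RI = {ri:.2f}")
```

In this toy example the third site (G, T, G, T) needs two changes on the tree where one would be the minimum, so it is homoplastic and pulls both indices below 1. Constant sites add nothing to any of the three sums. Parsimony-uninformative sites inflate CI, which is why CI is often reported with those sites excluded; this sketch does not filter them out.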