Hi Elijah,
What you have to keep in mind is that however fancy the diversification-through-time model is, at its core you’re still estimating speciation and extinction rates. To estimate a rate, you need a count of events (speciation events) and the amount of time over which they happened (the duration of the clade). This applies whether we are using RevBayes or BEAST to estimate a time tree and diversification rates (from constant-rate or fancier models) simultaneously, or taking clade ages and species counts to get simple constant-rate estimates.
Incomplete sampling is important because, although my phylogeny may have n species in it, the size m of the clade may be much larger than n, so any rate based on n will be wrong. Imagine we wanted to ask which diversifies faster, birds or mammals. From mammals, which are ~435 million years old, we have sampled 2000 species. From birds, which are ~111 million years old, our grant money ran out and we only got 10 species. Following Magallon and Sanderson (2001), we estimate the net diversification rate as r = ln(n / 2) / t. So we estimate that mammals have a net diversification rate of r = 0.0158799, while birds have a net diversification rate of r = 0.01449944, and we conclude that mammals diversify faster. But there are really ~5400 mammals and ~10000 birds, so the rates are really something like r = 0.01816323 for mammals and r = 0.07673147 for birds. By missing samples, we not only estimated wildly incorrect rates, we flipped the direction of the result. Admittedly, it takes undersampling this extreme to reverse the result when one group is diversifying over 4x as fast as the other, but that’s an unusually large difference. We can’t expect the groups we study to differ so clearly, and when the true difference is small, much milder undersampling can flip the answer. That’s why m matters.
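The arithmetic in the bird/mammal example is easy to reproduce. This is a minimal sketch that just applies the r = ln(n / 2) / t estimator quoted above to the illustrative ages and species counts from that example; the function and variable names are my own, not from any package.

```python
import math


def crown_net_diversification(n_species: float, crown_age_myr: float) -> float:
    """Net diversification rate r = ln(n / 2) / t, the estimator quoted above."""
    return math.log(n_species / 2) / crown_age_myr


# Illustrative numbers from the bird/mammal example (ages in millions of years).
ages = {"mammals": 435, "birds": 111}
sampled = {"mammals": 2000, "birds": 10}           # species in our hypothetical phylogenies
true_richness = {"mammals": 5400, "birds": 10000}  # approximate total species, m

r_sampled = {c: crown_net_diversification(sampled[c], ages[c]) for c in ages}
r_true = {c: crown_net_diversification(true_richness[c], ages[c]) for c in ages}

for clade in ages:
    print(f"{clade}: r from sampled n = {r_sampled[clade]:.5f}, r from m = {r_true[clade]:.5f}")

# Which clade looks faster depends on whether we use n or m.
print("faster (sampled n):", max(r_sampled, key=r_sampled.get))
print("faster (true m):   ", max(r_true, key=r_true.get))
```

Run as-is, it reports mammals as faster from the sampled counts and birds as faster once m is used, which is the reversal described above.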
Now, how do we get m? As you say, taxonomy gives us some answer. But you also mention using a diversification rate, and now we’re caught in a circle: we need r to get m, but we can’t get r without m. Any assumption we make about how this group is diversifying will completely determine our results, because the rate we estimate will just be the rate we assumed.
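To make the circularity concrete, here is a small sketch under one assumption: that "using a diversification rate" to get m means inverting the same simple estimator, m = 2 exp(r t). Both helpers are hypothetical illustrations. Whatever rate goes in comes straight back out.

```python
import math


def richness_from_rate(r: float, crown_age_myr: float) -> float:
    """Hypothetical helper: invert r = ln(m / 2) / t to get m = 2 * exp(r * t)."""
    return 2 * math.exp(r * crown_age_myr)


def rate_from_richness(m: float, crown_age_myr: float) -> float:
    """The same simple estimator as before: r = ln(m / 2) / t."""
    return math.log(m / 2) / crown_age_myr


t = 111.0  # illustrative crown age in millions of years
for assumed_r in (0.01, 0.05, 0.08):
    m = richness_from_rate(assumed_r, t)    # "estimate" m from an assumed rate
    recovered_r = rate_from_richness(m, t)  # then "estimate" the rate from that m
    print(f"assumed r = {assumed_r:.2f} -> m = {m:.0f} -> estimated r = {recovered_r:.2f}")
```

The estimated r always equals the assumed r (up to floating-point rounding), so the analysis can’t tell you anything the assumption didn’t already contain.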
The other thing to keep in mind is that comparing diversification rates across landmasses is probably either invalid or requires a more complicated model. Unless each landmass has its own reciprocally monophyletic clade, you cannot split the tree apart by island: the groups on those islands share a tree, so they are not independent. It would be possible to jointly reconstruct island occupancy and diversification rates, as in a MuSSE model. However, a BiSSE model needs something like a few hundred species to get a good estimate of the effect of a binary trait on speciation and extinction rates, and one imagines that MuSSE, with more states and parameters, requires even more. Even if the island clades are reciprocally monophyletic, confidence intervals on speciation parameters are wide, and you may have a hard time saying anything about diversification rates.
The bottom line is this:
- If the islands don’t have monophyletic clades:
a) If you don’t have samples for >100 species in the genus, there’s absolutely no way to do this.
b) If you do, and you only care about the differences between the islands, you may not have to worry too much about the sampling fraction (unless sampling is biased toward some islands).
- If the islands have monophyletic clades:
a) If you have some idea of how many species are missing, you could rerun the analysis of each clade across the range of plausible species counts. That gives you some idea of whether one group is diversifying faster, but between the uncertainty inherent in estimating diversification rates and the uncertainty in the true number of species, you probably couldn’t say anything remotely conclusive unless the effect is rather large.
b) If you really don’t have any idea, there’s absolutely no way to say anything.
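Option a) for monophyletic island clades can be sketched with the same simple constant-rate estimator: compute r for every plausible total richness and see whether the resulting ranges separate. The island names, sample sizes, maximum numbers of missing species, and ages below are made up purely for illustration, and a real analysis would instead rerun a tree-based model with a sampling fraction for each scenario.

```python
import math


def crown_net_diversification(n_species: int, crown_age_myr: float) -> float:
    # Same simple estimator as above: r = ln(n / 2) / t (no extinction, no estimation error).
    return math.log(n_species / 2) / crown_age_myr


def rates_over_richness(sampled: int, max_missing: int, crown_age_myr: float) -> list[float]:
    # One estimate per plausible total richness, from "none missing" up to "max_missing missing".
    return [crown_net_diversification(sampled + k, crown_age_myr) for k in range(max_missing + 1)]


# Hypothetical island clades: (species sampled, most species plausibly missing, crown age in Myr).
clades = {"island_A": (40, 60, 10.0), "island_B": (25, 30, 8.0)}

ranges = {}
for name, (n_sampled, max_missing, age) in clades.items():
    rates = rates_over_richness(n_sampled, max_missing, age)
    ranges[name] = (min(rates), max(rates))
    print(f"{name}: r between {ranges[name][0]:.3f} and {ranges[name][1]:.3f}")

(a_lo, a_hi), (b_lo, b_hi) = ranges["island_A"], ranges["island_B"]
overlap = a_lo <= b_hi and b_lo <= a_hi
print("ranges overlap: no clear ordering" if overlap else "ranges do not overlap")
```

With these made-up numbers the two ranges overlap, so the ordering of the islands depends entirely on how many species are really missing. Even non-overlapping ranges here would not be conclusive, because the sketch ignores the estimation uncertainty in each rate.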