I think one of the first things that got me interested in evolutionary biology was finding out that whales used to have legs. It's pretty incredible that the sleek, powerful whales of modern-day oceans had their origins in blundering land animals like cows and hippos. Nothing spoke more to me about the aeons of time that have passed than imagining generation after generation of whales tentatively playing with the water, feeding there, learning to become good swimmers, and eventually swimming in the open ocean amongst the fishes, their terrestrial history a long-forgotten memory.

It makes me feel that hippos should really be ashamed of themselves! In 50 or so million years, hippos have lazily stayed wallowing in pools, whilst their overachieving cousins the whales have completely redesigned their body plan for water. In fact, the land to sea transition only took them 10 to 12 million years to achieve. Here is the granddaddy of whales, Himalayacetus, which seemed to be largely terrestrial with occasional day trips into the ocean,

There is actually a point to this post, rather than me just waxing lyrical about whale evolution! If you remember a while back, Hank wrote a post which dismissed a Nature paper about how we are now in the "6th major mass extinction". It's taken me a while, but I thought it would be a good idea to illustrate the problems in conducting massive meta-studies such as these.

In whale evolution, we have a fantastic example of why we really have to be careful about basing diversity estimates and extinctions simply on numbers of genera and first appearances. This recent study on whale evolution is a perfect case, as the authors really tried to mitigate every source of bias.

Take a look at this time scale of cetacean (whale) evolution, from their origin in the early Paleogene to recent,

From Uhen (2010)

Now, you might spot a couple of trends here. Unfortunately, you just can't trust this apparently neat tracker of diversity, because there are so many factors besides the actual diversity that distort the number of different whales through time.

Time period bias

The thing that we need to remember throughout is that the fossil record is very much a human construct. It would have been much better if from the start we had done the whole thing systematically; gone through the fossiliferous outcrops of the world and catalogued everything neatly. But we didn't. Geology began with Victorian gentlemen pottering round the Lake District and picking up pretty things to show their pals. Today, it is still very much a pursuit of collecting pretty things we are interested in from particular areas we like and then publishing this. If somebody uses your paper after googling it to do a meta-study on the evolution of said pretty thing, well, good for them.

John Woodward, your quintessential (in this case, pre-) Victorian paleontologist

So, the fossil record is biased towards the finds of prolific collectors who favour particular geographic regions and therefore particular time periods. Certain time periods are thus oversampled with respect to certain groups. For example, a certain Mr. P.J. Harmatuk provided 6585 fossil whale specimens to the Smithsonian. This is 75% of the fossil whale specimen database. 99% of them are from one locality: the Texasgulf Sulfur Mine, near his home in Aurora, North Carolina. All are dated 4.4-5.0 Ma.
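To get a feel for how lopsided that sampling is, here is a quick back-of-the-envelope sketch in Python. It uses only the figures quoted above; the variable names are mine, and the implied database size is derived from the 75% figure rather than taken from the Smithsonian directly.

```python
# Figures quoted in the text above.
harmatuk_specimens = 6585      # specimens provided by one collector
harmatuk_share = 0.75          # his share of the fossil whale database
one_locality_share = 0.99      # share of his specimens from a single mine

# Implied size of the whole database (derived, not quoted).
database_total = harmatuk_specimens / harmatuk_share

# Specimens from the one mine, and their share of the whole database.
mine_specimens = harmatuk_specimens * one_locality_share
mine_share_of_database = mine_specimens / database_total

# Width of the time window those specimens fall in, in millions of years.
window_myr = 5.0 - 4.4

print(f"Implied database size: {database_total:.0f} specimens")
print(f"From one mine: {mine_share_of_database:.1%} of the database")
print(f"Time window covered: {window_myr:.1f} Myr")
```

So roughly three quarters of the whole database comes from one mine and a window of about 0.6 million years.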

We also have a problem in the geology itself. In North America, there is simply more rock outcropping of middle Eocene age than of late Eocene age. Nothing sinister, just a quirk of geology. But this means that there is much more rock to find middle Eocene fossils in, and so our estimates of diversity are inflated for the middle Eocene. We can try to correct for this by dividing by map area, but this isn't great; most of America is covered with soil and so map area doesn't reflect the amount of exposed rock very well.

Look back at the chart. The early Oligocene has a very low outcrop area in America, and this also happens to be a period of low diversity of whales.

You might be surprised to learn, though, that another such period of low outcrop area is the early Pliocene. Scroll up, and you'll see that actually, this is one of the periods of greatest diversity, seemingly in spite of the lack of early Pliocene rock. Have we perhaps even underestimated the diversity here, and the oceans were teeming with whales?

If I told you that 25% of the whale genera listed in the Pliocene were extant, whereas only 8% in the period just preceding were, you might be forced to change your mind. What's going on here is very human. Basically, pre-Pliocene, taxonomists just don't have the balls to ascribe their fossil to a living genus. They prefer instead to just leave it without a name or ascribe it to something else extinct. Sad but true! After a nice convenient epoch change, they are less worried about correlating the fossil with a modern genus. Hence a lot of that apparent diversity may be just more extinct whales.

Despite these effects, in this chart here, we actually have a very simple reason for an apparent drop in diversity in spite of the other time biases at hand. The middle Eocene (Lutetian plus Bartonian, particularly the Lutetian) is decidedly quite long compared with the late Eocene (Priabonian). So, a series of whale faunas that lived at separate times get lumped together. This has the effect of artificially inflating the apparent generic diversity during the Lutetian. Simple!
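Here is a rough sketch in Python of what that lumping does. The genus counts are made up purely for illustration (they are not from Uhen's chart), the stage boundaries are approximate round figures, and the helper `genera_per_myr` is a hypothetical name of my own. Dividing by bin length is a crude correction, not a proper one.

```python
def genera_per_myr(n_genera, start_ma, end_ma):
    """Genera counted in a time bin, divided by the bin's length in Myr."""
    duration = start_ma - end_ma
    if duration <= 0:
        raise ValueError("start_ma must be older (larger) than end_ma")
    return n_genera / duration

# (hypothetical genus count, approximate start in Ma, approximate end in Ma)
bins = {
    "middle Eocene": (30, 47.8, 37.8),  # Lutetian plus Bartonian
    "late Eocene": (12, 37.8, 33.9),    # Priabonian
}

rates = {name: genera_per_myr(*values) for name, values in bins.items()}

for name, (count, start, end) in bins.items():
    print(f"{name}: {count} genera in {start - end:.1f} Myr "
          f"= {rates[name]:.2f} genera per Myr")
```

With these invented numbers, a raw drop from 30 genera to 12 vanishes once you account for the middle Eocene bin being more than twice as long.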

Taxonomic bias

Estimates of species diversity in many groups, including whales, can be massively inflated by profuse naming by overzealous taxonomists.

Some animal taxa are based entirely on bits of animals such as teeth, and therefore we may have given several names to specimens that are actually from the same taxon. Moreover, when people name new species, sometimes people just don't compare with the type specimen well enough - this problem can be even worse with specimens of big things like whales because of the sheer difficulty in physically comparing them side by side.

Sometimes it is just too much effort to actually publish things. I've been "backstage" at the Natural History Museum in London, and there are plaster-cast specimens dating back years; they were collected in the field and still no one has got round to unwrapping them because of their sheer size. Alongside the unseen diversity of fossils still deep in the Earth's crust, there is therefore an unseen world of fossils hidden backstage in the Earth's museums.

I don't really want to get started on biases in diversity due to human taxonomic reasons, because there are so many that I would end up boring you. But it's worth remembering that our estimates of diversity are massively inflated by things that are interesting and preserve well. For whales, if you plot the number of papers on fossil whales by geologic age, the number of diagnostic whale genera is directly related to the amount of research output relating to fossil whales. So, the more researchers there are, the more species there are. Good, eh?

To minimise these problems, you should only really ever use the diversity recorded in genera rather than at the species level. But even this isn't fail-safe. What's in a name, after all? In whales, a recent study found that of 436 generic names that have ever been applied to fossil whales, only 56% are actually still in use. The roughly 43% that are no longer in use break down, as shares of all the names, like this: 18% were considered to be diagnosed wrongly due to their poor type material, 19% are synonyms of other taxa, and 6% were misspellings! How about that: inflated diversity due to bad spelling!
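Those percentages are easy to sanity-check. The sketch below (Python, with category labels of my own) shows that the three "no longer in use" figures add up to 43, which suggests they are shares of all 436 names rather than shares of the out-of-use names. The name counts it prints are rounded estimates from the percentages, not figures from the study.

```python
total_names = 436

# Percentages quoted in the text, read as shares of all generic names.
shares_pct = {
    "still in use": 56,
    "poor type material": 18,
    "synonyms": 19,
    "misspellings": 6,
}

# The out-of-use categories should add up to the quoted 43%.
not_in_use_pct = sum(
    pct for label, pct in shares_pct.items() if label != "still in use"
)

# Approximate number of names in each category (rounded estimates).
counts = {
    label: round(total_names * pct / 100)
    for label, pct in shares_pct.items()
}

print(f"No longer in use: {not_in_use_pct}%")
for label, n in counts.items():
    print(f"{label}: about {n} names")
```

The shares total 99% rather than 100%, presumably through rounding in the original figures.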

Even using first appearances, as the recent mass extinction paper did, is prone to error. Look how our estimates of the origin of whales have changed, based on the sudden finds of new specimens:

Geological reasons

Perhaps the most important reason for our apparent diversity loss during the late Eocene is a geological one. At this time, we have an event known as the Oi-1 glaciation, a roughly 400,000-year-long glaciation. Whilst it might seem that an ice age could kill lots of whales, a much more likely reason for this drop in diversity is simply that during an ice age, sea levels drop significantly due to the volume of ice stored at the poles. As an example, at the Last Glacial Maximum*, there was about 3km of ice sitting on Scandinavia, and this corresponds to a sea level drop of about 120m. We can assume a similar size for the Oi-1 glaciation.

Sea level drops such as these are bad news for fossil deposition. For a start, we lose a significant amount of area that we can physically deposit fossils in, and some of the prime fossil deposition localities.

UK at the LGM and now. I think you'll agree there were probably few whales living in the channel at this time.

But, worse still, the change of the sedimentary environment is such that we can actively erode older sites where we formerly had fossils. So, though superficially it might seem strange, the diversity of a group can be strongly affected by an event that occurred after the animals lived.

So, putting these factors aside, we still have an apparent increase in diversity towards the modern day. Usually, the trend is that the further we go back in the fossil record, the greater the chance is that the fossil will be destroyed by some sort of event. But we have something that is even more drastically not in our favour here; the Last Glacial Maximum. Again, this means that Pleistocene fossil whale diversity is most likely undersampled owing to the flooding of marginal marine deposits at the end of the last ice age. Who can tell whether the diversity is greater today or less than during the Pleistocene?

But despite all this...

There are, admittedly, a lot of biases at play in all meta-studies. But that doesn't mean that they are unimportant. In spite of all the above factors, we can still pull out a lot of important trends.

Uhen's chart with some of the major biases put on (by myself). There may be many more, and on the flip side, some of these may not affect estimates that greatly.

Even after accounting for the biases that we know about, we can tell that there was a huge wave of diversification in the Miocene, when most modern families evolved, and that whale diversity hit its peak during the late middle Miocene. By the Pliocene, most if not all archaic families of whales had gone extinct, and diversity dropped toward modern levels. My title "Beware Meta-Studies" is intended as an imperative to readers as well as to the compilers of meta-studies.

Even if you have taken into account as many biases as possible, you still need to take meta-studies with a pinch of salt. Nothing in paleontology makes sense without the light of taphonomy, and you need to make sure that you can mitigate for all factors. Error bars are tricky to define! But one thing's for sure: by simply doing a literature search of when things appear and disappear, you can't get any grasp of the true diversity.


* Because there's still ice at the poles now, we're technically still in an ice age at the moment. So what is colloquially referred to as "the ice age" is what we call the LGM.


Barnosky, A.D. et al., 2011. Has the Earth's sixth mass extinction already arrived? Nature, 471(7336), 51-57.

Uhen, M.D. & Pyenson, N.D., 2007. Diversity estimates, biases, and historiographic effects: resolving cetacean diversity in the Tertiary. Palaeontologia Electronica, 10(2), 11A.

Uhen, M.D., 2010. The origin(s) of whales. Annual Review of Earth and Planetary Sciences, 38, 189-219.

Kidwell, S.M. & Flessa, K.W., 1995. The Quality of the Fossil Record: Populations, Species, and Communities. Annual Review of Ecology and Systematics, 26, 269-299.