Nov 13 2013

Statisticians need to see how experimental science really works, then shut up about it

This kind of irks me.

The plague of non-reproducibility in science may be mostly due to scientists’ use of weak statistical tests, as shown by an innovative method developed by statistician Valen Johnson, at Texas A&M University in College Station.

The article goes on to describe an amazing new Bayesian test that Johnson invented, which supposedly shows that scientists are using "risky statistics". OK, no.

Let's talk turkey for a minute here. This is how an experiment actually goes most of the time:

  1. Your boss has an idea for an experiment. It's based on some prior work which was kind of understood. That's what funding is given out for.
  2. You spend a ridiculous amount of time and energy setting up the equipment, fixing all the electronics, plumbing, and Labview problems.
  3. You do the experiment and see nothing.
  4. You continue fixing things.
  5. You finally get something. You optimize conditions, you collect data.
  6. You analyze the data, and it's hard to explain. It's noisy even when you did everything you could to make it good.
  7. Out of the hundreds of plausible explanations for the data, you choose one that your boss likes, and you write a paper. You write a paper because that's expected of you regardless of what the result was. You can't not publish a paper just because the data makes no sense.

The statistical test isn't the issue. The issue is that experiments are hard. Unless the phenomena are so well known that nobody would bother doing an experiment in the modern day, the equipment to measure them is experimental, home built, or being used in an unprecedented way by people with no technical training (grad students). You can sit and mangle the data all day, apply any number of tests, make your curve fits more robust, whatever. The chances are that your findings are going to be debunked or, at a minimum, refined. This is ok! But it means that cutting edge stuff isn't that reproducible.

Sure, I have encountered statistical fallacies in other people's work. But in none of those cases was the cause solely the ignorance of the researcher. Rather, it was usually because someone was trying to fit something that was unfittable, because the data was extraordinarily hard to collect and therefore noisy and ambiguous. Or someone was quoting an error estimate based on a few data points, which is not good, but it happened only because getting each of those data points took a month. If the researcher had a ton of data, the statistical analysis would be better simply because there would be more data behind it. Then, and only then, should we start to delve into Bayesian notions of what caused the data.
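To put a rough number on the few-data-points problem: for independent, normally distributed measurements, the standard deviation estimated from n points is itself uncertain by a fraction of roughly 1/sqrt(2(n-1)). That textbook approximation, the normality assumption, and the function name below are all assumptions made for illustration; none of them come from the article. A minimal Python sketch:

```python
import math

def relative_uncertainty_of_error_bar(n):
    """Approximate fractional uncertainty of a sample standard deviation
    estimated from n independent, normally distributed points.

    Uses the common approximation 1 / sqrt(2 * (n - 1)); it is rough for
    very small n, but good enough to show the trend.
    """
    if n < 2:
        raise ValueError("need at least two points to estimate a spread")
    return 1.0 / math.sqrt(2 * (n - 1))

# Compare a "one point per month" experiment with a data-rich one.
for n in (3, 10, 100, 1000):
    frac = relative_uncertainty_of_error_bar(n)
    print(f"n = {n:4d}: error bar itself uncertain by ~{frac:.0%}")
```

With three points the error bar is only known to about 50%, and under this approximation it takes more than fifty points to get that below 10%. When each point costs a month, that is the real constraint, not the choice of test.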