For a time I thought p-values were an objective measure, but then a couple of blows put to rest my dream of having an objective procedure to deal with uncertainty. This is the story of the Subjectivity one-two combo that knocked my Objectivity dreams out flat…
The first blow against Objectivity
And so one day I learned about the likelihood principle. The idea that the same data should drive any researcher to the same inferential conclusions seemed reasonable, and it made me doubt the objectivity of p-values for a moment.
So yeah, all right: if you want to show that Frequentist Statistics does not obey the likelihood principle, you can design two experiments that use the same data with the same significance level and yet reach different inferential results.
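A minimal sketch of the classic version of this argument (9 heads in 12 coin flips, a textbook example rather than anything from the experiments discussed here): two stopping rules produce the exact same data and the same likelihood, yet different p-values straddling 0.05.

```python
from math import comb

# Same observed data (9 heads, 3 tails), same null hypothesis p = 0.5,
# tested against p > 0.5. The likelihood is proportional to
# p^9 * (1 - p)^3 under both designs, yet the p-values differ.

# Design 1: flip exactly n = 12 times (binomial stopping rule).
# p-value = P(heads >= 9 | n = 12, p = 0.5)
p_binomial = sum(comb(12, k) for k in range(9, 13)) / 2**12

# Design 2: flip until the 3rd tail appears (negative binomial stopping
# rule). p-value = P(9 or more heads before the 3rd tail | p = 0.5)
p_neg_binomial = 1 - sum(comb(k + 2, 2) / 2**(k + 3) for k in range(9))

print(f"binomial design:          {p_binomial:.4f}")      # ~0.0730, not significant
print(f"negative binomial design: {p_neg_binomial:.4f}")  # ~0.0327, significant
```

With the same 12 coin flips, one researcher fails to reject at 0.05 while the other rejects, purely because of the stopping rule each had in mind.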
So what? Admittedly, the likelihood principle example is a bit contrived and unconvincing, since it is hard to imagine a real situation where two scientists would disagree on the likelihood function to use when analyzing the undocumented data of an experiment left behind by a third scientist.
But even if this were the case, when examined in detail, the purpose of Frequentist Statistics, or Error Statistics as Philosopher of Science Deborah Mayo would call it, is to establish a procedure which, when used over and over again, guarantees an error rate defined by the significance level of our experiments.
So it would be reasonable, and quite objective, for both scientists to agree that neither of them knows how the third scientist designed the experiment and that, therefore, combining both p-values into one unique p-value would be warranted.
However, to be honest, it was quite annoying that two frequentist scientists might reach different conclusions with the same data. So, though I found the example peculiar and it made my belief in objectivity stumble, it was still not enough to make me give up on my Objectivity dream.
Second blow & Objectivity Knock Out
However, let’s consider the following example, based on my experience analyzing historical data, which knocked out my belief in objectivity. Let’s say we are the leaders of a team of two scientists and we ask them to investigate whether jelly beans cause acne. Let’s also imagine that our two scientists disagree on how to design the experiment.
John wants to check whether green jelly beans cause acne since, in his scientific opinion, green is the only color that makes sense to check. Mary agrees that green jelly beans might cause acne, but she also believes that the nineteen other jelly bean colors might cause acne too. So both scientists come to us and, as project leaders, we decide to give them the freedom to design their experiments independently.
Mary, however, does not want to waste money on a separate experiment for green jelly beans, so she tells John that she will replicate his exact experiment design for the other colors, allowing her to easily reuse the data coming from his green jelly bean experiment.
They both agree to run their experiments with a significance level of alpha = 0.05, and they obtain the following p-values for each individual color experiment:
John: 0.01 < 0.05
Mary: 0.35, 0.01, 0.04, 0.12, …, 0.65, where 0.01 is the smallest p-value; with a Bonferroni correction, all p-values are well above 0.05/20 = 0.0025 (FWER)
So John is quite excited about his results and rushes to present them to us, claiming he has clear results showing that green jelly beans are linked to acne. (0.01 < 0.05)
Mary is also quite happy, because she believes she has strong evidence against jelly beans, green or otherwise, being linked to acne. (0.01 > 0.0025 FWER)
So what do we do now? They are both right: their results are, respectively, well below and well above their significance thresholds, even though they are both considering the same data for the green jelly beans analysis.
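The disagreement can be boiled down to a few lines. This sketch uses only the numbers given above (the shared green p-value of 0.01, alpha = 0.05, 20 colors); both decision rules are applied to the very same datum.

```python
# The same green jelly bean p-value, judged under two designs.
alpha = 0.05
n_colors = 20       # Mary tests 20 colors, John tests only green
p_green = 0.01      # p-value both scientists obtain for green

# John runs a single test at level alpha.
john_rejects = p_green < alpha                # 0.01 < 0.05  -> True

# Mary controls the family-wise error rate with a Bonferroni correction.
mary_rejects = p_green < alpha / n_colors     # 0.01 > 0.0025 -> False

print(f"John rejects the null: {john_rejects}")
print(f"Mary rejects the null: {mary_rejects}")
```

Same data, two defensible error-rate guarantees, two opposite conclusions.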
The fact that this kind of situation happens to be very common when analyzing already existing data, and is not just an artifact contrived to prove a point about the likelihood principle, finally sent my belief in Objectivity to the ground… So that’s it.
Objectivity is dead, long live Objectivity!
Okay, both scientists will keep their respective, subjective error rates (no problem there) but, as project leaders, we now have to make decisions. We could do a number of things to keep our error rates intact as a team:
- We could combine the p-values from both experiments (My choice)
- We could flip a coin to choose between John & Mary
- We could favor one scientist’s approach over the other’s before seeing the results
These options might not make John & Mary happy, since they both stand behind their results; however, this would keep our team error rates unaltered, even though all these options are just as subjective as the choices John & Mary made.
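For the first option (my choice), one standard recipe is Fisher's method, sketched below in pure Python. It assumes the p-values being combined are independent, which is a real caveat here since Mary reused John's green data; the 0.01/0.01 pair is purely illustrative.

```python
from math import exp, log

def fisher_combine(p_values):
    """Combine independent p-values with Fisher's method.

    X = -2 * sum(ln p_i) follows a chi-square distribution with 2k
    degrees of freedom under the global null; for even degrees of
    freedom the chi-square survival function has a closed form:
    P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    x = -2.0 * sum(log(p) for p in p_values)
    k = len(p_values)          # df = 2k, so the sum runs over k terms
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= (x / 2) / i
        total += term
    return exp(-x / 2) * total

# Illustrative only: combining two p-values of 0.01 (and again, Fisher's
# method assumes the two tests are independent).
print(f"{fisher_combine([0.01, 0.01]):.5f}")
```

`scipy.stats.combine_pvalues` offers the same computation (plus alternatives such as Stouffer's method) if SciPy is available.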
Unfortunately, we do not always have the luxury of repeating an experiment over and over to see if the results for green jelly beans really hold. Sometimes all we have is one shot and we need to make the most of it (e.g. the cosmic microwave background), and it seems only one person is allowed to aim and pull the trigger.
Take another example: the “God Particle” search at the LHC. Scientist A might request to look only at energies in the range around 125 GeV because his theory says so; however, Scientist B wants to look in the ranges around 125, 135 & 145 GeV because his theory says so too.
When the particle turns out to be at around 125.09 GeV with 5 sigma significance, Scientist A has already achieved the proof he needs, whereas Scientist B needs to keep smashing particles to achieve the same significance, since he is checking a wider range of energies.
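A rough sketch of why Scientist B is penalized, using a crude Bonferroni-style trials factor rather than the full look-elsewhere-effect machinery that particle physicists actually use; the three search windows are the ones assumed in this toy example.

```python
from statistics import NormalDist

nd = NormalDist()

# One-sided local p-value of a 5-sigma excess.
p_local = 1 - nd.cdf(5.0)             # ~2.87e-7

# Scientist B searched 3 energy windows (around 125, 135 & 145 GeV),
# so a crude trials-factor correction inflates his p-value:
m = 3
p_global = 1 - (1 - p_local) ** m

# Converted back into sigmas, the same excess is worth less to him:
z_global = nd.inv_cdf(1 - p_global)   # a bit under 5 sigma

print(f"local p:  {p_local:.2e}")
print(f"global p: {p_global:.2e}")
print(f"global significance: {z_global:.2f} sigma")
```

So the identical bump at 125 GeV clears the 5-sigma bar for Scientist A but falls short for Scientist B, for no reason other than how each chose to search.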
We could potentially recreate this situation with any other experiment, so yeah, Objectivity is well and truly dead. No point in crying over it, but this is just my opinion…