Chapter 5 What is your philosophy?

New PhD students are in a transition state.

They have been good knowledge consumers; otherwise they wouldn’t have been accepted into a highly competitive PhD program.

Creating an original contribution to knowledge is the main condition for earning a PhD, so the training is largely about forcing the transition from knowledge consumer to knowledge producer.

The two require different toolkits.

A knowledge consumer might use statistics as a rote process, assuming there are right ways and wrong ways to do things. They might say, “Tell me the rules and procedures and I’ll just follow them.” Or they might ask, “Which tests work better than the others?”

A knowledge producer will want to go deeper. They will crack open the computer and simulate data sets to see how tests perform under different assumptions. They may even change the process, adapting rules and procedures that fit their philosophy better.
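To make that concrete, here is a minimal sketch in R of the kind of simulation I mean; the sample sizes, standard deviations, and significance cutoff are illustrative assumptions, not prescriptions. It estimates how often a pooled-variance t-test falsely rejects a true null when its equal-variance assumption is violated.

set.seed(1234)
n_sims <- 10000
p_values <- replicate(n_sims, {
  # both groups have the same mean, so the null hypothesis is true,
  # but the smaller group has the larger standard deviation
  a <- rnorm(10, mean = 0, sd = 5)
  b <- rnorm(30, mean = 0, sd = 1)
  # pooled-variance t-test, which assumes equal variances
  t.test(a, b, var.equal = TRUE)$p.value
})
# a well-behaved test should reject about 5% of the time
mean(p_values < 0.05)

Under settings like these, the estimated false positive rate runs well above the nominal 0.05, which is exactly the kind of thing a knowledge producer wants to discover before trusting a procedure.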

A knowledge producer makes adjustments while defending their approach.

In this course I am going to hammer home the point, over and over, that you have a responsibility to think critically about your statistical practice. In a statistical experimental design framework, our intentions and our data generation process matter. How we think about the analysis matters. Statistical testing and inference are the culmination of a many-step process.

5.1 Our philosophy matters

To a large extent, the experimental design and analysis procedures we choose are driven by our philosophical approach to science. From time to time we need to lean on philosophy when called to defend our decisions: for example, when interpreting results, when rebutting a reviewer, or when deciding whether to move a project forward or stop it.

When I say philosophy I’m thinking in terms of the fundamental ideas. Do you think truth is absolute and attainable, or is it provisional? What is the nature of evidence? What are the merits of inductive and deductive reasoning, and how much weight should we give to each? How should you test hypotheses, through affirmation or falsification? Do you lean more Popperian (deduction, falsification) or Carnapian (induction, confirmation)?

What are your biases? Does your workflow have integrity?

Are you a perfectionist or a dontgiveafuctionist or somewhere in between?

It is a big, diverse world. Reasonable people operate across philosophical and behavioral divides, and these questions have a lot of gray areas. To see what’s out there, a good entry point is the Stanford Encyclopedia of Philosophy, starting with some of its philosophy of science content.

My sense is that most of us biomedical scientists are hybrids. We discover using a mixture of inductive and deductive practices. The most common process seems to be making observations, then establishing models, then seeking confirmation in related experiments; this is both inductive and empirical. We also run experiments that test predictions based upon some model, and interpreting those observations requires deductive reasoning.

5.2 Exploration and uncertainty

What everyone seems to agree upon is that creating knowledge occurs through exploration. And to explore means we have to deal with uncertainty.

Uncertainty is the domain of statistics.

By what criteria can we ensure we are minimizing mistakes? By what criteria will we validate an observation? How will we ensure a hypothesis has been well-tested? The answers to this group of questions are found in inferential statistics, which in turn rests on probability.

Probability is the mathematics of uncertainty. Probability provides a way to keep score when we’re working near the forefront of knowledge, confronted by improbable ideas and uncertain data.
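To make that score-keeping idea concrete, here is a toy simulation in R; the fair coin and the 1,000 flips are my own illustrative assumptions. Any single flip is uncertain, but probability describes where the running tally settles in the long run.

set.seed(42)
# simulate 1,000 flips of a fair coin (1 = heads, 0 = tails)
flips <- rbinom(1000, size = 1, prob = 0.5)
# proportion of heads after each successive flip
running_prop <- cumsum(flips) / seq_along(flips)
# early estimates wobble; later ones hover near the true 0.5
running_prop[c(10, 100, 1000)]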

It turns out that there are two main statistical frameworks that people tend to apply when reasoning with probability.

One decision you’ll have to make is which of these frameworks to apply in your work: Bayesian statistics, or the Fisher/Neyman-Pearson error statistics, aka “frequentism.”

In the Bayesian framework, which is the older of the two, the rules of probability are used to represent the plausibility of a hypothesis under the observed data. In the Fisher/Neyman-Pearson framework, probability is used to assess the plausibility of the data under a null hypothesis.
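To see the contrast in miniature, here is a hedged sketch in R applied to the same toy data; the 14 successes out of 20 trials and the flat Beta(1, 1) prior are illustrative assumptions of mine, not course data.

x <- 14  # observed successes
n <- 20  # trials

# Bayesian: plausibility of a hypothesis given the data. With a flat
# Beta(1, 1) prior, the posterior is Beta(1 + x, 1 + n - x); this gives
# the posterior probability that the true rate exceeds 0.5.
1 - pbeta(0.5, 1 + x, 1 + n - x)

# Frequentist: plausibility of the data under a null hypothesis. This is
# the probability of data at least this extreme if the true rate is 0.5.
binom.test(x, n, p = 0.5)$p.value

Same data, two different questions: one about the hypothesis given the data, the other about the data given a hypothesis.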

Equally reasonable people choose either framework. If we search for “Bayesian vs frequentist” we run smack dab into what are called the statistics wars. In one or two clicks you are sure to stumble into impassioned arguments on each side. Perhaps one will resonate.

Figure 5.1: When you wander into a chat between a frequentist and a Bayesian.

Which is better? I’m honestly agnostic. I learned, practice, and teach error statistics, something along the lines of the Fisher/Neyman-Pearson school. But I definitely appreciate the validity of Bayesian approaches in many circumstances.

That sounds wishy-washy, so here is a hill that I will stand on. It is less important that one framework might be better than the other. What matters more is to operate within a framework whose strengths and limitations you fully understand.

This course advocates adopting a framework that speaks to replicable experimental practices, to variables and models, to the implications of evidence, and also to a reproducible workflow. The inferential statistics you will learn are mostly frequentist. That choice is motivated by several reasons that I won’t go into here, except for a nod toward a philosophical reason that makes a lot of sense to me, in the words of Deborah Mayo:

“In the severe testing view, probability arises in scientific contexts to assess and control how capable methods are at uncovering and avoiding erroneous interpretations of data. That is what it means to view statistical inference as severe testing. A claim is severely tested to the extent it has been subjected to and passes a test that probably would have found flaws, were they present. The probability that a method commits an erroneous interpretation of data is an error probability. Statistical methods based on error probabilities I call error statistics. It is not probabilism or performance we seek to quantify, but how severely probed claims are.” -Mayo