Chapter 30 One-way ANOVA Completely Randomized

library(magrittr)
library(tidyverse)
library(ggformula)
library(DescTools)
library(ez)
library(lsr)

You probably should read about the ANOVA big picture).

30.1 Using ezANOVA

R has several functions to run ANOVA. We are mostly going to use ezANOVA from the ez package in this course. The syntax for defining the ANOVA analysis is a bit more clear in ez, especially if we are new to ANOVA.

This chapter runs through an analysis of a one-way completely randomized ANOVA data set as ‘how to’ example.

30.2 The chickwt data set

This data set is in R’s base. It compares the influence of 6 different types of food sources on chick weight. We have one predictor variable, the factor ‘feed,’ which is tested at 6 different levels (the various food sources).

There is one continuous outcome variable (weight). We assume the chicks are out bred, and thus not intrinsically-related, and that they have been randomly assigned to a level of feed, and have been weighed after a period of time consuming that feed.

This is a classic one-way completely randomized ANOVA design.

This chapter illustrates how to go through an analysis of the data.

30.2.1 Inspect the data

The next few scripts involve inspecting the data set, which should always be done prior to running any statistical tests.

data(chickwts)
#take a look at the data structure, depending upon how you like to view data
str(chickwts)
## 'data.frame':    71 obs. of  2 variables:
##  $ weight: num  179 160 136 227 217 168 108 124 143 140 ...
##  $ feed  : Factor w/ 6 levels "casein","horsebean",..: 2 2 2 2 2 2 2 2 2 2 ...
chickwts
##    weight      feed
## 1     179 horsebean
## 2     160 horsebean
## 3     136 horsebean
## 4     227 horsebean
## 5     217 horsebean
## 6     168 horsebean
## 7     108 horsebean
## 8     124 horsebean
## 9     143 horsebean
## 10    140 horsebean
## 11    309   linseed
## 12    229   linseed
## 13    181   linseed
## 14    141   linseed
## 15    260   linseed
## 16    203   linseed
## 17    148   linseed
## 18    169   linseed
## 19    213   linseed
## 20    257   linseed
## 21    244   linseed
## 22    271   linseed
## 23    243   soybean
## 24    230   soybean
## 25    248   soybean
## 26    327   soybean
## 27    329   soybean
## 28    250   soybean
## 29    193   soybean
## 30    271   soybean
## 31    316   soybean
## 32    267   soybean
## 33    199   soybean
## 34    171   soybean
## 35    158   soybean
## 36    248   soybean
## 37    423 sunflower
## 38    340 sunflower
## 39    392 sunflower
## 40    339 sunflower
## 41    341 sunflower
## 42    226 sunflower
## 43    320 sunflower
## 44    295 sunflower
## 45    334 sunflower
## 46    322 sunflower
## 47    297 sunflower
## 48    318 sunflower
## 49    325  meatmeal
## 50    257  meatmeal
## 51    303  meatmeal
## 52    315  meatmeal
## 53    380  meatmeal
## 54    153  meatmeal
## 55    263  meatmeal
## 56    242  meatmeal
## 57    206  meatmeal
## 58    344  meatmeal
## 59    258  meatmeal
## 60    368    casein
## 61    390    casein
## 62    379    casein
## 63    260    casein
## 64    404    casein
## 65    318    casein
## 66    352    casein
## 67    359    casein
## 68    216    casein
## 69    222    casein
## 70    283    casein
## 71    332    casein

It is also helpful to calculate some descriptive stats for inspection. Since the design is obviously about testing the different levels of feed, use the group_by function of tidyverse to summarize by feed level. The kable function from knitr makes a nicer output.

cw1 <- chickwts %>%
  group_by(feed) %>%
  summarise(
    mean= mean(weight),
    median=median(weight),
    sd= sd(weight),
    n = n(),
    var=var(weight)
    )
knitr::kable(cw1, caption="Descriptive statistics for the chickwts dataset.")
Table 30.1: Descriptive statistics for the chickwts dataset.
feed mean median sd n var
casein 323.5833 342.0 64.43384 12 4151.720
horsebean 160.2000 151.5 38.62584 10 1491.956
linseed 218.7500 221.0 52.23570 12 2728.568
meatmeal 276.9091 263.0 64.90062 11 4212.091
soybean 246.4286 248.0 54.12907 14 2929.956
sunflower 328.9167 328.0 48.83638 12 2384.992

We can see that the means and medians are about equal within each group. That’s a quick way to suggest no skew. Overall sample size is 71, distributed as 10-14 replicates per group, it’s a little unbalanced. But within acceptable limits. The variances are not equivalent, but are they unequal? Levene’s test in the ANOVA will provide that answer. But this is a pretty small sample size. Whatever that output we will take it with salt.

Plot the data for a look. Simply looking at the data graphically goes a long way to ensuring this is a one-way ANOVA design. Jitter plots are a great way to see group data like this. I like the crossbar geom to overlay some summary stats.

ggplot(chickwts, aes(feed, weight))+
  geom_jitter(width = 0.2, size=2) +
  stat_summary(fun.data = mean_sdl, 
               fun.args = list(mult=1), 
               geom="crossbar", 
               width=0.2, 
               color="red"
               ) + 
  theme_classic()
## Warning in (function (kind = NULL, normal.kind = NULL, sample.kind = NULL) :
## non-uniform 'Rounding' sampler used

30.3 Run the ANOVA

We can imagine designing an experiment like this with either of 3 experimental objectives in mind.

  1. Perhaps we’re interested mostly in whether any feeds are better than others for achieving weight gain? We could answer that by making all possible pairwise comparisons. Since there are 6 levels of the factor feed, that would involve \(m=\frac{6(6-1)}{2}=15\) comparisons.

  2. Imagine casein is the standard feed, and we wish to know if any of the other feeds differ from this standard? We would compare casein to every feed. That would involve only 5 comparisons.

  3. Perhaps we just want to know if any of the feeds differ in causing weight gain, but we aren’t interested in which specific feeds differ? We could answer that question using the F-test result, and not comparing any groups post hoc.

Each of those objectives are scientifically-driven. They should be declared before running an experiment so that an unbiased analysis is conducted after the data are in.

Other than, perhaps, how we order our data in the data set, which of these objectives is true doesn’t influence how we run the ezANOVA function per se. However, the objective will influence which post hoc analysis is performed.

30.3.1 Run the chickwts One Way ANOVA

First, ezANOVA requires a ‘wid,’ which is a unique ID variable for each independent replicate.

We need to add one to the chickwts data set. Since all the measures are independent, we’ll just do that by row number. At the same time we’ll convert the integer to a factor so ezANOVA won’t bark at us.

chickwts$ID <- as.factor(1:nrow(chickwts))

You should look at R’s help for ezANOVA ?ezANOVA to understand these test arguments. The help page is pretty clear for most of these.

Since we don’t use the term ‘dependent variable’ much in this course, to be clear, ‘dv’ is the outcome response variable..the dependent variable. We have to specify it in the ezANOVA arguments.

If measurements for levels of the factor are not intrinsically-linked, if they are distributed to each replicate independently, the design is completely randomized. That factor should be listed in the function using a between argument.

If measurements for levels of the factor are intrinsically-linked, it is a related/repeated measures design. List it as a ‘within’ argument, rather than ‘between.’

Here, the feed factor is between. Every chick was randomly assigned a level of feed.

Notice that ezANOVA is a function. Use it to create a list object called my.ezaov, which has all of the output information. We can call all of the output at once, or we can call specific elements from the my.ezaov object to see the results.

my.ezaov <- ezANOVA(
            data = chickwts, 
            wid = ID, 
            dv = weight, 
            between = feed,
            type = 2, 
            return_aov = T, 
            detailed = T)
## Warning: Data is unbalanced (unequal N per group). Make sure you specified a
## well-considered value for the type argument to ezANOVA().
## Coefficient covariances computed by hccm()
my.ezaov
## $ANOVA
##   Effect DFn DFd      SSn    SSd       F           p p<.05       ges
## 1   feed   5  65 231129.2 195556 15.3648 5.93642e-10     * 0.5416855
## 
## $`Levene's Test for Homogeneity of Variance`
##   DFn DFd      SSn      SSd         F         p p<.05
## 1   5  65 4389.241 76154.92 0.7492639 0.5896095      
## 
## $aov
## Call:
##    aov(formula = formula(aov_formula), data = data)
## 
## Terms:
##                     feed Residuals
## Sum of Squares  231129.2  195556.0
## Deg. of Freedom        5        65
## 
## Residual standard error: 54.85029
## Estimated effects may be unbalanced
# my.ezaov$ANOVA, this is a dataframe
# my.ezaov$Levene, this is also a dataframe
# my.ezaov$aov, this is an aov object that we can pass into posthoc functions. 

30.3.2 Interpreting the One-Way CR ANOVA Output

The ezANOVA output prints 3 list objects by default:

  • $ANOVA (which is the first data frame)
  • $Levene's Test for Homogeneity of Variance (which is the 2nd data frame)
  • $aov (which is the end of the console output and is an important statistical object)

In fact, there is a great deal more computed that is not printed, which you can visualize in the console by typing str(my.ezaov).

30.3.2.1 $ANOVA: The ANOVA table

For a CR one way ANOVA design, the SS are partitioned as follows, in general: \(SS_{total}=SS_{model}+SS_{residual}\).

In this example, \(SS_{model}= SS_{feed}\).

Thus, the ANOVA table summarizes the feed model.

The DFn = 5 corresponds to the 6 groups, less 1 degree of freedom (one is lost to calculate mean of groups (sort of)) for the model source of variance.

The DFn = 65 corresponds to the degrees of freedom for the residuals (one df is lost per group to calculate group means).

Therefore, this ANOVA tests a feed model against a null F distribution with 5 and 65 degrees of freedom.

\(F=MS_{feed}/MS_{residual}=15.3648\), where \(MS = SS/df\). The SS can be found in the $aov output.

ges is an effect size parameter for ANOVA. For this particular experimental design, \(ges=\frac{SS_n}{SS_n+SS_d}\). In other words, ges summarizes the variation associated with the model as a fraction of the total variation in the data.

Thus, 54.16% of the variation in weight is attributable to the different levels of feed in the experiment. In other words, the model explains 54.16% of the variation in the data.

Think of eta-squared, partial eta-squared, and generalized eta-squared as all related to the more commonly understood \(R^2\), the so-called coefficient of regression. They are each calculated differently, but are related as estimates for how much of the variation is due to the model. ges takes on values from 0 to 1. Higher values indicate a greater degree of the overall variation is due to the factor tested in the experiment.

Having said that, it’s a bit of a Goldilocks statistics by itself. It has more value as a way to describe fits of nested models.

30.3.2.2 $aov

This table is an important object because it can be passed into certain posthoc tests, facilitating analysis.

It provides the accounting for the sum of squares and degrees of freedom, while calculating the residual standard error. It is somewhat redundant with the $ANOVA table, though the residual standard error can come in handy.

  • DFn=degrees freedom for numerator. k-1, where k = levels of factor.
  • DFd=degrees freedom for denominator. n-k, where n = number of independent replicates.
  • SSn & SSd = sum of squares for model and residual, respectively
  • Residual standard error is a parameter that estimates the precision by which the data fit the model, and is in units of the outcome variable, weight. \(SE\) is the square root of the residual variance: \(S_{y.x}=\sqrt{\frac{SS_{residual}}{df_{residual}}}\)

If \(S_{y.x}\) were zero, there would be no residuals. The data points would all rest at the value of the group means. The data would fit perfectly to a model of 6 group means at their observed values. \(S_{y.x}\) therefore is a descriptive statistic that declares how much error, or the degree by which the data is unexplained by the model. It has some utility for calculating confidence intervals and power analysis as well.

30.3.2.3 The F test

The scientific prediction for this experiment is that chick weights will vary depending upon the type of feed they are grown on. The null is that their weights will be roughly the same, irrespective of food source.

ANOVA tests this hypothesis through the variance parameter. The question at hand is whether the variance associated with the model, one of 6 different feed group means, is fractionally greater than the residual variance in the sample.

The null statistical hypothesis is that the variance associated with the different levels of feed is less than or equal to the residual variance. Therefore, the alternate hypothesis is the variance associated with feed is greater than residual variance.

\(H_0: MS_{feed}\le MS_{residual}\), \(H_1: MS_{feed}>MS_{residual}\)

Because of the relationship of group means to variance, it is just as valid to express the null hypothesis in terms of the group means, and that can be proven mathematically by a competent statistician (of which I am not):

\(H_0: \mu_a=\mu_b=\mu_c=\mu_d=\mu_e=\mu_f\)

Though, strictly, rejecting the null doesn’t mean that all group means differ from each other, it just means that some of them differ. \(H_1: \mu_a\ne\mu_b\ne\mu_c\ne\mu_d\ne\mu_e\ne\mu_f\)

The F statistic of 15.3648 is extreme for a null F distribution of 5 and 65 degrees of freedom. The very low p-value illustrates this extremeness.

We can reject the null and conclude that differences in effect on chick weights exist between this group of feeds.

30.3.2.4 Levene’s test for homogeneity of variance

Levene’s test determines whether there is a substantial level of differences in variance between groups. Levene’s test is run as a check to determine if the groups variance is homogeneous, as homoskedasticity is one of the validity assumptions of ANOVA.

Levene’s test statistic is calculated as follows:

\[W=\frac{(n-k)}{(k-1)}\frac{\sum\limits_{i=1}^{k}n_i(\bar Z_i-\bar Z)^2}{\sum\limits_{i=1}^{k}\sum\limits_{j=1}^{n_i}(Z_{ij}-\bar Z_i)^2}\]

where \(Z_{ij}=|x_{ij}-\bar x_i|\) and \(Z_i\) are the group means and \(\bar Z\) is the overall mean of \(Z_{ij}\).

The null hypothesis of the Levene test is rejected when \(W>F_{(\alpha,\ k-1,\ n-k)}\), where the F is the critical value.

Levene’s test output is a 2nd ANOVA table, and can easily be confused with the ANOVA output. Levene’s test lacks a \(ges\) parameter, nor does it have a column that lists the factor name.

If the Levene’s F value is low and the p-values is high, as is the case here, we can’t reject the null that the variances are the same. In this way, the variance homogeneity assumption is validated.

If this were not the case, we have two options.

Option 1: Simply ignore the result. The luck of the draw with small samples can explain group differences in variance, where none really exists. It is hard to gauge the impact of any one violation on our inference. It may be very modest or it may be substantial. With data in hand, it is too late to come up with an on-the-fly solution if not specified in the planning stages.

Option 2: Transform the data to homogenize outliers and variance, or switch the analysis to a Kruskal-Wallis nonparametric test.

30.4 Post hoc pairwise comparisons

When the ANOVA F test for the factor is extreme we may be interested in knowing which treatments differ. That’s achieved by conducting post hoc analysis. These typically involves multiple group comparisons.

There are two fundamental options for CR posthoc testing: p-value adjustment or a range test. Each are illustrated below, but only one method should be conducted in real life. That method is chosen in advance during the planning stages.

For the adjusted p-value method, use the pairwise.t.test function set up a matrix of all possible group comparisons. The Bonferroni p-value adjustment procedure is selected for best possible control of type1 error. This may miss some true differences.

For each comparison we are testing the null hypothesis that the two group means are the same: \[H_0: \bar y_i = \bar y_j\]

allPairs <- pairwise.t.test(chickwts$weight, chickwts$feed, paired=FALSE, alternative="two.sided", pooled.sd=TRUE, p.adjust= "bonf")
allPairs
## 
##  Pairwise comparisons using t tests with pooled SD 
## 
## data:  chickwts$weight and chickwts$feed 
## 
##           casein  horsebean linseed meatmeal soybean
## horsebean 3.1e-08 -         -       -        -      
## linseed   0.00022 0.22833   -       -        -      
## meatmeal  0.68350 0.00011   0.20218 -        -      
## soybean   0.00998 0.00487   1.00000 1.00000  -      
## sunflower 1.00000 1.2e-08   9.3e-05 0.39653  0.00447
## 
## P value adjustment method: bonferroni

To quickly scan which comparisons are below the p < 0.05 threshold we apply a simple custom extreme function across the matrix:

extreme <- function(x){
  ifelse(x < 0.05, TRUE, FALSE)
}

apply(allPairs$p.value, c(1, 2), extreme)
##           casein horsebean linseed meatmeal soybean
## horsebean   TRUE        NA      NA       NA      NA
## linseed     TRUE     FALSE      NA       NA      NA
## meatmeal   FALSE      TRUE   FALSE       NA      NA
## soybean     TRUE      TRUE   FALSE    FALSE      NA
## sunflower  FALSE      TRUE    TRUE    FALSE    TRUE

With the Bonferroni correction we are able to reject 8 of the 15 null hypotheses. For each comparison corresponding to a value of TRUE we can reject the null and conclude that their means are not equivalent.

Adjusting p-values for subsets of comparisons

Often, we don’t want to burn so much type1 error making scientifically uninteresting comparisons. In such cases, we instead want to compare subsets. For example, perhaps all we wanted to do was compare each of the feeds to casein.

Here’s a three step procedure for doing just that.

Step1: First, run the pairwise.t.test function, setting the argument p.adjust="none". The output includes a matrix of p-values we’ll name allPairsn, providing all possible comparisons.

#just repeating from above
allPairsn <- pairwise.t.test(chickwts$weight, chickwts$feed, p.adjust= "none")

Step2: Select from the allPairs matrix only the p-values that correspond to the comparisons you’d like to make. Name that vector of unadjusted p-values, selectPairs. This takes a bit of cleverness depending on what you want to grab from the matrix.

For example, we only want to compare all of the diets to casein. The comparisons we want are all in the first column. Use your matrix indexing skillz to grab only the unadjusted p-values from that first column:

selectPairsn <- allPairsn$p.value[, 1]
selectPairsn
##    horsebean      linseed     meatmeal      soybean    sunflower 
## 2.067997e-09 1.493344e-05 4.556672e-02 6.654079e-04 8.124949e-01
selectPairsn < 0.05
## horsebean   linseed  meatmeal   soybean sunflower 
##      TRUE      TRUE      TRUE      TRUE     FALSE

Step3: Now pass these unadjusted p-values in the selectPairs vector into the p.adjust function.

The output of this step is a vector of adjusted p-values for the selected group of comparisons.

adjustedPvalues <- p.adjust(selectPairsn, method="bonferroni")
adjustedPvalues
##    horsebean      linseed     meatmeal      soybean    sunflower 
## 1.033998e-08 7.466720e-05 2.278336e-01 3.327039e-03 1.000000e+00

Which of these are extreme? If it’s not clear by inspection (or too large), use a simple Boolean:

adjustedPvalues < 0.05
## horsebean   linseed  meatmeal   soybean sunflower 
##      TRUE      TRUE     FALSE      TRUE     FALSE

Although the p-values differ in this selected group compared to the full matrix, the inference remains the same. We can conclude that chick weights on horsebean, linseed and soybean feeds differ from that on casein feed.

30.4.1 Range tests

All range tests operate very differently from p-value adjustment methods.

Range tests compare each of a data set’s differences between group means to a critical value for the difference between two group means, which is calculated based upon the number of groups and their sample sizes. Any differences between group means that exceed the critical value difference are deemed “significant.” In addition, these functions calculated adjusted p-values and adjusted confidence intervals given the method selected.

One-way completely randomized ANOVAs, as opposed to related measures ANOVA, lend themselves well to range tests because the posthoc questions revolve around the differences between group means. Range tests are nice because they compute confidence intervals adjusted for multiple comparisons, in addition to producing p-values adjusted for multiple comparisons.

But these should be avoided on related measures ANOVA, due to the fact that range tests are designed for unpaired comparisons, not for paired comparisons.

30.4.1.1 Dunnett’s test

In some designs we are frequently interested in a comparison back to a control value. These are referred to as dependent comparisons, because every comparison is back to the same group mean. Dunnett’s test was created for exactly this type of situation.

Let’s imagine the researcher is interested in knowing if any of the feeds cause difference in chick weights compared to the casein feed.

DunnettTest(weight ~ feed, control="casein", data = chickwts)
## 
##   Dunnett's test for comparing several treatments with a control :  
##     95% family-wise confidence level
## 
## $casein
##                         diff     lwr.ci     upr.ci    pval    
## horsebean-casein -163.383333 -223.95852 -102.80815 6.3e-09 ***
## linseed-casein   -104.833333 -162.58951  -47.07716 8.6e-05 ***
## meatmeal-casein   -46.674242 -105.72847   12.37999  0.1670    
## soybean-casein    -77.154762 -132.81000  -21.49952  0.0032 ** 
## sunflower-casein    5.333333  -52.42284   63.08951  0.9995    
## 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Note how the R method is to subtract the casein control from the test groups. We can reject the null that chick weight on casein is the same as on horsebean, linseed and soybean. There is no evidence that chick weight differs on meatmeal and sunflower compared to casein.

30.4.1.2 Reporting the result

If you have CI’s, flaunt them.

When using Dunnett’s there is no need to report both the 95% CI and the p-value, since they effectively show the same thing (any adjusted p-value above 0.05 will also have a 95% CI that includes the value of zero). We use either for inference.

“Chick weights differ on feed type (one-way completely randomized ANOVA, F(5, 65)=15.36, p=5.9e-10). Specifically, posthoc Dunnett’s test show group mean chick weights differ between casein compared to horsebean (-163, adjusted 95%CI -223 to -102), linseed(-104, adjusted 95%CI -162 to -42), and soybean (-77,adjusted 95%CI -132 to -21)”

30.5 Summary

One-way completely randomized ANOVA is for experiments where every measurement represents an independent replicate. The results of the F test for the main effect tell us whether or not your one factor, no matter how many levels, has an effect on the response. We choose a posthoc analysis only if interested in knowing which levels of the factor might explain the positive F test. We can use either range tests or p-value adjusted pairwise.t.tests in posthoc *The range test functions provide effect sizes and adjusted confidence intervals