Chapter 33 Two-way ANOVA Related Measures

library(tidyverse)
library(ez)
library(knitr)
library(kableExtra)
library(viridis)

A two-way ANOVA experimental design is one that involves two predictor variables, where each predictor has 2 or more levels. There is only one outcome variable in a 2 way ANOVA and it should be continuous, measured on an equal interval scale, and ideally sampled from a normally-distributed population.

The predictor variables are often referred to as factors, and so ANOVA designs are synonymous with factorial designs.

The list of two-way ANOVA experimental designs can be as follows:

  1. Completely randomized (CR) on both factors
  2. Related measures (RM) on both factors
  3. Mixed, CR on one factor and RM on the other

In one sense, a two-way ANOVA can be thought of as two one-way ANOVA’s run simultaneously. So it is possible to assess the effects of two factors at once. The major difference, however, is the ability to also test whether an interaction exists between the two factors.

In this chapter we’ll focus on the data structure and analysis of a two-way ANOVA RM experimental design. These are particularly common in cell-culture and other in vitro-based experiments, owing to the homogeneity of the biological material.

For an experiment such as the one described below, all of the biological material assayed within a single replicate comes from a common source. Every well in every plate or every tube in every rack prepared from that source is identical, for all intents and purposes. Therefore, any measurements taken from these wells are intrinsically-linked. For this reason, the data should be analyzed as related-measures.

A two-way ANOVA RM design viewed from the bench. Each combination of the two factors is tested in technical duplicate. All measurements are intrinsically-related.

Figure 33.1: A two-way ANOVA RM design viewed from the bench. Each combination of the two factors is tested in technical duplicate. All measurements are intrinsically-related.

An independent replicate occurs when the same protocol is followed, from start to finish, beginning with a new batch of plates or with a new batch of a purified protein or whatever.

Even so, it’s hard to argue these represent true biological replicates. The cells are likely clonal, and probably aren’t changing much from batch to batch. The statistical analysis is therefore essentially focused on establishing whether the observations occur repeatedly or not. The replicates are therefore “blocked,” usually in time. For example, an assay on the material on week1 is treated differently than one on week2 because these are two different time blocks.

33.1 Cell culture

Coronary artery atheromas are enriched in fast-growing mesenchymal-like cells, which have prominent inflammatory characteristics. They express the transcription factor NFAT, which is regulated by mitogenic receptor signaling to control inflammatory gene expression.

Cells from atheroma explants are easy to grow in culture, often indefinitely as for cancer cells. Eventually, a fast growing cell population dominates these monocultures when passaged serially. But these probably mimic the “bad” cells in an atheroma pretty well, so are useful to study.

An experiment was performed on cultured human coronary smooth muscle cells derived from an atheroma to determine if the bzip suppresser ICER attenuates NFAT and whether it is mitogen-selective.

One predictor variable is mitogen at 3 levels: vehicle, ADP and PDGF. A second predictor variable is an expression vector, which is either empty or encodes a cDNA to express ICER.

The output response, luminescence in cell extracts due to an NFAT-driven luciferase transcriptional reporter, is detected using a luminometer. Luminescence is linearly proportional to the amount of luciferase in the cells.

The protocol involved parallel transfections of the CASM cell line with the expression vectors and luciferase reporter, followed a few days later by treatment with the mitogens for 4 hrs before measuring luminescence. All treatments are performed in technical duplicate, from which the average value is used for the response measure.

Three predictions, and the corresponding null hypotheses that each tests, can be evaluated here simultaneously:

  1. Icer and its control differ in how they each affect NFAT-mediated transcription. \(H0:\sigma^2_{suppressor}\le\sigma^2_{residual}\)
  2. The mitogens differ in how they affect NFAT-mediated transcription. \(H0:\sigma^2_{mitogen}\le\sigma^2_{residual}\)
  3. NFAT-mediated transcription is influence by a combination of suppressor and mitogen. \(H0:\sigma^2_{suppressorXmitogen}\le\sigma^2_{residual}\)

Here’s some data:

replicate vector mitogen Nfatluc
A empty Veh 23.2
B empty Veh 15.7
C empty Veh 17.2
A empty ADP 83.5
B empty ADP 95.9
C empty ADP 89.0
A empty PDGF 124.6
B empty PDGF 187.7
C empty PDGF 170.9
A ICER Veh 16.8
B ICER Veh 5.4
C ICER Veh 27.6
A ICER ADP 89.8
B ICER ADP 80.2
C ICER ADP 52.5
A ICER PDGF 57.0
B ICER PDGF 78.1
C ICER PDGF 67.4
ggplot(hcsmc, 
       aes(
         mitogen, 
         Nfatluc, 
         group=replicate)
       ) +
  geom_line(
    
  ) +
  geom_point(
    aes(
      color=replicate), 
    size=4
    ) +
  facet_grid(
    ~vector, 
    switch="both"
    ) +
  scale_shape_manual(
    values=c(16, 16)
    ) +
  scale_color_viridis(
    discrete=T
    ) +
  theme(
    legend.position="top"
    ) +
  scale_x_discrete(
    limits=c("Veh","ADP","PDGF")
    ) +
  labs(
    y="NFAT-luciferase, light units/mg", 
    x="Mitogen"
    )
A cell culture-based two-way ANOVA RM experiment

Figure 33.2: A cell culture-based two-way ANOVA RM experiment

33.2 The test

Scientific judgement dictates running this test as related measures. Each replicate was performed on a different passage of cells, approximately 1 week apart. Within each passage the cells are highly homogeneous. The transfections and treatment and biochemical measurements were all conducted in a “side-by-side” block. The measurements within a replicate are intrinsically-linked. Perhaps most importantly, each measure taken on a given day is NOT independent of all other measures that day. But we can assume they differ from week to week. There really is no other way to analyze these data.

out.rm <- ezANOVA(data = hcsmc
                   , dv = Nfatluc
                   , wid = replicate
                   , within = c(vector, mitogen)
                   , type = 3
                   , return_aov = T
                   , detailed = T
  
); out.rm
## Warning: Converting "replicate" to factor for ANOVA.
## Warning: Converting "vector" to factor for ANOVA.
## Warning: Converting "mitogen" to factor for ANOVA.
## $ANOVA
##           Effect DFn DFd       SSn       SSd         F           p p<.05
## 1    (Intercept)   1   2 91378.125  388.5700 470.33031 0.002119408     *
## 2         vector   1   2  6156.801  471.0011  26.14347 0.036186976     *
## 3        mitogen   2   4 29018.893 1981.2567  29.29342 0.004084641     *
## 4 vector:mitogen   2   4  7333.031  623.0722  23.53830 0.006133042     *
##         ges
## 1 0.9634772
## 2 0.6399535
## 3 0.8933620
## 4 0.6791774
## 
## $`Mauchly's Test for Sphericity`
##           Effect         W         p p<.05
## 3        mitogen 0.6795372 0.8243404      
## 4 vector:mitogen 0.4963405 0.7045144      
## 
## $`Sphericity Corrections`
##           Effect       GGe      p[GG] p[GG]<.05      HFe       p[HF] p[HF]<.05
## 3        mitogen 0.7573102 0.01101064         * 2.620487 0.004084641         *
## 4 vector:mitogen 0.6650442 0.02123339         * 1.485468 0.006133042         *
## 
## $aov
## 
## Call:
## aov(formula = formula(aov_formula), data = data)
## 
## Grand Mean: 71.25
## 
## Stratum 1: replicate
## 
## Terms:
##                 Residuals
## Sum of Squares     388.57
## Deg. of Freedom         2
## 
## Residual standard error: 13.93862
## 
## Stratum 2: replicate:vector
## 
## Terms:
##                   vector Residuals
## Sum of Squares  6156.801   471.001
## Deg. of Freedom        1         2
## 
## Residual standard error: 15.34603
## 2 out of 3 effects not estimable
## Estimated effects are balanced
## 
## Stratum 3: replicate:mitogen
## 
## Terms:
##                   mitogen Residuals
## Sum of Squares  29018.893  1981.257
## Deg. of Freedom         2         4
## 
## Residual standard error: 22.25565
## 2 out of 4 effects not estimable
## Estimated effects may be unbalanced
## 
## Stratum 4: replicate:vector:mitogen
## 
## Terms:
##                 vector:mitogen Residuals
## Sum of Squares        7333.031   623.072
## Deg. of Freedom              2         4
## 
## Residual standard error: 12.48071
## Estimated effects may be unbalanced

33.3 Interpretation of the output

Note here that we are NOT capturing an aov object. The only use of the aov object we might have is for posthoc testing with range tests. But we should not use range tests on a pure related measures design, where the means of groups are irrelevant.

33.3.1 ANOVA Table

Note just as for the 2-way ANOVA CR analysis, there are 3 F tests for the model (the intercept F test is inconsequential for now). The F value for the suppress:mitogen interaction is beyond the critical limit for a null F distribution with 2 and 4 degrees of freedom. The result is extreme (p = 0.00613). The interaction takes precedence, since the main effects are difficult to interpret if an interaction occurs.

It is safe to reject the interaction null and conclude that variance for its effect exceeds that for its residual. Scientifically, this means that levels of NFAT-mediated transcription are influenced by both the suppressor and the mitogen stimuli, as designed in this experiment. About 68% of the variance in the data can be explained by this interaction effect.

33.3.2 Mauchly’s Sphericity Test

Sphericity is defined as uniform variance of the differences. Think of it as the RM analog to the CR homogeneity of variance assumption. Sphericity, as for homogeneity of variance, is an assumption ideally met in RM ANOVA for validity of the result.

Mauchly’s tests the null hypothesis that the variances among the differences are equal. If the test statistic were extreme, the null would be rejected, meaning these variances are unequal and sphericity is violated. If that were the case, we’d conduct inference on the basis of a corrected p-value for the ANOVA..the p[GG], which is the Geisser-Greenhouse corrected p-value.

In our analysis, the Mauchly test is not extreme. We have no reason to believe the sphericity assumption is not satisfied. We can use the p-value in the ANOVA table to guide our inference, without using the sphericity correction.

33.4 Post Hoc multiple comparisons

The true scientific objective here is to know whether the suppressor selectively attenuate NFAT-mediated transcription by either of the mitogens. In other words, scientifically, it would be interesting to learn whether an ICER-suppressible factor participates in the pathway of one of the mitogens, but not the other.

Given that objective, there are some fairly specific planned comparisons worth making. If we compare ICER to Empty for each level of mitogen, we’ll have an answer to the question posed above. In other words, we only care about 3 of the 15 possible comparisons that could be made.

Therefore, we’ll run a pairwise.t.test without creating any adjusted p-values. After that, we’ll create a small vector of p-values for the 3 comparisons we wish to make and use the p.adjust function to correct them for multiple comparisons.

Step1: Run the pairwise.t.test but don’t adjust p-values

We use a paired=T argument given the intrinsically-linked relationship of the measures in the samples. Not all of the measures are independent so the unpaired t-test is inappropriate.

m <- pairwise.t.test(hcsmc$Nfatluc, interaction(hcsmc$vector, hcsmc$mitogen), paired=T, p.adj = "none")
m
## 
##  Pairwise comparisons using paired t tests 
## 
## data:  hcsmc$Nfatluc and interaction(hcsmc$vector, hcsmc$mitogen) 
## 
##            empty.ADP ICER.ADP empty.PDGF ICER.PDGF empty.Veh
## ICER.ADP   0.3413    -        -          -         -        
## empty.PDGF 0.0439    0.0803   -          -         -        
## ICER.PDGF  0.0129    0.6800   0.0191     -         -        
## empty.Veh  0.0066    0.0316   0.0214     0.0276    -        
## ICER.Veh   0.0147    0.0720   0.0215     0.0429    0.7723   
## 
## P value adjustment method: none

Step2: Collect the p-values only for the comparisons of scientific interest

pv <- c(m$p.value[1,1], m$p.value[3,3], m$p.value[5,5]) 
pv
## [1] 0.34127356 0.01905007 0.77231675

Thus, the three comparisons and their unadjusted p-values are:

  • ICER.ADP to empty.ADP p=0.3413
  • ICER.PDGF to empty.PDGF p=0.0191
  • ICER.Veh to empty.Veh p=0.7723

Step3: Adjust those p-values for multiple comparisons

p.adjust(p=pv, method="bonf", n=length(pv))
## [1] 1.00000000 0.05715022 1.00000000

Each of these adjusted p-values is greater than the type1 error threshold of 5%. On this basis, we cannot reject the null hypothesis that these response are the same.

This illustrates the occasional case where a positive result at the level of the omnibus test is coupled to a negative finding in post hoc testing. In fact, that shouldn’t be a surprise. The F test sniffed out some decent sized effects within the collection of responses that are not of scientific interest.

33.5 Write Up

Statistical analysis of the luciferase data indicates there is an interaction between levels of the suppressor and the mitogen variables (2 way RM ANOVA, p=0.0061, n=3). However, post hoc comparisons do not support the hypothesis that ICER selectively suppresses either of the mitogens (pairwise two-sided paired t-tests, Holm adjusted p-values for multiple comparisons). The experiment may have been underpowered.

33.6 Summary

We use two-way repeated measures ANOVA (both arms RM) when all measurements within each replicate are intrinsically-linked. When the experiment involves in vitro biological material there is a very good chance this ANOVA is the right choice. Inference can be done on only the main effect(s). When the interaction is positive, the main effects are difficult to interpret. Follow-up with pairwise.t.tests (paired=T), not with range tests.