Ed230B/C

Analysis of Covariance



  • Controlling for an unwanted nuisance variable -- statistical control.
  • An alternative to blocking for controlling extraneous sources of variability.
  • In ANOVA terminology, a covariate is a continuous independent variable.

    Linear Model
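
The model equation did not survive in this copy of the notes. As a stand-in, here is the usual textbook form of the one-way ANCOVA model with a single covariate. The notation is an assumption and may differ from the author's original.

```latex
Y_{ij} = \mu + \alpha_j + \beta_w \,(X_{ij} - \bar{X}_{..}) + \varepsilon_{ij}
```

Here Y_ij is the dependent variable for observation i in group j, mu is the grand mean, alpha_j is the treatment effect for group j, beta_w is the common within-group regression slope, X_ij is the covariate, X-bar is the grand mean of the covariate, and epsilon_ij is the error term.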

    Hypotheses
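
The hypotheses were also lost in this copy. A standard way to state them for the one-way ANCOVA, offered as a sketch in assumed notation (alpha_j for the treatment effect of group j, beta_w for the common within-group slope, k for the number of groups), is:

```latex
\begin{aligned}
\text{Treatment: } & H_0:\ \alpha_1 = \alpha_2 = \cdots = \alpha_k = 0
  \quad \text{vs.} \quad H_1:\ \alpha_j \neq 0 \text{ for at least one } j \\
\text{Covariate: } & H_0:\ \beta_w = 0
  \quad \text{vs.} \quad H_1:\ \beta_w \neq 0
\end{aligned}
```

The treatment hypothesis is equivalent to saying that the population means, adjusted to a common value of the covariate, are equal.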

    Assumptions

    1. Independence.
    2. Normality.
    3. Homogeneity of Variance.
    4. Population within-group regression coefficients are equal (homogeneity of regression coefficients).
    5. Regression residuals are normally and independently distributed (NID) with mean 0 and equal variances.
    6. Relationship between the covariate and the dependent variable is linear.
    7. Covariate is measured without error.
    8. * Covariate is related to the dependent variable but is independent of the treatment.

    Selecting a Covariate

    1. One or more extraneous variables that affect the dependent variable but are irrelevant to the objectives of the experiment.
    2. Experimental control is not possible or not feasible.
    3. Covariate is independent of the categorical independent variable:
      1. collected prior to the presentation of the treatments,
      2. collected after treatments but before they take effect, or
      3. assumed not to be affected by the treatment.

    Schematic with Example Data

         a1          a2          a3          a4
      Y    X      Y    X      Y    X      Y    X
      3   42      4   47      7   61      7   65
      6   57      5   49      8   65      8   74
      3   33      4   42      7   64      9   80
      3   47      3   41      6   56      8   73
      1   32      2   38      5   52     10   85
      2   35      3   43      6   58     10   82
      2   33      4   48      5   53      9   78
      2   39      3   45      6   54     11   89

    ANCOVA Summary Table

    Source              SS     df        MS         F    Error Term
    1 Covariate     33.950      1    33.950    130.09    [3]
    2 A              1.793      3     0.598      2.29    [3]
    3 Error          7.047     27     0.261
    Adj Total        8.840     30
    Grand Total    235.500     31

    Compare with this ANOVA Summary Table

    Source        SS     df        MS        F
    A          194.5      3    64.833    44.28
    Error       41.0     28     1.464
    Total      235.5     31
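
The two summary tables can be reproduced from the example data with the classical sums of squares and cross-products. The sketch below is in Python rather than Stata, purely as an arithmetic check. The function and variable names are illustrative, and the formulas are the standard textbook ones for one covariate, not necessarily the computational route the author used.

```python
# (Y, X) pairs for the four levels of factor A, copied from the example data.
groups = {
    1: [(3, 42), (6, 57), (3, 33), (3, 47), (1, 32), (2, 35), (2, 33), (2, 39)],
    2: [(4, 47), (5, 49), (4, 42), (3, 41), (2, 38), (3, 43), (4, 48), (3, 45)],
    3: [(7, 61), (8, 65), (7, 64), (6, 56), (5, 52), (6, 58), (5, 53), (6, 54)],
    4: [(7, 65), (8, 74), (9, 80), (8, 73), (10, 85), (10, 82), (9, 78), (11, 89)],
}

def sums_about_mean(pairs):
    """Return (Syy, Sxx, Sxy): sums of squares and cross-products about the means."""
    n = len(pairs)
    mean_y = sum(y for y, _ in pairs) / n
    mean_x = sum(x for _, x in pairs) / n
    s_yy = sum((y - mean_y) ** 2 for y, _ in pairs)
    s_xx = sum((x - mean_x) ** 2 for _, x in pairs)
    s_xy = sum((y - mean_y) * (x - mean_x) for y, x in pairs)
    return s_yy, s_xx, s_xy

all_pairs = [pair for g in groups.values() for pair in g]
n_total, k = len(all_pairs), len(groups)

# Total and pooled within-group quantities.
t_yy, t_xx, t_xy = sums_about_mean(all_pairs)
w_yy = w_xx = w_xy = 0.0
for g in groups.values():
    s_yy, s_xx, s_xy = sums_about_mean(g)
    w_yy, w_xx, w_xy = w_yy + s_yy, w_xx + s_xx, w_xy + s_xy

# One-way ANOVA: ignore the covariate.
ss_a = t_yy - w_yy
f_anova = (ss_a / (k - 1)) / (w_yy / (n_total - k))

# ANCOVA: remove the part of Y that is predictable from X.
ss_cov = w_xy ** 2 / w_xx                # covariate, adjusted for A
ss_err_adj = w_yy - ss_cov               # adjusted error
ss_tot_adj = t_yy - t_xy ** 2 / t_xx     # adjusted total
ss_a_adj = ss_tot_adj - ss_err_adj       # A, adjusted for the covariate
ms_err_adj = ss_err_adj / (n_total - k - 1)
f_cov = ss_cov / ms_err_adj
f_a = (ss_a_adj / (k - 1)) / ms_err_adj

print(f"ANOVA : SS_A={ss_a:.3f} SS_error={w_yy:.3f} F={f_anova:.2f}")
print(f"ANCOVA: SS_cov={ss_cov:.3f} SS_A={ss_a_adj:.3f} "
      f"SS_error={ss_err_adj:.3f} F_cov={f_cov:.2f} F_A={f_a:.2f}")
```

The error mean square drops from about 1.464 to about 0.261 once the covariate is in the model, but most of the between-group variability in Y is also absorbed by X, which is why the F for A falls from 44.28 to 2.29.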

    Table of the F-distribution

    Comparing ANCOVA with Randomized Block Designs

  • Inspect the correlation between the covariate and the dependent variable.
  • RB is better when r < .4.
  • ANCOVA and RB are about equal when .4 < r < .6.
  • ANCOVA is better when r > .6.
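
As an illustration of the rule of thumb above, the following Python sketch computes the correlation between the covariate and the dependent variable for the example data and applies the cutoffs. The helper names are hypothetical. Two details are assumptions not stated in the notes: the absolute value of r is used, and the boundary values .4 and .6 are treated as "about equal".

```python
from math import sqrt

# Covariate (x) and dependent variable (y) from the example data, in group order.
x = [42, 57, 33, 47, 32, 35, 33, 39, 47, 49, 42, 41, 38, 43, 48, 45,
     61, 65, 64, 56, 52, 58, 53, 54, 65, 74, 80, 73, 85, 82, 78, 89]
y = [3, 6, 3, 3, 1, 2, 2, 2, 4, 5, 4, 3, 2, 3, 4, 3,
     7, 8, 7, 6, 5, 6, 5, 6, 7, 8, 9, 8, 10, 10, 9, 11]

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    s_xx = sum((a - mean_x) ** 2 for a in xs)
    s_yy = sum((b - mean_y) ** 2 for b in ys)
    return s_xy / sqrt(s_xx * s_yy)

def suggest_design(r):
    """Apply the rule of thumb; |r| and the boundary handling are assumptions."""
    size = abs(r)
    if size < 0.4:
        return "randomized block"
    if size > 0.6:
        return "ANCOVA"
    return "about equal"

r = pearson_r(x, y)
print(f"r = {r:.3f} -> {suggest_design(r)}")
```

For the example data the correlation is about .98, so the rule favors ANCOVA.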

    Some Stata Tricks

    One Factor Design with One Covariate:
    anova y a                       analysis of variance
    anova y x a                     analysis of covariance
    anova y x a x*a                 tests homogeneity of slopes
    Two Factor Design with One Covariate:
    anova y a b a*b                 analysis of variance
    anova y x a b a*b               analysis of covariance
    anova y x a b a*b x*a*b         tests homogeneity of slopes
    One Factor Design with Two Covariates:
    anova y a                       analysis of variance
    anova y x z a                   analysis of covariance
    anova y x a x*a                 homogeneity of x slopes
    anova y z a z*a                 homogeneity of z slopes
    Two Factor Design with Two Covariates:
    anova y a b a*b                 analysis of variance
    anova y x z a b a*b             analysis of covariance
    anova y x a b a*b x*a*b         homogeneity of x slopes
    anova y z a b a*b z*a*b         homogeneity of z slopes
    Note: Don't forget the cont() option when running the ANCOVA, for example: anova y x a, cont(x)

    Stata Example

    input x y a x1 x2 x3
    42  3 1  1  1  1
    57  6 1  1  1  1
    33  3 1  1  1  1
    47  3 1  1  1  1
    32  1 1  1  1  1
    35  2 1  1  1  1
    33  2 1  1  1  1
    39  2 1  1  1  1
    47  4 2 -1  1  1
    49  5 2 -1  1  1
    42  4 2 -1  1  1
    41  3 2 -1  1  1
    38  2 2 -1  1  1
    43  3 2 -1  1  1
    48  4 2 -1  1  1
    45  3 2 -1  1  1
    61  7 3  0 -2  1
    65  8 3  0 -2  1
    64  7 3  0 -2  1
    56  6 3  0 -2  1
    52  5 3  0 -2  1
    58  6 3  0 -2  1
    53  5 3  0 -2  1
    54  6 3  0 -2  1
    65  7 4  0  0 -3
    74  8 4  0  0 -3
    80  9 4  0  0 -3
    73  8 4  0  0 -3
    85 10 4  0  0 -3
    82 10 4  0  0 -3
    78  9 4  0  0 -3
    89 11 4  0  0 -3
    end
    
    anova y a x, cont(x)
    
                         Number of obs =      32     R-squared     =  0.9701
                         Root MSE      = .510876     Adj R-squared =  0.9656
    
                Source |  Partial SS    df       MS           F     Prob > F
            -----------+----------------------------------------------------
                 Model |  228.453154     4  57.1132885     218.83     0.0000
                       |
                     a |  1.79283521     3  .597611737       2.29     0.1010
                     x |  33.9531542     1  33.9531542     130.09     0.0000
                       |
              Residual |  7.04684582    27   .26099429   
            -----------+----------------------------------------------------
                 Total |      235.50    31  7.59677419
    
    adjust x, by(a) gen(adjy)
    
    -------------------------------------------------------------------------------
    Dependent variable: y     Command: anova
    Created variable: adjy
    Covariate set to mean: x = 55
    -------------------------------------------------------------------------------
    
    ----------+-----------
            a |         xb
    ----------+-----------
            1 |    5.31013
            2 |    5.32566
            3 |    5.76735
            4 |    5.09686
    ----------+-----------
    Key:  xb         =  Linear Prediction
    
    
    
    /* fhcomp & tukeyhsd require an extra step */
    quietly anova adjy a
    
    fhcomp a, nu(27) mse(.26099429)  /* mse is from the original ancova */
    
    Fisher-Hayter pairwise comparisons for variable a
    studentized range critical value(.05, 3, 27) = 3.5065705
    
                                          mean     critical
    grp vs grp       group means          dif        dif
    -------------------------------------------------------
      1 vs   2     5.3101     5.3257     0.0155    0.6334
      1 vs   3     5.3101     5.7674     0.4572    0.6334
      1 vs   4     5.3101     5.0969     0.2133    0.6334
      2 vs   3     5.3257     5.7674     0.4417    0.6334
      2 vs   4     5.3257     5.0969     0.2288    0.6334
      3 vs   4     5.7674     5.0969     0.6705*   0.6334
      
    tukeyhsd a, nu(27) mse(.26099429)  /* mse is from the original ancova */
    
    
    Tukey HSD pairwise comparisons for variable a
    studentized range critical value(.05, 4, 27) = 3.8701974
    uses harmonica mean sample size =    8.000
    
                                          mean     critical
    grp vs grp       group means          dif        dif
    -------------------------------------------------------
      1 vs   2     5.3101     5.3257     0.0155    0.6990
      1 vs   3     5.3101     5.7674     0.4572    0.6990
      1 vs   4     5.3101     5.0969     0.2133    0.6990
      2 vs   3     5.3257     5.7674     0.4417    0.6990
      2 vs   4     5.3257     5.0969     0.2288    0.6990
      3 vs   4     5.7674     5.0969     0.6705    0.6990
    
    /* test for homogeneity of regression slopes */
    anova y a x a*x, cont(x)
    
                         Number of obs =      32     R-squared     =  0.9719
                         Root MSE      = .525009     Adj R-squared =  0.9637
    
                Source |  Partial SS    df       MS           F     Prob > F
            -----------+----------------------------------------------------
                 Model |  228.884782     7  32.6978259     118.63     0.0000
                       |
                     a |  .355072259     3   .11835742       0.43     0.7338
                     x |  25.8488494     1  25.8488494      93.78     0.0000
                   a*x |  .431627333     3  .143875778       0.52     0.6713
                       |
              Residual |  6.61521849    24  .275634104   
            -----------+----------------------------------------------------
                 Total |      235.50    31  7.59677419
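
Several numbers in the Stata session above can be checked by hand. The Python sketch below is not part of the original example and its variable names are illustrative. It reproduces the adjusted means using the pooled within-group slope, the two critical differences as q * sqrt(MSerror / n), and the F-ratio for the a*x term as a comparison of residual sums of squares. The studentized range values and the sums of squares are copied from the output rather than computed.

```python
from math import sqrt

# (Y, X) pairs for the four levels of factor a, copied from the input data.
groups = {
    1: [(3, 42), (6, 57), (3, 33), (3, 47), (1, 32), (2, 35), (2, 33), (2, 39)],
    2: [(4, 47), (5, 49), (4, 42), (3, 41), (2, 38), (3, 43), (4, 48), (3, 45)],
    3: [(7, 61), (8, 65), (7, 64), (6, 56), (5, 52), (6, 58), (5, 53), (6, 54)],
    4: [(7, 65), (8, 74), (9, 80), (8, 73), (10, 85), (10, 82), (9, 78), (11, 89)],
}

# 1. Adjusted means: group mean of y moved to the grand mean of x
#    along the pooled within-group regression slope.
all_x = [x for g in groups.values() for _, x in g]
grand_mean_x = sum(all_x) / len(all_x)

means = {}
w_xx = w_xy = 0.0
for a, pairs in groups.items():
    n = len(pairs)
    mean_y = sum(y for y, _ in pairs) / n
    mean_x = sum(x for _, x in pairs) / n
    means[a] = (mean_y, mean_x)
    w_xx += sum((x - mean_x) ** 2 for _, x in pairs)
    w_xy += sum((y - mean_y) * (x - mean_x) for y, x in pairs)

slope = w_xy / w_xx
adjusted = {a: my - slope * (mx - grand_mean_x) for a, (my, mx) in means.items()}

# 2. Critical differences, using MSerror and df from the ANCOVA.
mse, n_per_group = 0.26099429, 8
crit_fisher_hayter = 3.5065705 * sqrt(mse / n_per_group)  # q(.05, 3, 27)
crit_tukey = 3.8701974 * sqrt(mse / n_per_group)          # q(.05, 4, 27)

# 3. Homogeneity of slopes: common-slope model vs. separate-slopes model.
sse_common, df_common = 7.04684582, 27
sse_separate, df_separate = 6.61521849, 24
f_slopes = ((sse_common - sse_separate) / (df_common - df_separate)) / (
    sse_separate / df_separate)

for a in sorted(adjusted):
    print(f"a = {a}: adjusted mean = {adjusted[a]:.5f}")
print(f"Fisher-Hayter critical difference = {crit_fisher_hayter:.4f}")
print(f"Tukey HSD critical difference     = {crit_tukey:.4f}")
print(f"F for a*x = {f_slopes:.2f}")
```

The only pairwise difference that exceeds the Fisher-Hayter critical difference is 3 vs 4 (0.6705), and it falls short of the larger Tukey HSD critical difference, which is why only the Fisher-Hayter output flags it.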

    Stata Example Continued

    regress y x x1 x2 x3
    
      Source |       SS       df       MS                  Number of obs =      32
    ---------+------------------------------               F(  4,    27) =  218.83
       Model |  228.453154     4  57.1132885               Prob > F      =  0.0000
    Residual |  7.04684582    27   .26099429               R-squared     =  0.9701
    ---------+------------------------------               Adj R-squared =  0.9656
       Total |      235.50    31  7.59677419               Root MSE      =  .51088
       
    [remainder of output omitted]
    
    regress y x 
    
      Source |       SS       df       MS                  Number of obs =      32
    ---------+------------------------------               F(  1,    30) =  769.24
       Model |  226.660319     1  226.660319               Prob > F      =  0.0000
    Residual |  8.83968103    30  .294656034               R-squared     =  0.9625
    ---------+------------------------------               Adj R-squared =  0.9612
       Total |      235.50    31  7.59677419               Root MSE      =  .54282
    
    [remainder of output omitted]
    
    regress y x1 x2 x3
    
      Source |       SS       df       MS                  Number of obs =      32
    ---------+------------------------------               F(  3,    28) =   44.28
       Model |      194.50     3  64.8333333               Prob > F      =  0.0000
    Residual |       41.00    28  1.46428571               R-squared     =  0.8259
    ---------+------------------------------               Adj R-squared =  0.8072
       Total |      235.50    31  7.59677419               Root MSE      =  1.2101
    
    [remainder of output omitted]
    
    Regression Results Summarized
    
    Model: M0 (y on x x1 x2 x3)   R-square       0.9701
    Model: M1 (y on x)            R-square       0.9625
    Model: M2 (y on x1 x2 x3)     R-square       0.8259
    

    F-ratios Using Regression

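    Covariate: compare the full model M0 (x, x1, x2, x3) with M2 (x1, x2, x3 only).

    \[ F = \frac{(R^2_{M0} - R^2_{M2})/1}{(1 - R^2_{M0})/27} = \frac{(0.9701 - 0.8259)/1}{(1 - 0.9701)/27} \approx 130.2 \]

    with 1 and 27 degrees of freedom (130.09 when unrounded R-squares are used)

    Factor A: compare the full model M0 with M1 (x only).

    \[ F = \frac{(R^2_{M0} - R^2_{M1})/3}{(1 - R^2_{M0})/27} = \frac{(0.9701 - 0.9625)/3}{(1 - 0.9701)/27} \approx 2.29 \]

    with 3 and 27 degrees of freedom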
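
The F-ratios for this section can be recomputed from the three regressions with the usual R-square change test. The Python sketch below is offered only as an arithmetic check, and the function name is hypothetical. It rebuilds each R-square from the residual and total sums of squares in the regression output so that rounding does not affect the result. Model labels follow the order of the regressions: M0 is y on x x1 x2 x3, M1 is y on x, and M2 is y on x1 x2 x3.

```python
# Residual sums of squares copied from the three regress outputs.
ss_total = 235.50
ss_residual = {"M0": 7.04684582, "M1": 8.83968103, "M2": 41.00}

# R-square for each model, unrounded.
r2 = {model: 1 - sse / ss_total for model, sse in ss_residual.items()}

def f_change(r2_full, r2_reduced, df_num, df_den):
    """F-ratio for the increase in R-square from the reduced to the full model."""
    return ((r2_full - r2_reduced) / df_num) / ((1 - r2_full) / df_den)

# Covariate: what x adds beyond the group codes (M0 vs. M2).
f_covariate = f_change(r2["M0"], r2["M2"], df_num=1, df_den=27)
# Factor A: what the group codes add beyond x (M0 vs. M1).
f_factor_a = f_change(r2["M0"], r2["M1"], df_num=3, df_den=27)

for model in ("M0", "M1", "M2"):
    print(f"{model}: R-square = {r2[model]:.4f}")
print(f"F covariate = {f_covariate:.2f} with 1 and 27 df")
print(f"F factor A  = {f_factor_a:.2f} with 3 and 27 df")
```

Both values agree with the ANCOVA table from the anova command (130.09 and 2.29).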

    ANCOVA Using Regression Residuals

  • This time we will use the regression weights for the covariate to compute an adjusted score on the dependent variable. Here are the regression results for model M1:

    regress y x
    
      Source |       SS       df       MS                  Number of obs =      32
    ---------+------------------------------               F(  1,    30) =  769.24
       Model |  226.660319     1  226.660319               Prob > F      =  0.0000
    Residual |  8.83968103    30  .294656034               R-squared     =  0.9625
    ---------+------------------------------               Adj R-squared =  0.9612
       Total |      235.50    31  7.59677419               Root MSE      =  .54282
    
    ------------------------------------------------------------------------------
           y |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
    ---------+--------------------------------------------------------------------
           x |   .1642466    .005922     27.735   0.000       .1521523    .1763409
       _cons |  -3.658563   .3395497    -10.775   0.000      -4.352016    -2.96511
    ------------------------------------------------------------------------------
    
    predict resy, resid
    
    anova resy a
    
                         Number of obs =      32     R-squared     =  0.2010
                         Root MSE      = .502235     Adj R-squared =  0.1154
    
                Source |  Partial SS    df       MS           F     Prob > F
            -----------+----------------------------------------------------
                 Model |  1.77695557     3  .592318522       2.35     0.0940
                       |
                     a |  1.77695557     3  .592318522       2.35     0.0940
                       |
              Residual |  7.06272558    28  .252240199   
            -----------+----------------------------------------------------
                 Total |  8.83968115    31  .285151005 
    

    Manual Adjustment for ANCOVA Using Residuals

  • Recompute MSerror using the error degrees of freedom from the ANCOVA, in this case 27.
  • Recompute the F-ratio using the new MSerror.
  • Find the p-value for the new F-ratio with 3 and 27 degrees of freedom.

    Even with the adjustment of the degrees of freedom, the F-ratio using regression residuals (F = 2.27) is different from the F-ratio from the ANCOVA (F = 2.29). Why is this? The analysis using regression residuals does not take into account the association between the covariate and the categorical predictor. Actually, the F-ratios in this example are much closer together than usually occurs.
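
The manual adjustment can be sketched in a few lines of Python, using the sums of squares from the anova of the residuals (resy) shown earlier. The variable names are illustrative. With unrounded intermediate values the adjusted F comes out at roughly 2.26, and rounding MSerror to three decimals first gives a value closer to the 2.27 quoted in the notes.

```python
# Sums of squares from "anova resy a".
ss_a = 1.77695557
ss_error = 7.06272558
df_a = 3

# Unadjusted: the residual analysis charges no degree of freedom for the covariate.
df_error_resid = 28
f_unadjusted = (ss_a / df_a) / (ss_error / df_error_resid)

# Adjusted: use the ANCOVA error degrees of freedom (one fewer, for the covariate).
df_error_ancova = 27
ms_error_adjusted = ss_error / df_error_ancova
f_adjusted = (ss_a / df_a) / ms_error_adjusted

print(f"unadjusted F = {f_unadjusted:.2f} with {df_a} and {df_error_resid} df")
print(f"adjusted MSerror = {ms_error_adjusted:.4f}")
print(f"adjusted F = {f_adjusted:.3f} with {df_a} and {df_error_ancova} df")
```

Either way, the adjusted F stays slightly below the ANCOVA value of 2.29.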

    Example with Two Covariates

    input id  y  c1  c2 grp
     1  6   1   6   1      
     2  9   1   7   1  
     3  8   2  15   1   
     4  8   3  13   1   
     5 12   3  18   1   
     6 12   4   9   1   
     7 10   4  16   1   
     8  8   5  10   1  
     9 12   5  16   1   
    10 13   6  18   1     
    11 13   4  12   2   
    12 16   4  12   2  
    13 15   5  17   2   
    14 16   6   9   2   
    15 19   6  20   2   
    16 17   8  18   2   
    17 19   8  16   2  
    18 23   9  20   2  
    19 19  10  10   2   
    20 22  10  17   2      
    21 20   7   8   3   
    22 22   7  14   3   
    23 24   9  11   3   
    24 26   9  11   3   
    25 24  10  16   3   
    26 25  11  20   3  
    27 28  11  19   3   
    28 27  12  19   3  
    29 29  13  12   3  
    30 26  13  16   3   
    31 27   7  16   4 
    32 28   8  10   4 
    33 25   8  13   4 
    34 27   9   7   4  
    35 31   9  15   4  
    36 29  10  20   4  
    37 32  10  16   4  
    38 30  12  21   4  
    39 32  12  15   4  
    40 33  14  21   4 
    end
    
    tabstat y c1 c2, by(grp) stat(n mean sd) col(stat)  
    
    Summary for variables: y c1 c2
         by categories of: grp 
    
         grp |         N      mean        sd
    ---------+------------------------------
           1 |        10       9.8  2.347576
             |        10       3.4  1.712698
             |        10      12.8  4.491968
    ---------+------------------------------
           2 |        10      17.9  3.107339
             |        10         7  2.309401
             |        10      15.1  4.040077
    ---------+------------------------------
           3 |        10      25.1  2.726414
             |        10      10.2   2.20101
             |        10      14.6  4.060651
    ---------+------------------------------
           4 |        10      29.4  2.633122
             |        10       9.9   2.18327
             |        10      15.4  4.599517
    ---------+------------------------------
       Total |        40     20.55  7.977372
             |        40     7.625  3.439495
             |        40    14.475  4.260658
    ----------------------------------------
    
    anova y grp  /* 0 covariates */
    
                               Number of obs =      40     R-squared     =  0.8929
                               Root MSE      = 2.71723     Adj R-squared =  0.8840
    
                      Source |  Partial SS    df       MS           F     Prob > F
                  -----------+----------------------------------------------------
                       Model |      2216.1     3       738.7     100.05     0.0000
                             |
                         grp |      2216.1     3       738.7     100.05     0.0000
                             |
                    Residual |       265.8    36  7.38333333   
                  -----------+----------------------------------------------------
                       Total |      2481.9    39  63.6384615   
    
    anova y c1 grp, cont(c1)  /* 1 covariate */
    
                               Number of obs =      40     R-squared     =  0.9594
                               Root MSE      = 1.69598     Adj R-squared =  0.9548
    
                      Source |  Partial SS    df       MS           F     Prob > F
                  -----------+----------------------------------------------------
                       Model |  2381.22741     4  595.306852     206.97     0.0000
                             |
                          c1 |  165.127408     1  165.127408      57.41     0.0000
                         grp |  415.841199     3  138.613733      48.19     0.0000
                             |
                    Residual |  100.672592    35  2.87635976   
                  -----------+----------------------------------------------------
                       Total |      2481.9    39  63.6384615
    
    anova y c1 c2 grp, cont(c1 c2)  /* 2 covariates */
    
                               Number of obs =      40     R-squared     =  0.9624
                               Root MSE      = 1.65656     Adj R-squared =  0.9569
    
                      Source |  Partial SS    df       MS           F     Prob > F
                  -----------+----------------------------------------------------
                       Model |  2388.59757     5  477.719513     174.08     0.0000
                             |
                          c1 |   98.974038     1   98.974038      36.07     0.0000
                          c2 |  7.37015734     1  7.37015734       2.69     0.1105
                         grp |  420.189396     3  140.063132      51.04     0.0000
                             |
                    Residual |  93.3024343    34  2.74418925   
                  -----------+----------------------------------------------------
                       Total |      2481.9    39  63.6384615
    
    adjust c1 c2, by(grp) gen(adjy)
    
    ---------------------------------------------------------------------------------------------------------------
         Dependent variable: y     Command: anova
     Covariates set to mean: c1 = 7.625, c2 = 14.475
    ---------------------------------------------------------------------------------------------------------------
    
    ----------------------
          grp |         xb
    ----------+-----------
            1 |    13.7834
            2 |    18.3846
            3 |    22.7797
            4 |    27.2523
    ----------------------
         Key:  xb  =  Linear Prediction
    
    quietly anova adjy grp
    
    fhcomp grp, nu(34) mse(2.74418925)
    
    Fisher-Hayter pairwise comparisons for variable grp
    studentized range critical value(.05, 3, 34) = 3.4655934
    
                                          mean     critical
    grp vs grp       group means          dif        dif
    -------------------------------------------------------
      1 vs   2    13.7834    18.3846     4.6012*   1.8155
      1 vs   3    13.7834    22.7797     8.9964*   1.8155
      1 vs   4    13.7834    27.2523    13.4690*   1.8155
      2 vs   3    18.3846    22.7797     4.3952*   1.8155
      2 vs   4    18.3846    27.2523     8.8678*   1.8155
      3 vs   4    22.7797    27.2523     4.4726*   1.8155
    
    anova y c1 grp c1*grp, cont(c1)  /* check homogeneity of regression for c1 */
    
                               Number of obs =      40     R-squared     =  0.9598
                               Root MSE      = 1.76482     Adj R-squared =  0.9511
    
                      Source |  Partial SS    df       MS           F     Prob > F
                  -----------+----------------------------------------------------
                       Model |  2382.23359     7  340.319084     109.27     0.0000
                             |
                          c1 |  152.279387     1  152.279387      48.89     0.0000
                         grp |  70.1635717     3  23.3878572       7.51     0.0006
                      c1*grp |  1.00618243     3  .335394144       0.11     0.9550
                             |
                    Residual |  99.6664092    32  3.11457529   
                  -----------+----------------------------------------------------
                       Total |      2481.9    39  63.6384615   
    
    anova y c2 grp c2*grp, cont(c2)  /* check homogeneity of regression for c2 */
    
                               Number of obs =      40     R-squared     =  0.9228
                               Root MSE      = 2.44624     Adj R-squared =  0.9060
    
                      Source |  Partial SS    df       MS           F     Prob > F
                  -----------+----------------------------------------------------
                       Model |  2290.40886     7  327.201265      54.68     0.0000
                             |
                          c2 |  73.6130056     1  73.6130056      12.30     0.0014
                         grp |  182.057287     3  60.6857623      10.14     0.0001
                      c2*grp |  .785330753     3  .261776918       0.04     0.9876
                             |
                    Residual |  191.491142    32  5.98409817   
                  -----------+----------------------------------
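
To see what each covariate contributes in the example above, the residual lines of the three anova tables (0, 1 and 2 covariates) can be compared directly. The Python sketch below is an arithmetic check with illustrative names and is not part of the original notes. The drop in residual SS when a covariate is added equals that covariate's partial SS, and dividing it by the residual MS of the larger model gives its F-ratio.

```python
# (Residual SS, residual df) copied from the three anova tables.
residual = {
    "grp": (265.8, 36),
    "c1 grp": (100.672592, 35),
    "c1 c2 grp": (93.3024343, 34),
}

# Error mean square for each model.
ms_error = {model: ss / df for model, (ss, df) in residual.items()}

# Adding c1 to the model with grp only.
ss_c1 = residual["grp"][0] - residual["c1 grp"][0]
f_c1 = ss_c1 / ms_error["c1 grp"]

# Adding c2 to the model that already has c1 and grp.
ss_c2 = residual["c1 grp"][0] - residual["c1 c2 grp"][0]
f_c2 = ss_c2 / ms_error["c1 c2 grp"]

for model, value in ms_error.items():
    print(f"MSerror with [{model}] = {value:.3f}")
print(f"c1 added to grp:     SS = {ss_c1:.3f}, F = {f_c1:.2f}")
print(f"c2 added to c1, grp: SS = {ss_c2:.3f}, F = {f_c2:.2f}")
```

The first covariate cuts the error mean square from about 7.38 to about 2.88. The second covariate lowers it only to about 2.74, consistent with the nonsignificant test for c2 (F = 2.69) in the two-covariate table.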


    Linear Statistical Models Course

    Phil Ender, 13may06, 11apr06, 25May00