Ed231A
Multivariate Analysis

Introduction


Introduction to Education 231A

Multivariate Analysis

Instructor: Phil Ender

  • Email: ender@ucla.edu
  • Moore Hall 2005C
  • (310) 825-1944

    Textbook:

  • Computer-Aided Multivariate Analysis (4th Edition)
    by Afifi, Clark and May
    Publisher: Chapman & Hall/CRC
    Year: 2004
    ISBN 1-58488-308-1

    You can view textbook examples for this book using several different statistical software packages at the ATS website: Afifi, Clark & May -- Textbook Examples.

    Course Organization

  • No exams
  • 10 Computer Assignments
  • Programming using either Stata or SAS IML

    Electronic Support

    Ed231A Webpage

  • http://www.gseis.ucla.edu/courses/ed210d1/
  • Syllabus
  • Lecture Notes
  • Help Sheets
  • Computer Assignments

    Ed231A Class Discussion Forum

  • Ask questions electronically
  • Receive replies from instructor
  • View Discussion Forum

    Lecture Notes

  • Lecture notes will be used in class.
  • Lecture notes will be available on the Ed231A Web site.

    About Assignments

  • Write your own programs
  • Make programs general
  • Include comments & labels

    Computers Running Stata

  • 16 Macs in Moore Hall*
  • 20 Macs in GSE&IS Building*
  • Macs & PCs in CLICC Labs in Powell Library
  • PCs in Social Sciences Computing Lab**

    *May Require Technology Fee
    **Social Science students only

    Relative Course Difficulty

    Let's get started...

    What makes a model multivariate?

  • Is multiple regression multivariate?
  • The Afifi, Clark & May view of multivariate.

    Every model has a left-hand side (lhs) and a right-hand side (rhs).

    lhs variables are response variables (the so-called dependent variables).
    rhs variables are predictor or explanatory variables (also known as independent variables).

    Here are two univariate models.

    And two multivariate models. For the purposes of this class, multivariate will be taken to mean models with multiple lhs variables.
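
As an illustration only (generic forms with assumed symbols, not the models from the original figures), a univariate model has a single response on the lhs however many predictors sit on the rhs, while a multivariate model in the sense used in this class has several responses modeled together:

```latex
% Univariate: one lhs (response) variable, one or more rhs variables
\begin{align*}
y &= \beta_0 + \beta_1 x_1 + \varepsilon \\
y &= \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon
\end{align*}

% Multivariate: two lhs variables sharing the same rhs variables
% (first subscript = predictor, second subscript = response)
\begin{align*}
y_1 &= \beta_{01} + \beta_{11} x_1 + \beta_{21} x_2 + \varepsilon_1 \\
y_2 &= \beta_{02} + \beta_{12} x_1 + \beta_{22} x_2 + \varepsilon_2
\end{align*}
```

By this definition, multiple regression with one response is univariate, regardless of the number of predictors.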

    The concept of right hand side and left hand side equivalence.
    There are times when rhs variables and lhs variables can be exchanged and the two models yield the same results.

    Examples:
    /* multivariate anova -- female is a rhs variable */
    manova read write math = female
    
                               Number of obs =     200
    
                               W = Wilks' lambda      L = Lawley-Hotelling trace
                               P = Pillai's trace     R = Roy's largest root
    
                      Source |  Statistic     df   F(df1,    df2) =   F   Prob>F
                  -----------+--------------------------------------------------
                      female | W   0.8501      1     3.0   196.0    11.52 0.0000 e
                             | P   0.1499            3.0   196.0    11.52 0.0000 e
                             | L   0.1763            3.0   196.0    11.52 0.0000 e
                             | R   0.1763            3.0   196.0    11.52 0.0000 e
                             |--------------------------------------------------
                    Residual |               198
                  -----------+--------------------------------------------------
                       Total |               199
                  --------------------------------------------------------------
                               e = exact, a = approximate, u = upper bound on F
    
    /* OLS regression -- female is a lhs variable */
    /* in SAS: model female = read write math     */
    regress female read write math
    
          Source |       SS       df       MS              Number of obs =     200
    -------------+------------------------------           F(  3,   196) =   11.52
           Model |  7.43351627     3  2.47783876           Prob > F      =  0.0000
        Residual |  42.1614837   196  .215109611           R-squared     =  0.1499
    -------------+------------------------------           Adj R-squared =  0.1369
           Total |      49.595   199  .249221106           Root MSE      =   .4638
    
    ------------------------------------------------------------------------------
          female |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            read |  -.0112975   .0045153    -2.50   0.013    -.0202023   -.0023926
           write |   .0270844   .0046522     5.82   0.000     .0179095    .0362593
            math |  -.0102947   .0050408    -2.04   0.042     -.020236   -.0003535
           _cons |   .2476519   .2099033     1.18   0.239    -.1663071     .661611
    ------------------------------------------------------------------------------
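
The two outputs above agree because, with a single two-group variable on one side, each multivariate statistic is a function of the R-squared from the reversed regression. The check below is a sketch that uses only numbers printed above. The symbols are notation introduced here: R^2 is the regression R-squared, \Lambda is Wilks' lambda, V is Pillai's trace, and U is the Lawley-Hotelling trace.

```latex
% Values taken from the manova and regress output above (n = 200, 3 test scores)
\begin{align*}
V        &= R^2 = 0.1499 \\
\Lambda  &= 1 - R^2 = 1 - 0.1499 = 0.8501 \\
U        &= \frac{R^2}{1 - R^2} = \frac{0.1499}{0.8501} \approx 0.1763 \\
F(3,196) &= \frac{R^2 / 3}{(1 - R^2)/196}
          = \frac{0.1499 / 3}{0.8501 / 196} \approx 11.52
\end{align*}
```

The F statistic and its degrees of freedom match in both outputs, so the test of female is the same whichever side of the model it appears on.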

    The role of matrix algebra in multivariate analysis.

    Matrix algebra gives us a concise and elegant way in which to represent multivariate models. If you are intimidated by it, please realize that the alternatives to matrix representation are worse.

    Consider this univariate multiple regression model
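
In standard matrix notation (a sketch; the symbols and dimensions below are assumptions, not a reproduction of the original figures), the univariate model has one response vector, and its multivariate counterpart places q response columns side by side while keeping the same design matrix:

```latex
% Requires amsmath for \underset.
% Univariate multiple regression: one response, p predictors, n observations
\[
\underset{n \times 1}{\mathbf{y}}
  = \underset{n \times (p+1)}{\mathbf{X}}\;
    \underset{(p+1) \times 1}{\boldsymbol{\beta}}
  + \underset{n \times 1}{\boldsymbol{\varepsilon}}
\]

% Multivariate multiple regression: q responses, same design matrix X
\[
\underset{n \times q}{\mathbf{Y}}
  = \underset{n \times (p+1)}{\mathbf{X}}\;
    \underset{(p+1) \times q}{\mathbf{B}}
  + \underset{n \times q}{\mathbf{E}}
\]
```

Each column of B holds the coefficients for one response variable, so setting q = 1 recovers the univariate model.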

    Contrast it with this multivariate multiple regression model

    Some Examples of Multivariate Generalization of Univariate Models

    These examples are in stat package pseudo-code

    Regression:
    model       y  = x1            /* simple linear regression */
    model       y  = x1 x2 x3      /* multiple linear regression */
    model y1 y2 y3 = x1 x2 x3      /* multivariate multiple regression */
    
    Probit Analysis (the z's are binary, 0/1, variables):
    model       z  = x1            /* simple probit analysis */
    model       z  = x1 x2 x3      /* multiple probit analysis */
    model z1 z2 z3 = x1 x2 x3      /* multivariate probit analysis */
    
    Correlation:
    model           ry,x           /* Pearson correlation */
    model           Ry.x1,x2,x3    /* multiple correlation */
    model RC y1,y2,y3 = x1,x2,x3   /* canonical correlation */
    
    Anova:
    model       y  = a             /* one-way anova */
    model       y  = a b a*b       /* two-way anova */
    model y1 y2 y3 = a             /* one-way multivariate anova (manova) */
    model y1 y2 y3 = a b a*b       /* two-way multivariate anova (manova) */

    Classifying Multivariate Models

    I. Testing effects; discriminating among groups

    II. Simplification of variable structure; determining dimensionality; rank reduction

    III. Other

    Some Multivariate Analogs to Univariate Procedures

    To be a well-behaved multivariate analog, a multivariate procedure applied to a single response variable should yield results equivalent to those of the univariate procedure.

    Examples:

    ttest write, by(female)
    
    Two-sample t test with equal variances
    
    ------------------------------------------------------------------------------
       Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
    ---------+--------------------------------------------------------------------
        male |      91    50.12088    1.080274    10.30516    47.97473    52.26703
      female |     109    54.99083    .7790686    8.133715    53.44658    56.53507
    ---------+--------------------------------------------------------------------
    combined |     200      52.775    .6702372    9.478586    51.45332    54.09668
    ---------+--------------------------------------------------------------------
        diff |           -4.869947    1.304191               -7.441835   -2.298059
    ------------------------------------------------------------------------------
    Degrees of freedom: 198
    
                      Ho: mean(male) - mean(female) = diff = 0
    
         Ha: diff < 0               Ha: diff != 0              Ha: diff > 0
           t =  -3.7341                t =  -3.7341              t =  -3.7341
       P < t =   0.0001          P > |t| =   0.0002          P > t =   0.9999
    
    hotel write, by(female) notable
    
    2-group Hotelling's T-squared = 13.943308
    F test statistic: ((200-1-1)/(200-2)(1)) x 13.943308 = 13.943308
    
    H0: Vectors of means are equal for the two groups
                  F(1,198) =   13.9433
           Prob > F(1,198) =    0.0002
    
    display sqrt(r(T2))
    3.7340739
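
With a single response variable, Hotelling's T-squared is the square of the two-sample t statistic, which is what the display command above confirms. The arithmetic below uses only values printed above; writing T^2 for Hotelling's statistic is notation introduced here.

```latex
% n = 200 observations, 2 groups, 1 response variable
\begin{align*}
\sqrt{T^2} &= \sqrt{13.943308} \approx 3.7341 = |t| \\
F(1,198)   &= \frac{200 - 1 - 1}{(200 - 2)(1)} \, T^2
            = \frac{198}{198} \times 13.943308 = 13.943308
\end{align*}
```

The multiplier on T-squared equals 1 here, so F, T-squared, and t-squared coincide, and the two-sided p-value is 0.0002 in both outputs.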
    
    anova write prog
    
                               Number of obs =     200     R-squared     =  0.1776
                               Root MSE      = 8.63918     Adj R-squared =  0.1693
    
                      Source |  Partial SS    df       MS           F     Prob > F
                  -----------+----------------------------------------------------
                       Model |  3175.69786     2  1587.84893      21.27     0.0000
                             |
                        prog |  3175.69786     2  1587.84893      21.27     0.0000
                             |
                    Residual |  14703.1771   197   74.635417   
                  -----------+----------------------------------------------------
                       Total |   17878.875   199   89.843593   
    
    manova write = prog
    
                               Number of obs =     200
    
                               W = Wilks' lambda      L = Lawley-Hotelling trace
                               P = Pillai's trace     R = Roy's largest root
    
                      Source |  Statistic     df   F(df1,    df2) =   F   Prob>F
                  -----------+--------------------------------------------------
                        prog | W   0.8224      2     2.0   197.0    21.27 0.0000 e
                             | P   0.1776            2.0   197.0    21.27 0.0000 e
                             | L   0.2160            2.0   197.0    21.27 0.0000 e
                             | R   0.2160            2.0   197.0    21.27 0.0000 e
                             |--------------------------------------------------
                    Residual |               197
                  -----------+--------------------------------------------------
                       Total |               199
                  --------------------------------------------------------------
                               e = exact, a = approximate, u = upper bound on F
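
With one response variable, the error and total cross-product matrices shrink to single sums of squares, so Wilks' lambda reduces to the ratio of residual to total sum of squares from the anova table. A quick check using the numbers printed above (\Lambda for Wilks' lambda and V for Pillai's trace are notation introduced here):

```latex
% Sums of squares from the anova output; statistics from the manova output
\begin{align*}
\Lambda &= \frac{SS_{\text{Residual}}}{SS_{\text{Total}}}
         = \frac{14703.1771}{17878.875} \approx 0.8224 \\
V       &= 1 - \Lambda = R^2 = 0.1776
\end{align*}
```

Both procedures report F(2, 197) = 21.27, so the one-way manova with a single response reproduces the one-way anova.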
    
    regress write read female
    
          Source |       SS       df       MS              Number of obs =     200
    -------------+------------------------------           F(  2,   197) =   77.21
           Model |  7856.32118     2  3928.16059           Prob > F      =  0.0000
        Residual |  10022.5538   197  50.8759077           R-squared     =  0.4394
    -------------+------------------------------           Adj R-squared =  0.4337
           Total |   17878.875   199   89.843593           Root MSE      =  7.1327
    
    ------------------------------------------------------------------------------
           write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            read |   .5658869   .0493849    11.46   0.000      .468496    .6632778
          female |   5.486894   1.014261     5.41   0.000      3.48669    7.487098
           _cons |   20.22837   2.713756     7.45   0.000     14.87663    25.58011
    ------------------------------------------------------------------------------
    
    display sqrt(e(r2))
    .66288703
    
    mvreg write = read female
    
    Equation          Obs  Parms        RMSE    "R-sq"          F        P
    ----------------------------------------------------------------------
    write             200      3    7.132735    0.4394   77.21062   0.0000
    
    ------------------------------------------------------------------------------
                 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    write        |
            read |   .5658869   .0493849    11.46   0.000      .468496    .6632778
          female |   5.486894   1.014261     5.41   0.000      3.48669    7.487098
           _cons |   20.22837   2.713756     7.45   0.000     14.87663    25.58011
    ------------------------------------------------------------------------------
    
    
    canon (write) (read female)
    
    Linear combinations for canonical correlation 1        Number of obs =     200
    ------------------------------------------------------------------------------
                 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    u            |
           write |    .105501   .0084684    12.46   0.000     .0888016    .1222004
    -------------+----------------------------------------------------------------
    v            |
            read |    .090063   .0078598    11.46   0.000     .0745639    .1055622
          female |   .8732598   .1614235     5.41   0.000     .5549397     1.19158
    ------------------------------------------------------------------------------
                                         (Standard errors estimated conditionally)
    Canonical correlations:
      0.6629
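
The last three outputs tell the same story. mvreg with one response reproduces the regress coefficients exactly, and the single canonical correlation is the multiple correlation from the regression. The check below uses only values printed above; r_c for the canonical correlation is notation introduced here.

```latex
% Canonical correlation versus the regression R-squared
\[
r_c = \sqrt{R^2} = \sqrt{0.4394} \approx 0.6629
\]

% The canon coefficients for v are the regress coefficients times one constant
\begin{align*}
\frac{0.090063}{0.5658869} &\approx 0.159 && \text{(read)} \\
\frac{0.8732598}{5.486894} &\approx 0.159 && \text{(female)}
\end{align*}
```

Because the two sets of coefficients differ only by a common scale factor, the t statistics for read (11.46) and female (5.41) are identical in the regress and canon outputs.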


    UCLA Department of Education

    Phil Ender, 30sep05, 24jan05