Ed231A
Multivariate Analysis
Discriminant Analysis


What does discriminant analysis do?

  • Tests multivariate differences between groups (this part is equivalent to a one-way manova)
  • Determines the dimensionality of the group differences
  • Provides information on the relative importance or contribution of each variable
  • Classifies known and unknown observations into groups

    In the beginning...

    Let Y be a linear combination of p dependent variables, such that Y = Xv, where X is the N x p matrix of observations and v is a p x 1 vector of weights.

    Total SSCP

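    Let T be the Total SSCP matrix, which is obtained from the deviations of all N observations from the grand means: T = Σ (x_i - m)(x_i - m)', where x_i is the vector of scores for observation i and m is the vector of grand means.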

    Within Groups SSCP

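    The within group sums of squares and cross products matrix is expressed as: W = W_1 + W_2 + ... + W_k, where W_j is the SSCP matrix of group j computed about that group's own means.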

    Between Groups SSCP

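    The between groups SSCP matrix B is computed like the total SSCP, except that each observation is replaced by the means of its group.

    The matrix of group means has n1 rows of means from group 1, n2 rows of means from group 2, ..., and nk rows of means from group k, and each row is taken as a deviation from the grand means.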

    However, it is easier to obtain B by taking T - W.
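
    The three SSCP matrices can be checked on a small data set. The sketch below uses made-up scores for two groups on two variables (the data and the helper names are assumptions for illustration) and confirms that T - W matches B built from one row of group means per observation.

```python
# Hypothetical scores: two variables measured in two groups of three cases.
groups = {
    1: [[2.0, 3.0], [4.0, 5.0], [6.0, 4.0]],
    2: [[7.0, 8.0], [9.0, 6.0], [8.0, 10.0]],
}

def column_means(rows):
    """Mean of each column of a list of rows."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def sscp(rows, center):
    """Sums of squares and cross products of rows about a center vector."""
    p = len(center)
    out = [[0.0] * p for _ in range(p)]
    for r in rows:
        d = [r[j] - center[j] for j in range(p)]
        for a in range(p):
            for b in range(p):
                out[a][b] += d[a] * d[b]
    return out

all_rows = [r for rows in groups.values() for r in rows]
grand = column_means(all_rows)
p = len(grand)

# Total SSCP: every observation about the grand means.
T = sscp(all_rows, grand)

# Within SSCP: each group about its own means, summed over groups.
W = [[0.0] * p for _ in range(p)]
for rows in groups.values():
    Wj = sscp(rows, column_means(rows))
    W = [[W[a][b] + Wj[a][b] for b in range(p)] for a in range(p)]

# Between SSCP the easy way.
B = [[T[a][b] - W[a][b] for b in range(p)] for a in range(p)]

# Between SSCP the long way: one row of group means per observation.
mean_rows = [column_means(rows) for rows in groups.values() for _ in rows]
B_direct = sscp(mean_rows, grand)

print("T =", T)
print("W =", W)
print("B =", B)
print("B (from group means) =", B_direct)
```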

    The between groups sum of squares for the linear combination is v'Bv; the corresponding within groups sum of squares is v'Wv.

    Discriminant Analysis

    We wish to select the elements of v such that the ratio of the between groups to the within groups sum of squares, λ = v'Bv / v'Wv, is a maximum.

    This occurs when (B - λW)v = 0. Thus, discriminant analysis reduces to finding the eigenvalues and eigenvectors of W^{-1}B, which is often written E^{-1}H.
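
    A sketch of the step from the maximization to the eigenproblem, writing λ(v) for the ratio of between to within sums of squares of the linear combination (this notation is an assumption) and using the symmetry of B and W:

```latex
\lambda(v) = \frac{v'Bv}{v'Wv}, \qquad
\frac{\partial \lambda}{\partial v}
  = \frac{2\,Bv\,(v'Wv) - 2\,Wv\,(v'Bv)}{(v'Wv)^{2}} = 0
\;\Longrightarrow\; Bv - \lambda Wv = 0
\;\Longrightarrow\; (B - \lambda W)\,v = 0
\;\Longrightarrow\; W^{-1}Bv = \lambda v .
```

    The last step assumes W is nonsingular. Each eigenvalue λ is therefore the value of the ratio attained by its eigenvector v.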

    Computational Note

    W^{-1}B is generally non-symmetric. Many eigenvalue routines, including the matrix symeigen command used here, require a symmetric matrix. Therefore, you must perform some additional computations to obtain the eigenvalues and eigenvectors.

    mat l = cholesky(W)          /* Cholesky decomposition: l*l' = W      */
    mat u = l'                   /* u equals l transpose                  */
    mat a = inv(l)*B*inv(u)      /* a is symmetric                        */
    mat symeigen uv e = a        /* e holds the eigenvalues of inv(W)*B   */
    mat v = inv(u)*uv            /* v holds the eigenvectors of inv(W)*B  */
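
    The same five steps can be traced numerically. The sketch below is a plain Python version restricted to the 2 x 2 case, with made-up W and B; the helper functions are written out here for illustration and are not library routines. It checks that each returned pair satisfies (B - λW)v = 0.

```python
import math

# Hypothetical 2 x 2 within (W) and between (B) SSCP matrices.
W = [[10.0, 2.0], [2.0, 8.0]]
B = [[24.0, 12.0], [12.0, 9.0]]

def matmul(X, Y):
    """Matrix product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def cholesky2(S):
    """Lower triangular L with L L' = S, for a 2 x 2 positive definite S."""
    l11 = math.sqrt(S[0][0])
    l21 = S[1][0] / l11
    l22 = math.sqrt(S[1][1] - l21 ** 2)
    return [[l11, 0.0], [l21, l22]]

def inv2(M):
    """Inverse of a nonsingular 2 x 2 matrix."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]

def symeigen2(S):
    """Eigenvalues (largest first) and unit eigenvectors (as columns)
    of a symmetric 2 x 2 matrix."""
    a, b, d = S[0][0], S[0][1], S[1][1]
    mean, half = (a + d) / 2.0, math.hypot((a - d) / 2.0, b)
    vals = [mean + half, mean - half]
    if b == 0:
        cols = [[1.0, 0.0], [0.0, 1.0]] if a >= d else [[0.0, 1.0], [1.0, 0.0]]
    else:
        cols = []
        for lam in vals:
            # (b, lam - a) solves (S - lam*I)x = 0 when b is nonzero
            norm = math.hypot(b, lam - a)
            cols.append([b / norm, (lam - a) / norm])
    return vals, transpose(cols)

L = cholesky2(W)                          # mat l = cholesky(W)
U = transpose(L)                          # mat u = l'
A = matmul(matmul(inv2(L), B), inv2(U))   # mat a = inv(l)*B*inv(u)
e, UV = symeigen2(A)                      # mat symeigen uv e = a
V = matmul(inv2(U), UV)                   # mat v = inv(u)*uv

for j, lam in enumerate(e):
    v = [V[0][j], V[1][j]]
    resid = [sum((B[i][k] - lam * W[i][k]) * v[k] for k in range(2))
             for i in range(2)]
    print("eigenvalue", lam, "vector", v, "residual", resid)
```

    Because uv has orthonormal columns, the back-transformed vectors come out scaled so that v'Wv = 1.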
    

    Aside

    The cholesky function performs the Cholesky decomposition of a matrix such that if L = cholesky(A) then LL' = A, where L is a lower triangular matrix. The formula could also be written LU = A, where U = L' is upper triangular. Matrix A must be symmetric and positive definite.

    Number of Dimensions

  • The maximum number of discriminant dimensions is Min(p, k-1).
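
    As a quick check of this rule, here is a minimal sketch (the function name is an assumption) applied to the designs used in the examples in these notes, with p the number of variables and k the number of groups:

```python
def max_dimensions(p, k):
    """Upper bound on the number of discriminant dimensions.

    p: number of dependent (discriminating) variables
    k: number of groups
    """
    return min(p, k - 1)

# (p, k) for designs that appear in the examples
designs = {
    "honors on female, lang": (2, 2),
    "write by prog": (1, 3),
    "write, read, math by prog": (3, 3),
    "iris: sl, sw, pl, pw by type": (4, 3),
    "five scores by prog x female": (5, 6),
}
for label, (p, k) in designs.items():
    print(label, "->", max_dimensions(p, k))
```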

    Trimming

  • Trim the eigenvalues and eigenvectors
  • Retain only as many as the number of eigenvalues greater than zero.
  • Let E be the trimmed matrix of eigenvalues.
  • Let V be the trimmed matrix of eigenvectors.
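
    In floating point, eigenvalues that should be exactly zero usually come back as tiny positive or negative numbers, so "greater than zero" needs a small cutoff in practice. A minimal sketch follows; the function name, the tolerance, and the example values are assumptions, and E is returned as a list rather than a matrix.

```python
def trim(eigenvalues, eigenvectors, tol=1e-8):
    """Keep eigenvalues above tol and the matching eigenvector columns.

    eigenvalues:  list of eigenvalues
    eigenvectors: matrix as a list of rows; column j goes with eigenvalues[j]
    tol:          hypothetical cutoff standing in for "greater than zero"
    """
    keep = [j for j, lam in enumerate(eigenvalues) if lam > tol]
    E = [eigenvalues[j] for j in keep]
    V = [[row[j] for j in keep] for row in eigenvectors]
    return E, V

# Made-up result for p = 3 variables and k = 2 groups: one real dimension.
raw_values = [0.364, 1e-17, -2e-16]
raw_vectors = [[0.5, 0.2, 0.7],
               [0.1, -0.4, 0.3],
               [-0.3, 0.6, 0.1]]

E, V = trim(raw_values, raw_vectors)
print(E)
print(V)
```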

    Standardized Discriminant Weights

    Standardized discriminant weights or coefficients are used when all the variables are in standard score form.

    Standardized discriminant coefficients are one of the pieces used in the interpretation of the discriminant analysis.

    wii is a diagonal matrix whose elements are the square roots of the diagonal elements of W.

    Then let sv = wii*v, where v is an eigenvector scaled so that v'Wv = 1.

    sv is the vector of standardized discriminant coefficients.

    Structure Matrix

    The structure matrix gives the correlations between the variables and the discriminant functions.

    The structure matrix is another of the pieces used in the interpretation of the discriminant analysis.

    Computing the structure matrix.

    mat E = V'*W*V
    mat F = inv(cholesky(diag(vecdiag(E))))   /* 1/sqrt of the diagonal of E */
    mat G = inv(cholesky(diag(vecdiag(W))))   /* 1/sqrt of the diagonal of W */
    mat A = G*W*V*F
    

    A is the structure matrix.
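
    Element by element, the product G*W*V*F divides the cross product of variable i with discriminant function j by the square roots of their two sums of squares, which is a correlation. A minimal Python sketch with made-up W and V (the function name is an assumption):

```python
import math

def structure_matrix(W, V):
    """Correlations between variables (rows) and discriminant functions (columns).

    W: p x p within groups SSCP matrix, as a list of rows
    V: p x m matrix whose columns are discriminant weight vectors
    """
    p, m = len(W), len(V[0])
    # WV[i][j]: cross product of variable i with discriminant function j
    WV = [[sum(W[i][k] * V[k][j] for k in range(p)) for j in range(m)]
          for i in range(p)]
    # diagonal of V'WV: sum of squares of each discriminant function
    e = [sum(V[i][j] * WV[i][j] for i in range(p)) for j in range(m)]
    return [[WV[i][j] / math.sqrt(W[i][i] * e[j]) for j in range(m)]
            for i in range(p)]

W = [[4.0, 2.0], [2.0, 9.0]]   # hypothetical within groups SSCP
V = [[1.0], [0.0]]             # a function that weights only the first variable

A = structure_matrix(W, V)
print(A)
```

    With these weights the function is the first variable itself, so its loading is 1 and the second loading is just the within groups correlation between the two variables, 2 / (2 * 3).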

    Canonical Correlations

    Wilks' Lambda

    Using Canonical Correlations

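    Let t_i = 1 - Rc_i^2 for each canonical correlation Rc_i,

    then, Wilks' Lambda is the product Λ = t_1 * t_2 * ... * t_s over the s discriminant dimensions.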
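
    As a numerical sketch of the relations in this section, assuming the standard identity Rc^2 = λ / (1 + λ) between an eigenvalue of W^{-1}B and its canonical correlation, the eigenvalues printed in the three-variable hsb2 example in these notes (0.3563 and 0.0045) give back the canonical correlations and Wilks' Lambda values in that output, up to rounding. The function names are assumptions.

```python
import math

def canonical_correlations(eigenvalues):
    """Rc_i = sqrt(lambda_i / (1 + lambda_i)) for each eigenvalue."""
    return [math.sqrt(lam / (1.0 + lam)) for lam in eigenvalues]

def wilks_lambda(canon_corrs):
    """Product of t_i = 1 - Rc_i^2 over the given canonical correlations."""
    out = 1.0
    for rc in canon_corrs:
        out *= 1.0 - rc ** 2
    return out

# Eigenvalues as printed (rounded) for daoneway write read math, by(prog)
eigenvalues = [0.3563, 0.0045]

rc = canonical_correlations(eigenvalues)
lambda_all = wilks_lambda(rc)          # all functions; printed value 0.73398
lambda_after_1 = wilks_lambda(rc[1:])  # after removing function 1; printed 0.99548

print([round(r, 4) for r in rc])
print(round(lambda_all, 4), round(lambda_after_1, 4))
```

    Small differences in the last digits are expected because the printed eigenvalues are themselves rounded.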

    2-Group Case

  • Discuss the 2-group case.

    Chi-square

    with degrees of freedom:

    let q = k-1
    For Chisquare1, df = p*q
    For Chisquare2, df = (p-1)*(q-1)
    For Chisquare3, df = (p-2)*(q-2)
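
    A minimal sketch of the test, assuming Bartlett's approximation, chi-square = -(N - 1 - (p + k)/2) * ln(Λ). That choice is an assumption about which formula is intended, but it agrees with the chi-square values printed in the examples in these notes. The function names are hypothetical.

```python
import math

def bartlett_chi_square(wilks, n, p, k):
    """Bartlett's approximation: -(n - 1 - (p + k) / 2) * ln(Wilks' Lambda).

    wilks: Wilks' Lambda for the functions being tested
    n:     total number of observations
    p:     number of variables
    k:     number of groups
    """
    return -(n - 1 - (p + k) / 2.0) * math.log(wilks)

def degrees_of_freedom(p, k):
    """df for the successive tests: (p - m) * (q - m), m = 0, 1, ..., q = k - 1."""
    q = k - 1
    return [(p - m) * (q - m) for m in range(min(p, q))]

# Three-variable hsb2 example: N = 200, p = 3, k = 3
print(bartlett_chi_square(0.73398, 200, 3, 3))  # notes print 60.619
print(bartlett_chi_square(0.99548, 200, 3, 3))  # notes print 0.888
print(degrees_of_freedom(3, 3))

# Iris example: N = 150, p = 4, k = 3
print(bartlett_chi_square(0.02344, 150, 4, 3))  # notes print 546.115
print(degrees_of_freedom(4, 3))
```

    The computed values differ from the printed ones only in the later decimals, because the Lambda values entered here are rounded.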
    

    2-Group Example

    Two-group example using the user-written command discrim (author: Joseph Hilbe, Arizona State University; to locate it, type findit discrim).

    use http://www.gseis.ucla.edu/courses/data/honors, clear
    
    describe
    
    Contains data from http://www.gseis.ucla.edu/courses/data/honors.dta
      obs:           200                          
     vars:             7                          14 Dec 2001 09:19
     size:         6,400 (99.9% of memory free)
    -------------------------------------------------------------------------------
                  storage  display     value
    variable name   type   format      label      variable label
    -------------------------------------------------------------------------------
    id              float  %9.0g                  
    female          float  %9.0g       fl         
    ses             float  %9.0g       sl         
    lang            float  %9.0g                  language test score
    math            float  %9.0g                  math score
    science         float  %9.0g                  science score
    honors          float  %9.0g                  
    
    discrim honors female lang, predict anova graph detail
    
    
                       Dichotomous Discriminant Analysis
                                                     
    Observations    = 200                            Obs Group 0 =       147
    Indep variables = 2                              Obs Group 1 =        53
                                                      
    Centroid 0  =   -0.3605                          R-square    =    0.2669
    Centroid 1  =    0.9998                          Mahalanobis =    1.8502
    Grand Cntd  =    0.6393
                                                      
    Eigenvalue   =    0.3640                         Wilk's Lambda =  0.7331
    Canon. Corr. =    0.5166                         Chi-square    = 61.1538
    Eta Squared  =    0.2669                         Sign Chi2     =  0.0000
    
    
                             Discrim Function    Unstandardized
              Variable         Coefficients        Coefficients
              -------------------------------------------------
              female             -1.0173                 0.7479
              lang               -0.1491                 0.1096
              constant            8.7741                -6.1309
    
                                                         
                            ----- Predicted -----
                Actual   |  Group 0         Group 1 |   Total    Pr(G
                ---------+--------------------------+--------
                Group 0  |      114            33   |     147      0.73
                Group 1  |       16            37   |      53      0.27
                ---------+--------------------------+--------
                Total    |      130            70   |     200
                ---------+--------------------------+--------
                                                      
                        Correctly predicted =  75.50 %
                        Model sensitivity   =  77.55 %
                        Model specificity   =  69.81 %
                        False positive      =  30.19 %
                        False negative      =  22.45 %
                        -------------------------------
                        Positive pred value =  87.69 %
                        Negative pred value =  52.86 %
                        -------------------------------
                        Kendall's tau-b     =  37.11 %
                        Cohen's kappa       =  42.96 %
    
    
                     Discriminant Scores v Group Variable
    
                            Analysis of Variance
        Source              SS         df      MS            F     Prob > F
    ------------------------------------------------------------------------
    Between groups      72.0729727      1   72.0729727     72.07     0.0000
     Within groups      197.999976    198   .999999876
    ------------------------------------------------------------------------
        Total           270.072948    199   1.35715049
    
    Bartlett's test for equal variances:  chi2(1) =   0.5374  Prob>chi2 = 0.463
    
    
    
     PRED    = Predicted Group       DIFF     = Misclassification
     LnProb1 = Probability Gr 1      DscScore = Discriminant Score
                                     DscIndex = Discriminant Index
    ---------------------------------------------------------------
    
         +------------------------------------------------------+
         | honors   PRED   DIFF   LnProb1   DscIndex   DscScore |
         |------------------------------------------------------|
      1. |      0      0           0.1252     1.9438    -1.1094 |
      2. |      0      0           0.0370     3.2593    -2.0765 |
      3. |      0      0           0.0320     3.4083    -2.1861 |
      4. |      1      0      *    0.4245     0.3043     0.0959 |
      5. |      1      1           0.8366    -1.6334     1.5205 |
         |------------------------------------------------------|
      6. |      0      0           0.1457     1.7688    -0.9807 |
      7. |      0      0           0.0731     2.5400    -1.5478 |
      8. |      0      0           0.2593     1.0495    -0.4520 |
      9. |      0      0           0.1252     1.9438    -1.1094 |
     10. |      0      0           0.0270     3.5834    -2.3148 |
         |------------------------------------------------------|
         
         ... [output omitted]
         
    
    
    3-Group Example

    Three-group example using the command daoneway (findit daoneway).

    use hsb2, clear
    (highschool and beyond (200 cases))
    
    /* equivalent to one-way anova */
    
    daoneway write, by(prog)
    
                        One-way Discriminant Function Analysis
    
    Observations = 200
    Variables    = 1
    Groups       = 3
    
                     Pct of   Cum  Canonical  After  Wilks'
     Fcn Eigenvalue Variance  Pct     Corr      Fcn  Lambda  Chi-square  df  P-value
                                             |   0  0.82238    38.525     2   0.0000
       1    0.2160  100.00 100.00    0.4215  |
    
    Unstandardized canonical discriminant function coefficients
    
             func1
    write   0.1158
    _cons  -6.1088
    
    Standardized canonical discriminant function coefficients
    
            func1
    write  1.0000
    
    Canonical discriminant structure matrix
    
            func1
    write  1.0000
    
    Group means on canonical discriminant functions
    
              func1
    prog-1  -0.1669
    prog-2   0.4031
    prog-3  -0.6962
    
    /* compare with */
    
    anova write prog
    
                               Number of obs =     200     R-squared     =  0.1776
                               Root MSE      = 8.63918     Adj R-squared =  0.1693
    
                      Source |  Partial SS    df       MS           F     Prob > F
                  -----------+----------------------------------------------------
                       Model |  3175.69786     2  1587.84893      21.27     0.0000
                             |
                        prog |  3175.69786     2  1587.84893      21.27     0.0000
                             |
                    Residual |  14703.1771   197   74.635417   
                  -----------+----------------------------------------------------
                       Total |   17878.875   199   89.843593
    
    display sqrt(e(r2))
    .42145333
    
    /* equivalent to one-way manova */
    
    daoneway write read math, by(prog) gen(d)
    
                        One-way Discriminant Function Analysis
    
    Observations = 200
    Variables    = 3
    Groups       = 3
    
                     Pct of   Cum  Canonical  After  Wilks'
     Fcn Eigenvalue Variance  Pct     Corr      Fcn  Lambda  Chi-square  df  P-value
                                             |   0  0.73398    60.619     6   0.0000
       1    0.3563   98.74  98.74    0.5125  |   1  0.99548     0.888     2   0.6414
       2    0.0045    1.26 100.00    0.0672  |
    
    Unstandardized canonical discriminant function coefficients
    
             func1    func2
    write   0.0383  -0.1370
     read   0.0292   0.0439
     math   0.0703   0.0793
    _cons  -7.2509   0.7635
    
    Standardized canonical discriminant function coefficients
    
             func1    func2
    write   0.3311  -1.1834
     read   0.2729   0.4098
     math   0.5816   0.6557
    
    Canonical discriminant structure matrix
    
             func1    func2
    write   0.7753  -0.6303
     read   0.7785   0.1841
     math   0.9129   0.2725
    
    Group means on canonical discriminant functions
    
              func1    func2
    prog-1  -0.3120  -0.1190
    prog-2   0.5359   0.0197
    prog-3  -0.8445   0.0658
    
    /* classification of observations */
    
    daclass d1 d2
    
    tabulate prog _daclass
    
       type of |             _daclass
       program |         1          2          3 |     Total
    -----------+---------------------------------+----------
       general |        16         14         15 |        45 
      academic |        26         62         17 |       105 
      vocation |        17          5         28 |        50 
    -----------+---------------------------------+----------
         Total |        59         81         60 |       200
    
    /* classification under reduced dimensionality            */
    /* since only one dimension was statistically significant */
    
    rename _daclass cl1
    
    daclass d1
    
    rename _daclass cl2
    
    tabulate cl1 cl2
    
    
               |               cl2
           cl1 |         1          2          3 |     Total
    -----------+---------------------------------+----------
             1 |        44          9          6 |        59 
             2 |         5         76          0 |        81 
             3 |         4          0         56 |        60 
    -----------+---------------------------------+----------
         Total |        53         85         62 |       200 
    
    tabulate prog cl2
    
       type of |               cl2
       program |         1          2          3 |     Total
    -----------+---------------------------------+----------
       general |        14         14         17 |        45 
      academic |        23         65         17 |       105 
      vocation |        16          6         28 |        50 
    -----------+---------------------------------+----------
         Total |        53         85         62 |       200 
    
    /* compare discriminant analysis with manova */
    
    manova write read math = prog
    
                               Number of obs =     200
    
                               W = Wilks' lambda      L = Lawley-Hotelling trace
                               P = Pillai's trace     R = Roy's largest root
    
                      Source |  Statistic     df   F(df1,    df2) =   F   Prob>F
                  -----------+--------------------------------------------------
                        prog | W   0.7340      2     6.0   390.0    10.87 0.0000 e
                             | P   0.2672            6.0   392.0    10.08 0.0000 a
                             | L   0.3608            6.0   388.0    11.67 0.0000 a
                             | R   0.3563            3.0   196.0    23.28 0.0000 u
                             |--------------------------------------------------
                    Residual |               197
                  -----------+--------------------------------------------------
                       Total |               199
                  --------------------------------------------------------------
                               e = exact, a = approximate, u = upper bound on F

    Fisher's Iris Data

    This example makes use of the classic Iris data that R. A. Fisher used in developing the linear discriminant function.

    use http://www.gseis.ucla.edu/courses/data/iris, clear
    
    describe
    
    Contains data from http://www.gseis.ucla.edu/courses/data/iris.dta
      obs:           150                          
     vars:             6                          17 Feb 2000 14:30
     size:         4,200 (99.9% of memory free)
    -------------------------------------------------------------------------------
                  storage  display     value
    variable name   type   format      label      variable label
    -------------------------------------------------------------------------------
    case            float  %9.0g                  
    type            float  %10.0g      tl         type of iris
    sl              float  %9.0g                  sepal length
    sw              float  %9.0g                  sepal width
    pl              float  %9.0g                  petal length
    pw              float  %9.0g                  petal width
    -------------------------------------------------------------------------------
    Sorted by:  type
    
    tabulate type
    
        type of |
           iris |      Freq.     Percent        Cum.
    ------------+-----------------------------------
         setosa |         50       33.33       33.33
     versicolor |         50       33.33       66.67
      virginica |         50       33.33      100.00
    ------------+-----------------------------------
          Total |        150      100.00
    
    
    daoneway sl sw pl pw, by(type) gen(d)
    
                        One-way Discriminant Function Analysis
    
    Observations = 150
    Variables    = 4
    Groups       = 3
    
                     Pct of   Cum  Canonical  After  Wilks'
     Fcn Eigenvalue Variance  Pct     Corr      Fcn  Lambda  Chi-square  df  P-value
                                             |   0  0.02344   546.115     8   0.0000
       1   32.1919   99.12  99.12    0.9848  |   1  0.77797    36.530     3   0.0000
       2    0.2854    0.88 100.00    0.4712  |
    
    Unstandardized canonical discriminant function coefficients
    
             func1    func2
       sl  -0.8294   0.0241
       sw  -1.5345   2.1645
       pl   2.2012  -0.9319
       pw   2.8105   2.8392
    _cons  -2.1051  -6.6615
    
    Standardized canonical discriminant function coefficients
    
          func1    func2
    sl  -0.4270   0.0124
    sw  -0.5212   0.7353
    pl   0.9473  -0.4010
    pw   0.5752   0.5810
    
    Canonical discriminant structure matrix
    
          func1    func2
    sl   0.2226   0.3108
    sw  -0.1190   0.8637
    pl   0.7061   0.1677
    pw   0.6332   0.7372
    
    Group means on canonical discriminant functions
    
              func1    func2
    type-1  -7.6076   0.2151
    type-2   1.8250  -0.7279
    type-3   5.7826   0.5128
    
    /* classification of observations */
    
    daclass d1 d2
    
    tabulate type _daclass
    
       type of |             _daclass
          iris |         1          2          3 |     Total
    -----------+---------------------------------+----------
        setosa |        50          0          0 |        50 
    versicolor |         0         47          3 |        50 
     virginica |         0          1         49 |        50 
    -----------+---------------------------------+----------
         Total |        50         48         52 |       150 

    One More Example

    use http://www.gseis.ucla.edu/courses/data/hsb2
    
    egen grp = group(prog female)
    
    tablist grp prog female, sort(v) nol clean
    
        grp   prog   female   Freq  
          1      1        0     21  
          2      1        1     24  
          3      2        0     47  
          4      2        1     58  
          5      3        0     23  
          6      3        1     27
    
    daoneway read write math science socst, by(grp) gen(f)
    
                        One-way Discriminant Function Analysis
    
    Observations = 200
    Variables    = 5
    Groups       = 6
    
                     Pct of   Cum  Canonical  After  Wilks'
     Fcn Eigenvalue Variance  Pct     Corr      Fcn  Lambda  Chi-square  df  P-value
                                             |   0  0.49750   135.093    25   0.0000
       1    0.5023   61.32  61.32    0.5782  |   1  0.74737    56.345    16   0.0000
       2    0.2352   28.71  90.03    0.4363  |   2  0.92313    15.478     9   0.0786
       3    0.0507    6.19  96.22    0.2197  |   3  0.96993     5.907     4   0.2062
       4    0.0308    3.76  99.97    0.1728  |   4  0.99979     0.040     1   0.8412
       5    0.0002    0.03 100.00    0.0144  |
    
    Unstandardized canonical discriminant function coefficients
    
               func1    func2    func3    func4    func5
       read   0.0125   0.0519  -0.0203  -0.1421   0.0165
      write   0.0799  -0.1365   0.0258  -0.0165   0.0491
       math   0.0557   0.0642  -0.0992   0.0920   0.0473
    science  -0.0642   0.0413   0.1183   0.0404   0.0294
      socst   0.0370   0.0234   0.0279   0.0366  -0.1131
      _cons  -6.4119  -2.2525  -2.6763  -0.5666  -1.5519
    
    Standardized canonical discriminant function coefficients
    
               func1    func2    func3    func4    func5
       read   0.1161   0.4830  -0.1886  -1.3218   0.1539
      write   0.6603  -1.1280   0.2130  -0.1361   0.4061
       math   0.4636   0.5340  -0.8258   0.7659   0.3941
    science  -0.6108   0.3927   1.1251   0.3846   0.2800
      socst   0.3564   0.2251   0.2689   0.3519  -1.0880
    
    Canonical discriminant structure matrix
    
               func1    func2    func3    func4    func5
       read   0.5810   0.5236   0.2656  -0.5296   0.1930
      write   0.8096  -0.2095   0.4433  -0.0313   0.3213
       math   0.6838   0.5135  -0.0156   0.2992   0.4231
    science   0.2778   0.4342   0.7449   0.1239   0.4050
      socst   0.7035   0.2936   0.3891   0.0537  -0.5144
    
    Group means on canonical discriminant functions
    
             func1    func2    func3    func4    func5
    grp-1  -0.6640   0.4588   0.4352  -0.2093  -0.0167
    grp-2  -0.1637  -0.6220   0.1688   0.3557  -0.0124
    grp-3   0.3608   0.4947   0.0729   0.1062   0.0170
    grp-4   0.7733  -0.0895  -0.1406  -0.0935  -0.0099
    grp-5  -1.3154   0.3706  -0.4040   0.0542  -0.0050
    grp-6  -0.5068  -0.7886   0.0307  -0.1836   0.0201
    
    daclass f1 f2
    
    tab grp _daclass
    
    group(prog |                             _daclass
       female) |         1          2          3          4          5          6 |     Total
    -----------+------------------------------------------------------------------+----------
             1 |         5          4          3          3          5          1 |        21 
             2 |         1          9          0          6          3          5 |        24 
             3 |         3          9         13         16          5          1 |        47 
             4 |         1         15          8         30          2          2 |        58 
             5 |         1          3          2          0         15          2 |        23 
             6 |         2          8          1          3          2         11 |        27 
    -----------+------------------------------------------------------------------+----------
         Total |        13         48         27         58         32         22 |       200 


    UCLA Department of Education

    Phil Ender, 28oct05, 23apr05, 29Jan98