Ed230B/C

Logistic Regression


Classical Regression vs Logistic Regression

Different Assumptions

Logistic Regression Assumptions

  1. The model is correctly specified, i.e., 1) the true conditional probabilities are a logistic function of the independent variables, 2) no important variables are omitted, 3) no extraneous variables are included, and 4) the independent variables are measured without error.
  2. The cases are independent.
  3. The independent variables are not linear combinations of each other. Perfect multicollinearity makes estimation impossible, while strong multicollinearity makes estimates imprecise.

Logit

Note: I would like to thank John Napier (1550-1617), lord of Merchiston (near Edinburgh), for developing the idea of logarithms.

About Logistic Regression

Interpreting Logistic Coefficients

  • A logistic slope coefficient is the change in the predicted logit for a one-unit change in the X variable, with the other variables in the model held constant. That is, it shows how a one-unit change in X affects the log of the odds when the other variables in the model are held constant.
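
To make the logit scale concrete, here is a minimal Python sketch that converts between probabilities, odds, and logits. The function names `logit` and `inv_logit` are illustrative choices, not Stata commands.

```python
import math

def logit(p):
    """Log of the odds for a probability p strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

def inv_logit(z):
    """Probability that corresponds to a logit (log odds) value z."""
    return 1.0 / (1.0 + math.exp(-z))

# A probability of .5 is even odds, which is a logit of zero.
for p in (0.1, 0.5, 0.9):
    odds = p / (1.0 - p)
    print(f"p = {p:.1f}   odds = {odds:7.4f}   logit = {logit(p):+.4f}")
```

Logits are symmetric around zero: probabilities of .1 and .9 have logits of equal size and opposite sign.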

    Interpreting Odds Ratios

  • An odds ratio in logistic regression is the factor by which the predicted odds are multiplied for a one-unit change in X, with the other variables in the model held constant.
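
The sketch below illustrates this interpretation in Python, using the coefficients that Stata reports for `logit admit apt` in Example 2 of these notes. The helper names `predicted_odds` and `predicted_prob` are hypothetical. The ratio of the odds at X+1 to the odds at X is the same at every X, while the change in the predicted probability is not.

```python
import math

# Coefficients copied from the "logit admit apt" output in Example 2
B0 = -4.095248   # _cons
B1 = 0.9455112   # apt

def predicted_odds(apt):
    """Predicted odds of admission at a given aptitude score."""
    return math.exp(B0 + B1 * apt)

def predicted_prob(apt):
    """Predicted probability of admission at a given aptitude score."""
    odds = predicted_odds(apt)
    return odds / (1.0 + odds)

for apt in (3, 5, 7):
    ratio = predicted_odds(apt + 1) / predicted_odds(apt)
    change = predicted_prob(apt + 1) - predicted_prob(apt)
    print(f"apt {apt} -> {apt + 1}: odds ratio = {ratio:.6f}, "
          f"change in probability = {change:.4f}")
```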

    Example Dataset

    
    input apt gender admit
    8 1 1
    7 1 0
    5 1 1
    3 1 0
    3 1 0
    5 1 1
    7 1 1
    8 1 1
    5 1 1
    5 1 1
    4 0 0
    7 0 1
    3 0 1
    2 0 0
    4 0 0
    2 0 0
    3 0 0
    4 0 1
    3 0 0
    2 0 0
    end
    
      
    Example 1: Categorical Independent Variable
      
    logit admit gender
    
    Iteration 0:   log likelihood = -13.862944
    Iteration 1:   log likelihood = -12.222013
    Iteration 2:   log likelihood = -12.217286
    Iteration 3:   log likelihood = -12.217286
    
    Logit estimates                                   Number of obs   =         20
                                                      LR chi2(1)      =       3.29
                                                      Prob > chi2     =     0.0696
    Log likelihood = -12.217286                       Pseudo R2       =     0.1187
    
    ------------------------------------------------------------------------------
       admit |      Coef.   Std. Err.       z     P>|z|       [95% Conf. Interval]
    ---------+--------------------------------------------------------------------
      gender |   1.694596   .9759001      1.736   0.082      -.2181333    3.607325
       _cons |  -.8472979   .6900656     -1.228   0.220      -2.199801    .5052058
    ------------------------------------------------------------------------------
      
    logit admit gender, or
    
    Logit estimates                                   Number of obs   =         20
                                                      LR chi2(1)      =       3.29
                                                      Prob > chi2     =     0.0696
    Log likelihood = -12.217286                       Pseudo R2       =     0.1187
    
    ------------------------------------------------------------------------------
       admit | Odds Ratio   Std. Err.       z     P>|z|       [95% Conf. Interval]
    ---------+--------------------------------------------------------------------
      gender |   5.444444   5.313234      1.736   0.082       .8040183    36.86729
    ------------------------------------------------------------------------------
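
With a single dichotomous predictor, the estimates above can be recovered by hand from the gender-by-admit table. The following Python sketch does so under that assumption. The variable names are illustrative, and the standard error uses the usual large-sample formula for a log odds ratio.

```python
import math

# Example dataset from these notes: (apt, gender, admit)
rows = [
    (8, 1, 1), (7, 1, 0), (5, 1, 1), (3, 1, 0), (3, 1, 0),
    (5, 1, 1), (7, 1, 1), (8, 1, 1), (5, 1, 1), (5, 1, 1),
    (4, 0, 0), (7, 0, 1), (3, 0, 1), (2, 0, 0), (4, 0, 0),
    (2, 0, 0), (3, 0, 0), (4, 0, 1), (3, 0, 0), (2, 0, 0),
]

# Cell counts of the gender-by-admit table
counts = {(g, a): 0 for g in (0, 1) for a in (0, 1)}
for _apt, gender, admit in rows:
    counts[(gender, admit)] += 1

odds_g1 = counts[(1, 1)] / counts[(1, 0)]   # odds of admission when gender == 1
odds_g0 = counts[(0, 1)] / counts[(0, 0)]   # odds of admission when gender == 0
odds_ratio = odds_g1 / odds_g0

coef_gender = math.log(odds_ratio)          # slope on the logit scale
coef_cons = math.log(odds_g0)               # logit for the gender == 0 group
se_gender = math.sqrt(sum(1.0 / n for n in counts.values()))
se_odds_ratio = odds_ratio * se_gender      # delta-method standard error

print(f"odds ratio = {odds_ratio:.6f} (se {se_odds_ratio:.6f})")
print(f"gender coef = {coef_gender:.6f} (se {se_gender:.7f})")
print(f"_cons = {coef_cons:.7f}")
```

The odds are 7/3 for gender = 1 and 3/7 for gender = 0, so the odds ratio is 49/9, and its log is the gender coefficient.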
      
    Example 2: Continuous Independent Variable
      
    logit admit apt
    
    Iteration 0:   log likelihood = -13.862944
    Iteration 1:   log likelihood = -9.6278718
    Iteration 2:   log likelihood = -9.3197603
    Iteration 3:   log likelihood = -9.3029734
    Iteration 4:   log likelihood = -9.3028914
    
    Logit estimates                                   Number of obs   =         20
                                                      LR chi2(1)      =       9.12
                                                      Prob > chi2     =     0.0025
    Log likelihood = -9.3028914                       Pseudo R2       =     0.3289
    
    ------------------------------------------------------------------------------
       admit |      Coef.   Std. Err.       z     P>|z|       [95% Conf. Interval]
    ---------+--------------------------------------------------------------------
         apt |   .9455112    .422872      2.236   0.025       .1166974    1.774325
       _cons |  -4.095248    1.83403     -2.233   0.026      -7.689881   -.5006154
    ------------------------------------------------------------------------------
      
    logit admit apt, or
    
    Logit estimates                                   Number of obs   =         20
                                                      LR chi2(1)      =       9.12
                                                      Prob > chi2     =     0.0025
    Log likelihood = -9.3028914                       Pseudo R2       =     0.3289
    
    ------------------------------------------------------------------------------
       admit | Odds Ratio   Std. Err.       z     P>|z|       [95% Conf. Interval]
    ---------+--------------------------------------------------------------------
         apt |   2.574129   1.088527      2.236   0.025       1.123779      5.8963
    ------------------------------------------------------------------------------
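
With a continuous predictor there is no closed-form solution, so the estimates are found iteratively. The sketch below fits the Example 2 model by Newton-Raphson in plain Python, starting from zero coefficients. It is an illustration of maximum likelihood estimation for this one-predictor case, not a description of Stata's internal optimizer, and the function names are hypothetical.

```python
import math

# Example dataset from these notes, in row order
apt = [8, 7, 5, 3, 3, 5, 7, 8, 5, 5, 4, 7, 3, 2, 4, 2, 3, 4, 3, 2]
admit = [1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0]

def log_likelihood(b0, b1, x, y):
    """Bernoulli log likelihood of a one-predictor logit model."""
    total = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        total += math.log(p) if yi == 1 else math.log(1.0 - p)
    return total

def fit_logit(x, y, max_iter=25, tol=1e-10):
    """Newton-Raphson fit; returns (b0, b1, se_b0, se_b1, log likelihood)."""
    b0 = b1 = 0.0
    se0 = se1 = float("nan")
    for iteration in range(max_iter):
        ll_now = log_likelihood(b0, b1, x, y)
        print(f"Iteration {iteration}: log likelihood = {ll_now:.6f}")
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p              # gradient with respect to the constant
            g1 += (yi - p) * xi       # gradient with respect to the slope
            h00 += w                  # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Standard errors from the inverse of the information matrix
        se0, se1 = math.sqrt(h11 / det), math.sqrt(h00 / det)
        # Newton step: (inverse information matrix) times gradient
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1, se0, se1, log_likelihood(b0, b1, x, y)

b0, b1, se_b0, se_b1, ll = fit_logit(apt, admit)
print(f"apt = {b1:.7f} (se {se_b1:.6f}), _cons = {b0:.6f} (se {se_b0:.5f})")
print(f"odds ratio for apt = {math.exp(b1):.6f}")
```

Iteration 0 starts from the intercept-only model, which is why its log likelihood is the same in every example that uses this dataset.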
      
    Example 3: Categorical & Continuous Independent Variables
      
    logit admit gender apt
    
    Iteration 0:   log likelihood = -13.862944
    Iteration 1:   log likelihood = -9.5949661
    Iteration 2:   log likelihood = -9.2975666
    Iteration 3:   log likelihood = -9.2821744
    Iteration 4:   log likelihood = -9.2820991
    
    Logit estimates                                   Number of obs   =         20
                                                      LR chi2(2)      =       9.16
                                                      Prob > chi2     =     0.0102
    Log likelihood = -9.2820991                       Pseudo R2       =     0.3304
    
    ------------------------------------------------------------------------------
       admit |      Coef.   Std. Err.       z     P>|z|       [95% Conf. Interval]
    ---------+--------------------------------------------------------------------
      gender |   .2671938   1.300899      0.205   0.837      -2.282521    2.816909
         apt |   .8982803   .4713791      1.906   0.057      -.0256057    1.822166
       _cons |  -4.028765   1.838354     -2.192   0.028      -7.631871   -.4256579
    ------------------------------------------------------------------------------
      
    logit admit gender apt, or
    
    Logit estimates                                   Number of obs   =         20
                                                      LR chi2(2)      =       9.16
                                                      Prob > chi2     =     0.0102
    Log likelihood = -9.2820991                       Pseudo R2       =     0.3304
    
    ------------------------------------------------------------------------------
       admit | Odds Ratio   Std. Err.       z     P>|z|       [95% Conf. Interval]
    ---------+--------------------------------------------------------------------
      gender |   1.306294   1.699356      0.205   0.837       .1020267    16.72507
         apt |   2.455377   1.157413      1.906   0.057       .9747193    6.185244
    ------------------------------------------------------------------------------
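
The summary statistics in the header of each table follow directly from the log likelihoods. The Python sketch below recomputes them for Example 3 from the values printed above, and adds a likelihood-ratio test of gender by comparing Examples 2 and 3. The closed-form tail probabilities used here hold only for chi-square with 1 and 2 degrees of freedom.

```python
import math

# Log likelihoods copied from the Stata output in these notes
LL_NULL = -13.862944        # Iteration 0, intercept only
LL_APT = -9.3028914         # logit admit apt (Example 2)
LL_GENDER_APT = -9.2820991  # logit admit gender apt (Example 3)

# Model chi-square: full model against the intercept-only model, 2 df
lr_model = 2.0 * (LL_GENDER_APT - LL_NULL)
p_model = math.exp(-lr_model / 2.0)           # chi-square upper tail, 2 df

# Pseudo R2 as reported by Stata (McFadden's)
pseudo_r2 = 1.0 - LL_GENDER_APT / LL_NULL

# Does gender add anything once apt is in the model? 1 df
lr_gender = 2.0 * (LL_GENDER_APT - LL_APT)
p_gender = math.erfc(math.sqrt(lr_gender / 2.0))  # chi-square upper tail, 1 df

print(f"LR chi2(2) = {lr_model:.2f}, Prob > chi2 = {p_model:.4f}")
print(f"Pseudo R2 = {pseudo_r2:.4f}")
print(f"LR chi2(1) for gender = {lr_gender:.4f}, Prob > chi2 = {p_gender:.4f}")
```

The likelihood-ratio test for gender leads to the same conclusion as the z test in the coefficient table: gender adds little once apt is in the model.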
    

    Example 4: Honors Composition using HSB Dataset

    
    use http://www.gseis.ucla.edu/courses/data/hsb2
      
    /* create dichotomous response variable */
    generate honcomp = (write>=60)
      
    /* create dummy coding for ses */
    generate seslow = (ses==1)
    generate sesmid = (ses==2)
      
    tabulate honcomp
    
        honcomp |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              0 |        147       73.50       73.50
              1 |         53       26.50      100.00
    ------------+-----------------------------------
          Total |        200      100.00
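
The frequencies in this table determine the intercept-only model, which is the starting point (Iteration 0) of the logit estimation that follows. A short Python sketch, with illustrative variable names:

```python
import math

n0, n1 = 147, 53               # frequencies of honcomp == 0 and honcomp == 1
n = n0 + n1
p1 = n1 / n                    # observed proportion with honcomp == 1

# Log likelihood when every case gets the same predicted probability p1
ll_null = n1 * math.log(p1) + n0 * math.log(1.0 - p1)

# The constant of an intercept-only logit model is the log of the observed odds
cons_null = math.log(n1 / n0)

print(f"proportion = {p1:.4f}")
print(f"intercept-only log likelihood = {ll_null:.5f}")
print(f"intercept-only constant = {cons_null:.4f}")
```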
      
    logit honcomp female seslow sesmid read math
    
    Iteration 0:   log likelihood = -115.64441
    Iteration 1:   log likelihood =  -76.56971
    Iteration 2:   log likelihood = -72.309247
    Iteration 3:   log likelihood = -71.997576
    Iteration 4:   log likelihood = -71.994757
    Iteration 5:   log likelihood = -71.994756
    
    Logit estimates                                   Number of obs   =        200
                                                      LR chi2(5)      =      87.30
                                                      Prob > chi2     =     0.0000
    Log likelihood = -71.994756                       Pseudo R2       =     0.3774
    
    ------------------------------------------------------------------------------
     honcomp |      Coef.   Std. Err.       z     P>|z|       [95% Conf. Interval]
    ---------+--------------------------------------------------------------------
      female |   1.145726   .4513589      2.538   0.011       .2610792    2.030374
      seslow |  -.0541296   .5945439     -0.091   0.927      -1.219414    1.111155
      sesmid |  -1.094532   .4833959     -2.264   0.024       -2.04197   -.1470932
        read |   .0687277   .0287044      2.394   0.017       .0124681    .1249873
        math |   .1358904   .0336874      4.034   0.000       .0698642    .2019166
       _cons |  -13.64492   2.120165     -6.436   0.000      -17.80036   -9.489469
    ------------------------------------------------------------------------------
      
    test seslow sesmid
    
     ( 1)  seslow = 0.0
     ( 2)  sesmid = 0.0
    
               chi2(  2) =    6.13
             Prob > chi2 =    0.0466
      
    logit, or
    
    
    Logit estimates                                   Number of obs   =        200
                                                      LR chi2(5)      =      87.30
                                                      Prob > chi2     =     0.0000
    Log likelihood = -71.994756                       Pseudo R2       =     0.3774
    
    ------------------------------------------------------------------------------
     honcomp | Odds Ratio   Std. Err.       z     P>|z|       [95% Conf. Interval]
    ---------+--------------------------------------------------------------------
      female |   3.144725     1.4194      2.538   0.011        1.29833    7.616932
      seslow |   .9473093    .563217     -0.091   0.927       .2954031    3.037865
      sesmid |   .3346963   .1617908     -2.264   0.024       .1297728    .8632135
        read |   1.071145   .0307466      2.394   0.017       1.012546    1.133134
        math |   1.145556   .0385909      4.034   0.000       1.072363    1.223746
    ------------------------------------------------------------------------------
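
Every entry in the odds ratio table is a transformation of the coefficient table shown earlier. The Python sketch below exponentiates each coefficient and its confidence limits and rescales the standard error by the delta method. The name `odds_ratio_row` is hypothetical, and 1.959964 is the usual normal critical value for a 95% interval.

```python
import math

# (coefficient, standard error) copied from the logit output above
coefs = {
    "female": (1.145726, 0.4513589),
    "seslow": (-0.0541296, 0.5945439),
    "sesmid": (-1.094532, 0.4833959),
    "read": (0.0687277, 0.0287044),
    "math": (0.1358904, 0.0336874),
}
Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution

def odds_ratio_row(b, se):
    """Return (odds ratio, its standard error, lower limit, upper limit)."""
    odds_ratio = math.exp(b)
    return (
        odds_ratio,
        odds_ratio * se,               # delta-method standard error
        math.exp(b - Z_975 * se),      # limits are computed on the logit scale
        math.exp(b + Z_975 * se),      # and then exponentiated
    )

for name, (b, se) in coefs.items():
    ratio, ratio_se, lower, upper = odds_ratio_row(b, se)
    print(f"{name:>7} | {ratio:9.6f}  {ratio_se:9.6f}  {lower:9.6f}  {upper:9.6f}")
```

Because the limits are exponentiated, the interval is not symmetric around the odds ratio, and the z and P>|z| columns are unchanged from the coefficient table.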
      
    listcoef  /* available from J. Scott Long via the Internet */
    
    logit (N=200): Factor Change in Odds 
    
      Odds of: 1 vs 0
    
    ----------------------------------------------------------------------
         honcomp |      b         z     P>|z|    e^b    e^bStdX      SDofX
    -------------+--------------------------------------------------------
          female |   1.14573    2.538   0.011   3.1447   1.7718     0.4992
          seslow |  -0.05413   -0.091   0.927   0.9473   0.9773     0.4251
          sesmid |  -1.09453   -2.264   0.024   0.3347   0.5781     0.5006
            read |   0.06873    2.394   0.017   1.0711   2.0232    10.2529
            math |   0.13589    4.034   0.000   1.1456   3.5718     9.3684
    ----------------------------------------------------------------------
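
The two factor-change columns can be reproduced from b and SDofX: e^b is the factor change in the odds for a one-unit increase in X, and e^bStdX is the factor change for a one standard deviation increase. A Python sketch using the rounded values printed above, so the results agree only to about three decimals:

```python
import math

# (b, SDofX) copied from the listcoef output above
rows = {
    "female": (1.14573, 0.4992),
    "seslow": (-0.05413, 0.4251),
    "sesmid": (-1.09453, 0.5006),
    "read": (0.06873, 10.2529),
    "math": (0.13589, 9.3684),
}

factor_change = {}
for name, (b, sd) in rows.items():
    e_b = math.exp(b)            # factor change per one-unit increase
    e_b_std = math.exp(b * sd)   # factor change per one-SD increase
    factor_change[name] = (e_b, e_b_std)
    print(f"{name:>7} | e^b = {e_b:.4f}   e^bStdX = {e_b_std:.4f}")
```

The standardized column puts the predictors on a comparable footing: math has a small per-point odds ratio but the largest effect per standard deviation.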
      
    fitstat  /* available from J. Scott Long via the Internet */
    
    Measures of Fit for logit of honcomp
    
    Log-Lik Intercept Only:     -115.644     Log-Lik Full Model:          -71.995
    D(194):                      143.990     LR(5):                        87.299
                                             Prob > LR:                     0.000
    McFadden's R2:                 0.377     McFadden's Adj R2:             0.326
    Maximum Likelihood R2:         0.354     Cragg & Uhler's R2:            0.516
    McKelvey and Zavoina's R2:     0.549     Efron's R2:                    0.404
    Variance of y*:                7.296     Variance of error:             3.290
    Count R2:                      0.830     Adj Count R2:                  0.358
    AIC:                           0.780     AIC*n:                       155.990
    BIC:                        -883.884     BIC':                        -60.808
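
Several of these measures are simple functions of the two log likelihoods, the sample size, and the number of parameters. The Python sketch below uses the formulas as I understand them. They reproduce the printed values to three decimals, but they should be checked against the fitstat documentation before being relied on.

```python
import math

LL0 = -115.64441     # intercept-only log likelihood (Iteration 0)
LL = -71.994756      # full-model log likelihood
N = 200              # number of observations
K_PREDICTORS = 5     # female seslow sesmid read math
K_PARAMS = 6         # predictors plus the constant

deviance = -2.0 * LL
lr = 2.0 * (LL - LL0)

mcfadden_r2 = 1.0 - LL / LL0
mcfadden_adj_r2 = 1.0 - (LL - K_PARAMS) / LL0
ml_r2 = 1.0 - math.exp(-lr / N)                          # Maximum Likelihood R2
cragg_uhler_r2 = ml_r2 / (1.0 - math.exp(2.0 * LL0 / N))

aic_times_n = deviance + 2.0 * K_PARAMS
aic = aic_times_n / N
bic = deviance - (N - K_PARAMS) * math.log(N)
bic_prime = -lr + K_PREDICTORS * math.log(N)
error_variance = math.pi ** 2 / 3.0   # variance of the standard logistic distribution

print(f"D({N - K_PARAMS}) = {deviance:.3f}   LR({K_PREDICTORS}) = {lr:.3f}")
print(f"McFadden's R2 = {mcfadden_r2:.3f}   Adj = {mcfadden_adj_r2:.3f}")
print(f"ML R2 = {ml_r2:.3f}   Cragg & Uhler's R2 = {cragg_uhler_r2:.3f}")
print(f"AIC = {aic:.3f}   AIC*n = {aic_times_n:.3f}")
print(f"BIC = {bic:.3f}   BIC' = {bic_prime:.3f}")
```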
      
    lfit
    
    Logistic model for honcomp, goodness-of-fit test
    
           number of observations =       200
     number of covariate patterns =       189
                Pearson chi2(183) =       166.48
                      Prob > chi2 =         0.8040
      
    lfit, group(10)
    
    Logistic model for honcomp, goodness-of-fit test
    (Table collapsed on quantiles of estimated probabilities)
    
           number of observations =       200
                 number of groups =        10
          Hosmer-Lemeshow chi2(8) =        12.91
                      Prob > chi2 =         0.1151
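
The Prob > chi2 values are upper-tail chi-square probabilities. For an even number of degrees of freedom there is a closed form, which the Python sketch below uses to check the Hosmer-Lemeshow result and the earlier `test seslow sesmid` result. The function name is hypothetical, and the formula does not apply to odd degrees of freedom such as the Pearson chi2(183) above.

```python
import math

def chi2_upper_tail_even_df(x, df):
    """Upper-tail probability of a chi-square value x; valid only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = 1.0
    total = 1.0
    # Sum of (x/2)^k / k! for k = 0 .. df/2 - 1
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

p_hosmer_lemeshow = chi2_upper_tail_even_df(12.91, 8)   # lfit, group(10)
p_ses = chi2_upper_tail_even_df(6.13, 2)                # test seslow sesmid

print(f"Hosmer-Lemeshow chi2(8) = 12.91, Prob > chi2 = {p_hosmer_lemeshow:.4f}")
print(f"test seslow sesmid chi2(2) = 6.13, Prob > chi2 = {p_ses:.4f}")
```

Because the chi-square values are entered as rounded in the output, the results can differ from Stata's in the fourth decimal. A large p-value on the goodness-of-fit test means there is no evidence of lack of fit.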
      
    lstat
    
    Logistic model for honcomp
    
                  -------- True --------
    Classified |         D            ~D         Total
    -----------+--------------------------+-----------
         +     |        31            12  |         43
         -     |        22           135  |        157
    -----------+--------------------------+-----------
       Total   |        53           147  |        200
    
    Classified + if predicted Pr(D) >= .5
    True D defined as honcomp ~= 0
    --------------------------------------------------
    Sensitivity                     Pr( +| D)   58.49%
    Specificity                     Pr( -|~D)   91.84%
    Positive predictive value       Pr( D| +)   72.09%
    Negative predictive value       Pr(~D| -)   85.99%
    --------------------------------------------------
    False + rate for true ~D        Pr( +|~D)    8.16%
    False - rate for true D         Pr( -| D)   41.51%
    False + rate for classified +   Pr(~D| +)   27.91%
    False - rate for classified -   Pr( D| -)   14.01%
    --------------------------------------------------
    Correctly classified                        83.00%
    --------------------------------------------------
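
All of the classification statistics come from the four cells of the table. The Python sketch below recomputes them, along with the Count R2 and Adj Count R2 that fitstat reported. The variable names are illustrative, and the adjusted count formula is the one I believe fitstat uses: correct classifications beyond those obtained by always predicting the larger category.

```python
# Cells of the classification table above
true_pos = 31    # classified +, honcomp == 1
false_pos = 12   # classified +, honcomp == 0
false_neg = 22   # classified -, honcomp == 1
true_neg = 135   # classified -, honcomp == 0

n = true_pos + false_pos + false_neg + true_neg
n_d = true_pos + false_neg        # true D
n_not_d = false_pos + true_neg    # true ~D

sensitivity = true_pos / n_d
specificity = true_neg / n_not_d
pos_pred_value = true_pos / (true_pos + false_pos)
neg_pred_value = true_neg / (true_neg + false_neg)
correct = (true_pos + true_neg) / n       # same as Count R2

# Improvement over always guessing the more frequent outcome
largest_margin = max(n_d, n_not_d)
adj_count_r2 = (true_pos + true_neg - largest_margin) / (n - largest_margin)

print(f"Sensitivity               {sensitivity:7.2%}")
print(f"Specificity               {specificity:7.2%}")
print(f"Positive predictive value {pos_pred_value:7.2%}")
print(f"Negative predictive value {neg_pred_value:7.2%}")
print(f"Correctly classified      {correct:7.2%}")
print(f"Adj Count R2              {adj_count_r2:7.3f}")
```

Always predicting honcomp = 0 would already classify 73.5% of cases correctly, so the adjusted count is the more informative of the two.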
    


    UCLA Department of Education

    Phil Ender, 20dec00