
Classical Regression vs Logistic Regression
Different Assumptions
Logistic Regression Assumptions
Logit
Note: I would like to thank John Napier (1550-1617), lord of Merchiston (near Edinburgh), for developing the idea of logarithms.
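For reference, this is the logit link that the rest of these notes rely on, written for a single predictor x. The symbols p, β0 and β1 are our notation, not the original notes'. The log odds are linear in x, the inverse transformation maps them back to a probability, and a one-unit increase in x multiplies the odds by e^β1. That factor is the odds ratio reported by the `or` option.

```latex
\operatorname{logit}(p) = \ln\!\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x
\quad\Longleftrightarrow\quad
p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}},
\qquad
\frac{\operatorname{odds}(x+1)}{\operatorname{odds}(x)} = e^{\beta_1}.
```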
About Logistic Regression
Interpreting Logistic Coefficients
Interpreting Odds Ratios
Example Dataset
input apt gender admit
8 1 1
7 1 0
5 1 1
3 1 0
3 1 0
5 1 1
7 1 1
8 1 1
5 1 1
5 1 1
4 0 0
7 0 1
3 0 1
2 0 0
4 0 0
2 0 0
3 0 0
4 0 1
3 0 0
2 0 0
end
Example 1: Categorical Independent Variable
logit admit gender
Iteration 0: log likelihood = -13.862944
Iteration 1: log likelihood = -12.222013
Iteration 2: log likelihood = -12.217286
Iteration 3: log likelihood = -12.217286
Logit estimates Number of obs = 20
LR chi2(1) = 3.29
Prob > chi2 = 0.0696
Log likelihood = -12.217286 Pseudo R2 = 0.1187
------------------------------------------------------------------------------
   admit |      Coef.  Std. Err.        z   P>|z|      [95% Conf. Interval]
---------+--------------------------------------------------------------------
  gender |   1.694596   .9759001    1.736   0.082     -.2181333    3.607325
   _cons |  -.8472979   .6900656   -1.228   0.220     -2.199801    .5052058
------------------------------------------------------------------------------
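With a single 0/1 predictor the model is saturated. The estimates can therefore be checked by hand from the admit-by-gender counts in the example data: 7 of 10 were admitted when gender = 1, and 3 of 10 when gender = 0. The minimal sketch below uses only the standard library. It recomputes the coefficients, both log likelihoods, the LR chi2, and the pseudo R2 (McFadden's, 1 - LL/LL0).

```python
import math

# (apt, gender, admit) for the 20 cases entered with `input` above
data = [
    (8, 1, 1), (7, 1, 0), (5, 1, 1), (3, 1, 0), (3, 1, 0),
    (5, 1, 1), (7, 1, 1), (8, 1, 1), (5, 1, 1), (5, 1, 1),
    (4, 0, 0), (7, 0, 1), (3, 0, 1), (2, 0, 0), (4, 0, 0),
    (2, 0, 0), (3, 0, 0), (4, 0, 1), (3, 0, 0), (2, 0, 0),
]

def admitted_and_n(g):
    """Number admitted and group size for gender == g."""
    rows = [admit for _, gender, admit in data if gender == g]
    return sum(rows), len(rows)

a1, n1 = admitted_and_n(1)   # 7 of 10 admitted
a0, n0 = admitted_and_n(0)   # 3 of 10 admitted

# With one binary predictor the ML fit reproduces each group's observed log odds
cons = math.log(a0 / (n0 - a0))              # log odds when gender == 0
b_gender = math.log(a1 / (n1 - a1)) - cons   # difference in log odds

def loglik(admits, n, p):
    """Binomial log likelihood of `admits` successes in n trials at probability p."""
    return admits * math.log(p) + (n - admits) * math.log(1 - p)

ll_full = loglik(a1, n1, a1 / n1) + loglik(a0, n0, a0 / n0)
ll_null = loglik(a1 + a0, n1 + n0, (a1 + a0) / (n1 + n0))   # intercept only

lr_chi2 = 2 * (ll_full - ll_null)
pseudo_r2 = 1 - ll_full / ll_null

print(f"gender {b_gender:.6f}  _cons {cons:.6f}")
print(f"LL0 {ll_null:.6f}  LL {ll_full:.6f}  LR chi2 {lr_chi2:.2f}  pseudo R2 {pseudo_r2:.4f}")
```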
logit admit gender, or
Logit estimates Number of obs = 20
LR chi2(1) = 3.29
Prob > chi2 = 0.0696
Log likelihood = -12.217286 Pseudo R2 = 0.1187
------------------------------------------------------------------------------
   admit | Odds Ratio  Std. Err.        z   P>|z|      [95% Conf. Interval]
---------+--------------------------------------------------------------------
  gender |   5.444444   5.313234    1.736   0.082      .8040183    36.86729
------------------------------------------------------------------------------
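The `or` option re-expresses the same fit without changing it. The odds ratio is exp(b), and the confidence limits are the exponentiated coefficient limits. The z and p values are unchanged, and the reported standard error is the delta-method value exp(b)·SE(b). The small check below uses values copied from the output above, so results agree only to rounding.

```python
import math

# Coefficient, standard error and 95% CI for gender, copied from `logit admit gender`
b, se = 1.694596, 0.9759001
lo, hi = -0.2181333, 3.607325

odds_ratio = math.exp(b)
or_se = odds_ratio * se                  # delta method: d exp(b)/db = exp(b)
or_ci = (math.exp(lo), math.exp(hi))     # limits are exponentiated, not rebuilt from or_se
z = b / se                               # same z for the coefficient and the odds ratio

print(f"OR {odds_ratio:.6f}  SE {or_se:.6f}  CI {or_ci[0]:.7f} to {or_ci[1]:.5f}  z {z:.3f}")
```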
Example 2: Continuous Independent Variable
logit admit apt
Iteration 0: log likelihood = -13.862944
Iteration 1: log likelihood = -9.6278718
Iteration 2: log likelihood = -9.3197603
Iteration 3: log likelihood = -9.3029734
Iteration 4: log likelihood = -9.3028914
Logit estimates Number of obs = 20
LR chi2(1) = 9.12
Prob > chi2 = 0.0025
Log likelihood = -9.3028914 Pseudo R2 = 0.3289
------------------------------------------------------------------------------
   admit |      Coef.  Std. Err.        z   P>|z|      [95% Conf. Interval]
---------+--------------------------------------------------------------------
     apt |   .9455112    .422872    2.236   0.025      .1166974    1.774325
   _cons |  -4.095248    1.83403   -2.233   0.026     -7.689881   -.5006154
------------------------------------------------------------------------------
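Once the predictor is continuous there is no closed form, so the estimates come from iterating, which is what the `Iteration` lines show. The sketch below is a plain Newton-Raphson fit written for illustration. It is not a reproduction of Stata's internal optimizer. It starts from zero, where p = .5 for every case; here that is also the intercept-only fit, because 10 of 20 were admitted. From that start it should reach the coefficients and log likelihood shown above.

```python
import math

# apt and admit columns of the 20-case example dataset above
apt   = [8, 7, 5, 3, 3, 5, 7, 8, 5, 5, 4, 7, 3, 2, 4, 2, 3, 4, 3, 2]
admit = [1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0]

def fit_logit(x, y, iterations=25):
    """Fit logit(p) = b0 + b1*x by Newton-Raphson; return (b0, b1, log likelihood)."""
    b0 = b1 = 0.0                            # start at p = .5 for every case
    for _ in range(iterations):
        p = [1 / (1 + math.exp(-(b0 + b1 * xi))) for xi in x]
        # gradient (score) of the log likelihood
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum(xi * (yi - pi) for xi, yi, pi in zip(x, y, p))
        # information matrix X'WX with weights p(1 - p)
        w = [pi * (1 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: beta += (X'WX)^-1 * gradient, with the 2x2 inverse written out
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (-h01 * g0 + h00 * g1) / det
    p = [1 / (1 + math.exp(-(b0 + b1 * xi))) for xi in x]
    ll = sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi) for yi, pi in zip(y, p))
    return b0, b1, ll

b0, b1, ll = fit_logit(apt, admit)
print(f"apt {b1:.7f}  _cons {b0:.6f}  log likelihood {ll:.7f}")
```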
logit admit apt, or
Logit estimates Number of obs = 20
LR chi2(1) = 9.12
Prob > chi2 = 0.0025
Log likelihood = -9.3028914 Pseudo R2 = 0.3289
------------------------------------------------------------------------------
   admit | Odds Ratio  Std. Err.        z   P>|z|      [95% Conf. Interval]
---------+--------------------------------------------------------------------
     apt |   2.574129   1.088527    2.236   0.025      1.123779      5.8963
------------------------------------------------------------------------------
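For a continuous predictor the odds ratio is a constant multiplicative effect. Each additional apt point multiplies the odds of admission by about 2.57, whatever the starting score. The implied change in probability is not constant, as the sketch below shows. It uses the Example 2 coefficients over apt values 2 through 8, the range observed in the data.

```python
import math

# Estimates from `logit admit apt` above
b_cons, b_apt = -4.095248, 0.9455112

def prob_admit(apt):
    """Predicted Pr(admit = 1) at a given apt score."""
    return 1 / (1 + math.exp(-(b_cons + b_apt * apt)))

def odds(p):
    return p / (1 - p)

# The ratio of odds for a one-point increase is the same everywhere ...
ratios = [odds(prob_admit(a + 1)) / odds(prob_admit(a)) for a in range(2, 8)]
# ... but the change in probability for a one-point increase is not
steps = [prob_admit(a + 1) - prob_admit(a) for a in range(2, 8)]

for a, r, d in zip(range(2, 8), ratios, steps):
    print(f"apt {a}->{a + 1}: odds x {r:.4f}, probability {d:+.3f}")
```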
Example 3: Categorical & Continuous Independent Variables
logit admit gender apt
Iteration 0: log likelihood = -13.862944
Iteration 1: log likelihood = -9.5949661
Iteration 2: log likelihood = -9.2975666
Iteration 3: log likelihood = -9.2821744
Iteration 4: log likelihood = -9.2820991
Logit estimates Number of obs = 20
LR chi2(2) = 9.16
Prob > chi2 = 0.0102
Log likelihood = -9.2820991 Pseudo R2 = 0.3304
------------------------------------------------------------------------------
   admit |      Coef.  Std. Err.        z   P>|z|      [95% Conf. Interval]
---------+--------------------------------------------------------------------
  gender |   .2671938   1.300899    0.205   0.837     -2.282521    2.816909
     apt |   .8982803   .4713791    1.906   0.057     -.0256057    1.822166
   _cons |  -4.028765   1.838354   -2.192   0.028     -7.631871   -.4256579
------------------------------------------------------------------------------
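The LR chi2 in each header compares that model with the intercept-only model. Differencing the log likelihoods of Examples 2 and 3 instead gives a likelihood-ratio test for gender once apt is already in the model. Its p value (about .84) is close to the Wald p of .837 in the table. In the sketch below the 1-df and 2-df tail formulas are exact, and the log likelihoods are copied from the output.

```python
import math

# Log likelihoods reported above
ll_null = -13.862944       # intercept only (iteration 0)
ll_apt = -9.3028914        # logit admit apt
ll_apt_gend = -9.2820991   # logit admit gender apt

def chi2_sf_df1(x):
    """P(X > x) for chi-square with 1 df: P(Z**2 > x) = erfc(sqrt(x/2))."""
    return math.erfc(math.sqrt(x / 2))

def chi2_sf_df2(x):
    """P(X > x) for chi-square with 2 df, which is exactly exp(-x/2)."""
    return math.exp(-x / 2)

lr_model = 2 * (ll_apt_gend - ll_null)   # the LR chi2(2) in the Example 3 header
lr_gender = 2 * (ll_apt_gend - ll_apt)   # adding gender to a model with apt, 1 df

print(f"model LR chi2(2) = {lr_model:.2f}, p = {chi2_sf_df2(lr_model):.4f}")
print(f"gender given apt LR chi2(1) = {lr_gender:.4f}, p = {chi2_sf_df1(lr_gender):.3f}")
```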
logit admit gender apt, or
Logit estimates Number of obs = 20
LR chi2(2) = 9.16
Prob > chi2 = 0.0102
Log likelihood = -9.2820991 Pseudo R2 = 0.3304
------------------------------------------------------------------------------
   admit | Odds Ratio  Std. Err.        z   P>|z|      [95% Conf. Interval]
---------+--------------------------------------------------------------------
  gender |   1.306294   1.699356    0.205   0.837      .1020267    16.72507
     apt |   2.455377   1.157413    1.906   0.057      .9747193    6.185244
------------------------------------------------------------------------------
Example 4: Honors Composition using HSB Dataset
use http://www.gseis.ucla.edu/courses/data/hsb2
/* create dichotomous response variable */
generate honcomp = (write>=60)
/* create dummy coding for ses */
generate seslow = (ses==1)
generate sesmid = (ses==2)
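Stata evaluates a relational expression such as `(write>=60)` to 1 or 0, so each `generate` above creates a 0/1 variable. The sketch below applies the same coding in Python to a few hypothetical write and ses values. It assumes ses is coded 1 = low, 2 = middle, 3 = high, which is what the variable names suggest. Under that coding, ses == 3 is left out and serves as the reference category for seslow and sesmid.

```python
# Hypothetical values for six students (not taken from hsb2)
write = [52, 65, 60, 41, 59, 70]
ses = [1, 2, 3, 2, 1, 3]   # assumed coding: 1 = low, 2 = middle, 3 = high

# generate honcomp = (write>=60): the comparison yields 1 if true, 0 if false
honcomp = [int(w >= 60) for w in write]

# generate seslow = (ses==1) and generate sesmid = (ses==2);
# ses == 3 is 0 on both dummies and becomes the reference category
seslow = [int(s == 1) for s in ses]
sesmid = [int(s == 2) for s in ses]

print(honcomp, seslow, sesmid)
```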
tabulate honcomp
    honcomp |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        147       73.50       73.50
          1 |         53       26.50      100.00
------------+-----------------------------------
      Total |        200      100.00
logit honcomp female seslow sesmid read math
Iteration 0: log likelihood = -115.64441
Iteration 1: log likelihood = -76.56971
Iteration 2: log likelihood = -72.309247
Iteration 3: log likelihood = -71.997576
Iteration 4: log likelihood = -71.994757
Iteration 5: log likelihood = -71.994756
Logit estimates Number of obs = 200
LR chi2(5) = 87.30
Prob > chi2 = 0.0000
Log likelihood = -71.994756 Pseudo R2 = 0.3774
------------------------------------------------------------------------------
 honcomp |      Coef.  Std. Err.        z   P>|z|      [95% Conf. Interval]
---------+--------------------------------------------------------------------
  female |   1.145726   .4513589    2.538   0.011      .2610792    2.030374
  seslow |  -.0541296   .5945439   -0.091   0.927     -1.219414    1.111155
  sesmid |  -1.094532   .4833959   -2.264   0.024      -2.04197   -.1470932
    read |   .0687277   .0287044    2.394   0.017      .0124681    .1249873
    math |   .1358904   .0336874    4.034   0.000      .0698642    .2019166
   _cons |  -13.64492   2.120165   -6.436   0.000     -17.80036   -9.489469
------------------------------------------------------------------------------
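The coefficients are on the log-odds scale. To get a predicted probability, sum them into a linear predictor and apply the inverse logit. The sketch below does this for one hypothetical profile (read = 60, math = 60, high ses), chosen only for illustration, and compares female = 0 with female = 1.

```python
import math

# Coefficients from `logit honcomp female seslow sesmid read math` above
coef = {"female": 1.145726, "seslow": -0.0541296, "sesmid": -1.094532,
        "read": 0.0687277, "math": 0.1358904}
cons = -13.64492

def prob_honors(**x):
    """Predicted Pr(honcomp = 1); any predictor not passed is set to 0
    (for the dummies that means female = 0 and high ses)."""
    eta = cons + sum(b * x.get(name, 0) for name, b in coef.items())
    return 1 / (1 + math.exp(-eta))

# Hypothetical profile: read = 60, math = 60, high ses
p_male = prob_honors(read=60, math=60)
p_female = prob_honors(female=1, read=60, math=60)
print(f"female = 0: {p_male:.3f}   female = 1: {p_female:.3f}")
```

The ratio of the two odds is exp(1.145726), about 3.14, whatever read and math values are chosen. The gap in probabilities (roughly .20 versus .44 here) does depend on those values.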
test seslow sesmid
( 1) seslow = 0.0
( 2) sesmid = 0.0
chi2( 2) = 6.13
Prob > chi2 = 0.0466
logit, or
Logit estimates Number of obs = 200
LR chi2(5) = 87.30
Prob > chi2 = 0.0000
Log likelihood = -71.994756 Pseudo R2 = 0.3774
------------------------------------------------------------------------------
 honcomp | Odds Ratio  Std. Err.        z   P>|z|      [95% Conf. Interval]
---------+--------------------------------------------------------------------
  female |   3.144725     1.4194    2.538   0.011       1.29833    7.616932
  seslow |   .9473093    .563217   -0.091   0.927      .2954031    3.037865
  sesmid |   .3346963   .1617908   -2.264   0.024      .1297728    .8632135
    read |   1.071145   .0307466    2.394   0.017      1.012546    1.133134
    math |   1.145556   .0385909    4.034   0.000      1.072363    1.223746
------------------------------------------------------------------------------
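An odds ratio is often easier to report as a percentage change in the odds, 100·(OR - 1). For a c-unit change the factor is exp(c·b) = OR^c, not c·OR. The sketch below applies both conversions to the coefficients above.

```python
import math

# Logit coefficients from the honcomp model above
coef = {"female": 1.145726, "seslow": -0.0541296, "sesmid": -1.094532,
        "read": 0.0687277, "math": 0.1358904}

pct_change = {}
for name, b in coef.items():
    odds_ratio = math.exp(b)
    pct_change[name] = 100 * (odds_ratio - 1)   # % change in odds per 1-unit increase
    print(f"{name:7s} OR = {odds_ratio:.4f}  ({pct_change[name]:+.1f}% change in odds)")

# A c-unit change multiplies the odds by exp(c*b), i.e. OR**c
math_plus_10 = math.exp(10 * coef["math"])
print(f"10 more math points: odds x {math_plus_10:.2f}")
```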
listcoef /* available from J. Scott Long via the Internet */
logit (N=200): Factor Change in Odds
Odds of: 1 vs 0
----------------------------------------------------------------------
     honcomp |          b        z    P>|z|      e^b   e^bStdX     SDofX
-------------+--------------------------------------------------------
      female |    1.14573    2.538    0.011   3.1447    1.7718    0.4992
      seslow |   -0.05413   -0.091    0.927   0.9473    0.9773    0.4251
      sesmid |   -1.09453   -2.264    0.024   0.3347    0.5781    0.5006
        read |    0.06873    2.394    0.017   1.0711    2.0232   10.2529
        math |    0.13589    4.034    0.000   1.1456    3.5718    9.3684
----------------------------------------------------------------------
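The e^bStdX column expresses each effect per one-standard-deviation change in the predictor, exp(b × SDofX), rather than per one unit. The sketch below reproduces it from the b and SDofX columns.

```python
import math

# b and SDofX columns copied from the listcoef table above
rows = {"female": (1.14573, 0.4992), "seslow": (-0.05413, 0.4251),
        "sesmid": (-1.09453, 0.5006), "read": (0.06873, 10.2529),
        "math": (0.13589, 9.3684)}

# e^b: factor change in odds for a 1-unit increase in x
# e^bStdX: factor change in odds for a 1-standard-deviation increase in x
for name, (b, sd) in rows.items():
    print(f"{name:7s} e^b = {math.exp(b):.4f}  e^bStdX = {math.exp(b * sd):.4f}")
```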
fitstat /* available from J. Scott Long via the Internet */
Measures of Fit for logit of honcomp
Log-Lik Intercept Only: -115.644 Log-Lik Full Model: -71.995
D(194): 143.990 LR(5): 87.299
Prob > LR: 0.000
McFadden's R2: 0.377 McFadden's Adj R2: 0.326
Maximum Likelihood R2: 0.354 Cragg & Uhler's R2: 0.516
McKelvey and Zavoina's R2: 0.549 Efron's R2: 0.404
Variance of y*: 7.296 Variance of error: 3.290
Count R2: 0.830 Adj Count R2: 0.358
AIC: 0.780 AIC*n: 155.990
BIC: -883.884 BIC': -60.808
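Most fitstat measures are simple functions of the two log likelihoods, the sample size, and the number of parameters (K = 6 including the constant). The formulas in the sketch below are the ones commonly given for Long's fitstat; treating them as identical to its code is our assumption. With them, the printed values are reproduced to the displayed precision. The counts for Count R2 come from the lstat table further down. Variance of y* is copied from the output rather than recomputed.

```python
import math

ll0, ll = -115.64441, -71.994756   # intercept-only and full-model log likelihoods
n, k = 200, 6                      # observations; parameters including the constant
n_correct = 31 + 135               # correctly classified cases (lstat table)
n_mode = 147                       # size of the larger outcome category (honcomp = 0)
var_ystar = 7.296                  # fitstat's "Variance of y*", copied

lr = 2 * (ll - ll0)                              # LR(5)
mcfadden = 1 - ll / ll0
mcfadden_adj = 1 - (ll - k) / ll0
ml_r2 = 1 - math.exp(-lr / n)                    # also known as the Cox-Snell R2
cragg_uhler = ml_r2 / (1 - math.exp(2 * ll0 / n))
var_error = math.pi ** 2 / 3                     # variance of the standard logistic error
mckelvey_zavoina = (var_ystar - var_error) / var_ystar
deviance = -2 * ll                               # D(n - k)
aic_n = deviance + 2 * k
bic = deviance - (n - k) * math.log(n)
bic_prime = -lr + (k - 1) * math.log(n)
count_r2 = n_correct / n
adj_count_r2 = (n_correct - n_mode) / (n - n_mode)

print(f"McFadden {mcfadden:.3f}  adj {mcfadden_adj:.3f}  ML {ml_r2:.3f}  C&U {cragg_uhler:.3f}")
print(f"M&Z {mckelvey_zavoina:.3f}  AIC {aic_n / n:.3f}  BIC {bic:.3f}  BIC' {bic_prime:.3f}")
print(f"Count R2 {count_r2:.3f}  Adj Count R2 {adj_count_r2:.3f}")
```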
lfit
Logistic model for honcomp, goodness-of-fit test
number of observations = 200
number of covariate patterns = 189
Pearson chi2(183) = 166.48
Prob > chi2 = 0.8040
lfit, group(10)
Logistic model for honcomp, goodness-of-fit test
(Table collapsed on quantiles of estimated probabilities)
number of observations = 200
number of groups = 10
Hosmer-Lemeshow chi2(8) = 12.91
Prob > chi2 = 0.1151
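With 189 covariate patterns for 200 cases, the Pearson test above has 189 - 6 = 183 df and rests on roughly one case per pattern. The grouped Hosmer-Lemeshow version is usually preferred in that situation; it has groups - 2 = 8 df. For an even number of df the chi-square upper tail has a closed form, so the p value can be checked with the standard library. Starting from the rounded 12.91, the sketch below gives about .115.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square with an even number of df.

    For even df the tail equals a Poisson sum:
    exp(-x/2) * sum over j < df/2 of (x/2)**j / j!
    """
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))

n_groups, hl_chi2 = 10, 12.91    # from `lfit, group(10)`
df = n_groups - 2                # Hosmer-Lemeshow df = groups - 2
p_value = chi2_sf_even_df(hl_chi2, df)
print(f"Hosmer-Lemeshow chi2({df}) = {hl_chi2}, p = {p_value:.4f}")
```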
lstat
Logistic model for honcomp
              -------- True --------
Classified |         D            ~D  |      Total
-----------+--------------------------+-----------
     +     |        31            12  |         43
     -     |        22           135  |        157
-----------+--------------------------+-----------
   Total   |        53           147  |        200
Classified + if predicted Pr(D) >= .5
True D defined as honcomp ~= 0
--------------------------------------------------
Sensitivity Pr( +| D) 58.49%
Specificity Pr( -|~D) 91.84%
Positive predictive value Pr( D| +) 72.09%
Negative predictive value Pr(~D| -) 85.99%
--------------------------------------------------
False + rate for true ~D Pr( +|~D) 8.16%
False - rate for true D Pr( -| D) 41.51%
False + rate for classified + Pr(~D| +) 27.91%
False - rate for classified - Pr( D| -) 14.01%
--------------------------------------------------
Correctly classified 83.00%
--------------------------------------------------
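Every rate in the lstat summary comes from the four cells of the classification table, using a cutoff of .5. The sketch below recomputes them.

```python
# Cell counts from the lstat classification table above (cutoff .5)
tp, fp = 31, 12    # classified +: truly D, truly ~D
fn, tn = 22, 135   # classified -: truly D, truly ~D

sensitivity = tp / (tp + fn)            # Pr( +| D)
specificity = tn / (tn + fp)            # Pr( -|~D)
ppv = tp / (tp + fp)                    # Pr( D| +)
npv = tn / (tn + fn)                    # Pr(~D| -)
accuracy = (tp + tn) / (tp + fp + fn + tn)

for label, value in [("Sensitivity", sensitivity), ("Specificity", specificity),
                     ("Positive pred. value", ppv), ("Negative pred. value", npv),
                     ("Correctly classified", accuracy)]:
    print(f"{label:22s} {100 * value:6.2f}%")
```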
Phil Ender, 20dec00