
Consider the Following 4 Group Design:

Level    a1    a2    a3    a4   Total
          1     2     5    10
          3     3     6    10
          2     4     4     9
          2     3     5    11
Mean    2.0   3.0   5.0  10.0     5.0
xi
xi is built into Stata. It does dummy coding on-the-fly for categorical variables.
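As a minimal sketch of how xi is typically invoked, assume a dataset with an outcome y and a four-level grouping variable grp (coded 1-4) is already in memory. By default xi omits the lowest category; the char command changes which category is omitted. The last command uses factor-variable notation, which is not part of these notes and assumes Stata 11 or later.

```stata
* Assumes y and grp (coded 1-4) are already in memory.
* By default xi omits the lowest category of grp.
xi: regress y i.grp

* Ask xi to omit category 4 instead, then refit.
char grp[omit] 4
xi: regress y i.grp

* Assumption: Stata 11 or later. Factor-variable notation gives the
* same fit without creating _Igrp_* variables; ib4. sets the base to 4.
regress y ib4.grp
```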
xi3
xi3 is a user-written Stata program, available from ATS over the Internet (type findit xi3 in Stata), that supports a number of different coding systems for categorical variables.
Dummy Coding
Dummy coded variables are also known as indicator variables.
input y grp d1 d2 d3
1 1 1 0 0
3 1 1 0 0
2 1 1 0 0
2 1 1 0 0
2 2 0 1 0
3 2 0 1 0
4 2 0 1 0
3 2 0 1 0
5 3 0 0 1
6 3 0 0 1
4 3 0 0 1
5 3 0 0 1
10 4 0 0 0
10 4 0 0 0
9 4 0 0 0
11 4 0 0 0
end
tabstat y, by(grp)
Summary for variables: y
by categories of: grp
grp | mean
---------+----------
1 | 2
2 | 3
3 | 5
4 | 10
---------+----------
Total | 5
--------------------
regress y d1 d2 d3
Source | SS df MS Number of obs = 16
---------+------------------------------ F( 3, 12) = 76.00
Model | 152.00 3 50.6666667 Prob > F = 0.0000
Residual | 8.00 12 .666666667 R-squared = 0.9500
---------+------------------------------ Adj R-squared = 0.9375
Total | 160.00 15 10.6666667 Root MSE = .8165
------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
d1 | -8 .5773503 -13.856 0.000 -9.257938 -6.742062
d2 | -7 .5773503 -12.124 0.000 -8.257938 -5.742062
d3 | -5 .5773503 -8.660 0.000 -6.257938 -3.742062
_cons | 10 .4082483 24.495 0.000 9.110503 10.8895
------------------------------------------------------------------------------
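With group 4 as the reference category, the constant is the mean of group 4, and each coefficient is that group's mean minus the group 4 mean (for example, 2 - 10 = -8 for d1). One way to check this, sketched here on the assumption that the regression above is the most recent estimation command, is to rebuild each group mean with lincom:

```stata
* Run immediately after: regress y d1 d2 d3
* Mean of group 4 (the reference group): 10
lincom _cons
* Mean of group 1: 10 + (-8) = 2
lincom _cons + d1
* Mean of group 2: 10 + (-7) = 3
lincom _cons + d2
* Mean of group 3: 10 + (-5) = 5
lincom _cons + d3
```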
char grp[omit] 4
xi3: regress y i.grp
i.grp _Igrp_1-4 (naturally coded; _Igrp_4 omitted)
Source | SS df MS Number of obs = 16
-------------+------------------------------ F( 3, 12) = 76.00
Model | 152.00 3 50.6666667 Prob > F = 0.0000
Residual | 8.00 12 .666666667 R-squared = 0.9500
-------------+------------------------------ Adj R-squared = 0.9375
Total | 160.00 15 10.6666667 Root MSE = .8165
------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_Igrp_1 | -8 .5773503 -13.86 0.000 -9.257938 -6.742062
_Igrp_2 | -7 .5773503 -12.12 0.000 -8.257938 -5.742062
_Igrp_3 | -5 .5773503 -8.66 0.000 -6.257938 -3.742062
_cons | 10 .4082483 24.49 0.000 9.110503 10.8895
------------------------------------------------------------------------------
describe _Igrp_1 _Igrp_2 _Igrp_3
storage display value
variable name type format label variable label
-------------------------------------------------------------------------------
_Igrp_1 byte %8.0g grp=1
_Igrp_2 byte %8.0g grp=2
_Igrp_3 byte %8.0g grp=3
anova y grp
Number of obs = 16 R-squared = 0.9500
Root MSE = .816497 Adj R-squared = 0.9375
Source | Partial SS df MS F Prob > F
-----------+----------------------------------------------------
Model | 152.00 3 50.6666667 76.00 0.0000
|
grp | 152.00 3 50.6666667 76.00 0.0000
|
Residual | 8.00 12 .666666667
-----------+----------------------------------------------------
Total | 160.00 15 10.6666667
anova, regress
Source | SS df MS Number of obs = 16
-------------+------------------------------ F( 3, 12) = 76.00
Model | 152.00 3 50.6666667 Prob > F = 0.0000
Residual | 8.00 12 .666666667 R-squared = 0.9500
-------------+------------------------------ Adj R-squared = 0.9375
Total | 160.00 15 10.6666667 Root MSE = .8165
------------------------------------------------------------------------------
y Coef. Std. Err. t P>|t| [95% Conf. Interval]
------------------------------------------------------------------------------
_cons 10 .4082483 24.49 0.000 9.110503 10.8895
grp
1 -8 .5773503 -13.86 0.000 -9.257938 -6.742062
2 -7 .5773503 -12.12 0.000 -8.257938 -5.742062
3 -5 .5773503 -8.66 0.000 -6.257938 -3.742062
4 (dropped)
------------------------------------------------------------------------------
Effect Coding
Effect coding is sometimes known as deviation coding.
input y grp e1 e2 e3
1 1 1 0 0
3 1 1 0 0
2 1 1 0 0
2 1 1 0 0
2 2 0 1 0
3 2 0 1 0
4 2 0 1 0
3 2 0 1 0
5 3 0 0 1
6 3 0 0 1
4 3 0 0 1
5 3 0 0 1
10 4 -1 -1 -1
10 4 -1 -1 -1
9 4 -1 -1 -1
11 4 -1 -1 -1
end
regress y e1 e2 e3
Source | SS df MS Number of obs = 16
---------+------------------------------ F( 3, 12) = 76.00
Model | 152.00 3 50.6666667 Prob > F = 0.0000
Residual | 8.00 12 .666666667 R-squared = 0.9500
---------+------------------------------ Adj R-squared = 0.9375
Total | 160.00 15 10.6666667 Root MSE = .8165
------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
e1 | -3 .3535534 -8.485 0.000 -3.770327 -2.229673
e2 | -2 .3535534 -5.657 0.000 -2.770327 -1.229673
e3 | 0 .3535534 0.000 1.000 -.7703266 .7703266
_cons | 5 .2041241 24.495 0.000 4.555252 5.444748
------------------------------------------------------------------------------
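In this balanced design the constant is the grand mean (5), and each coefficient is a group mean minus the grand mean: 2 - 5 = -3, 3 - 5 = -2, 5 - 5 = 0. Group 4 has no coefficient of its own; because it is coded -1 on all three variables, its mean is the constant minus the three coefficients. A sketch using lincom, assuming the effect-coded regression above is the most recent estimation command:

```stata
* Run immediately after: regress y e1 e2 e3
* Mean of group 1: 5 + (-3) = 2
lincom _cons + e1
* Mean of group 3: 5 + 0 = 5
lincom _cons + e3
* Mean of group 4, coded -1 on e1, e2 and e3: 5 - (-3) - (-2) - 0 = 10
lincom _cons - e1 - e2 - e3
```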
xi3: regress y e.grp
e.grp _Igrp_1-4 (naturally coded; _Igrp_4 omitted)
Source | SS df MS Number of obs = 16
-------------+------------------------------ F( 3, 12) = 76.00
Model | 152.00 3 50.6666667 Prob > F = 0.0000
Residual | 8.00 12 .666666667 R-squared = 0.9500
-------------+------------------------------ Adj R-squared = 0.9375
Total | 160.00 15 10.6666667 Root MSE = .8165
------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_Igrp_1 | -3 .3535534 -8.49 0.000 -3.770327 -2.229673
_Igrp_2 | -2 .3535534 -5.66 0.000 -2.770327 -1.229673
_Igrp_3 | 2.36e-16 .3535534 0.00 1.000 -.7703267 .7703267
_cons | 5 .2041241 24.49 0.000 4.555252 5.444748
------------------------------------------------------------------------------
describe _Igrp_1 _Igrp_2 _Igrp_3
storage display value
variable name type format label variable label
-------------------------------------------------------------------------------
_Igrp_1 double %10.0g grp(1 vs. grand mean)
_Igrp_2 double %10.0g grp(2 vs. grand mean)
_Igrp_3 double %10.0g grp(3 vs. grand mean)
Orthogonal Coding
Example Using Orthogonal Coding
input y grp x1 x2 x3
1 1 1 1 1
3 1 1 1 1
2 1 1 1 1
2 1 1 1 1
2 2 -1 1 1
3 2 -1 1 1
4 2 -1 1 1
3 2 -1 1 1
5 3 0 -2 1
6 3 0 -2 1
4 3 0 -2 1
5 3 0 -2 1
10 4 0 0 -3
10 4 0 0 -3
9 4 0 0 -3
11 4 0 0 -3
end
table grp, contents(freq mean y sd y)
----------------------------------------------
grp | Freq. mean(y) sd(y)
----------+-----------------------------------
1 | 4 2 .8164966
2 | 4 3 .8164966
3 | 4 5 .8164966
4 | 4 10 .8164966
----------------------------------------------
corr x1 x2 x3
(obs=16)
| x1 x2 x3
-------------+---------------------------
x1 | 1.0000
x2 | 0.0000 1.0000
x3 | 0.0000 0.0000 1.0000
Anova
anova y grp
Number of obs = 16 R-squared = 0.9500
Root MSE = .816497 Adj R-squared = 0.9375
Source | Partial SS df MS F Prob > F
-----------+----------------------------------------------------
Model | 152.00 3 50.6666667 76.00 0.0000
|
grp | 152.00 3 50.6666667 76.00 0.0000
|
Residual | 8.00 12 .666666667
-----------+----------------------------------------------------
Total | 160.00 15 10.6666667
Regression Analysis Using Orthogonal Coding
regress y x1 x2 x3
Source | SS df MS Number of obs = 16
---------+------------------------------ F( 3, 12) = 76.00
Model | 152.00 3 50.6666667 Prob > F = 0.0000
Residual | 8.00 12 .666666667 R-squared = 0.9500
---------+------------------------------ Adj R-squared = 0.9375
Total | 160.00 15 10.6666667 Root MSE = .8165
------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
x1 | -.5 .2886751 -1.732 0.109 -1.128969 .1289691
x2 | -.8333333 .1666667 -5.000 0.000 -1.196469 -.4701979
x3 | -1.666667 .1178511 -14.142 0.000 -1.923442 -1.409891
_cons | 5 .2041241 24.495 0.000 4.555252 5.444748
------------------------------------------------------------------------------
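Because the three contrasts are mutually orthogonal, each sums to zero, and the groups are the same size, each coefficient can be computed on its own from the group means: the contrast of the means divided by the sum of the squared codes. The worked values below use the group means 2, 3, 5 and 10. The symbols c_{kj} (code of contrast k for group j) and \bar{y}_j (mean of group j) are notation introduced here for illustration, not taken from the notes.

```latex
\[
b_k = \frac{\sum_{j} c_{kj}\,\bar{y}_j}{\sum_{j} c_{kj}^{2}}
\]
\[
\begin{aligned}
b_1 &= \frac{(1)(2) + (-1)(3)}{1^2 + (-1)^2} = \frac{-1}{2} = -0.5 \\
b_2 &= \frac{(1)(2) + (1)(3) + (-2)(5)}{1^2 + 1^2 + (-2)^2} = \frac{-5}{6} \approx -0.8333 \\
b_3 &= \frac{(1)(2) + (1)(3) + (1)(5) + (-3)(10)}{1^2 + 1^2 + 1^2 + (-3)^2} = \frac{-20}{12} \approx -1.6667
\end{aligned}
\]
```

These agree with the coefficients for x1, x2 and x3 in the regression output, and the constant is the grand mean, 5.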
Orthogonal Coding Schema
Grp   X1   X2   X3   X4   X5   X6   X7   X8   X9
  1    1    1    1    1    1    1    1    1    1
  2   -1    1    1    1    1    1    1    1    1
  3    0   -2    1    1    1    1    1    1    1
  4    0    0   -3    1    1    1    1    1    1
  5    0    0    0   -4    1    1    1    1    1
  6    0    0    0    0   -5    1    1    1    1
  7    0    0    0    0    0   -6    1    1    1
  8    0    0    0    0    0    0   -7    1    1
  9    0    0    0    0    0    0    0   -8    1
 10    0    0    0    0    0    0    0    0   -9
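The schema follows a simple rule: contrast Xj is 1 for groups 1 through j, -j for group j+1, and 0 for all later groups. A sketch of building the codes from that rule for the four-group example, assuming grp is in memory and takes the values 1 to 4 with no missing values. The variable names oc1-oc3 are hypothetical, chosen so that they do not collide with x1-x3.

```stata
* Build k-1 = 3 orthogonal contrast codes for a 4-level grp.
* oc`j' is 1 for groups 1..j, -j for group j+1, and 0 otherwise.
forvalues j = 1/3 {
    generate oc`j' = cond(grp <= `j', 1, cond(grp == `j' + 1, -`j', 0))
}

* The per-group means of the codes should match rows 1-4 of the schema.
tabstat oc1 oc2 oc3, by(grp)

* In a balanced design the codes should be uncorrelated.
corr oc1 oc2 oc3
```

For a design with more groups, only the upper limit of the forvalues loop (number of groups minus one) would need to change.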
Orthogonal Coding Using xi3
We will use the reverse Helmert coding option in our example. Reverse Helmert coding comes closest to the manual orthogonal coding shown above.
input y grp
1 1
3 1
2 1
2 1
2 2
3 2
4 2
3 2
5 3
6 3
4 3
5 3
10 4
10 4
9 4
11 4
end
xi3 r.grp
r.grp _Igrp_1-4 (naturally coded; _Igrp_1 omitted)
list
y grp _Igrp_2 _Igrp_3 _Igrp_4
1. 1 1 -.5 -.3333333 -.25
2. 3 1 -.5 -.3333333 -.25
3. 2 1 -.5 -.3333333 -.25
4. 2 1 -.5 -.3333333 -.25
5. 2 2 .5 -.3333333 -.25
6. 3 2 .5 -.3333333 -.25
7. 4 2 .5 -.3333333 -.25
8. 3 2 .5 -.3333333 -.25
9. 5 3 0 .6666667 -.25
10. 6 3 0 .6666667 -.25
11. 4 3 0 .6666667 -.25
12. 5 3 0 .6666667 -.25
13. 10 4 0 0 .75
14. 10 4 0 0 .75
15. 9 4 0 0 .75
16. 11 4 0 0 .75
corr _Igrp_2 _Igrp_3 _Igrp_4
(obs=16)
| _Igrp_2 _Igrp_3 _Igrp_4
-------------+---------------------------
_Igrp_2 | 1.0000
_Igrp_3 | 0.0000 1.0000
_Igrp_4 | 0.0000 0.0000 1.0000
regress y _Igrp_2 _Igrp_3 _Igrp_4
Source | SS df MS Number of obs = 16
-------------+------------------------------ F( 3, 12) = 76.00
Model | 152.00 3 50.6666667 Prob > F = 0.0000
Residual | 8.00 12 .666666667 R-squared = 0.9500
-------------+------------------------------ Adj R-squared = 0.9375
Total | 160.00 15 10.6666667 Root MSE = .8165
------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_Igrp_2 | 1 .5773503 1.73 0.109 -.2579382 2.257938
_Igrp_3 | 2.5 .5 5.00 0.000 1.410594 3.589406
_Igrp_4 | 6.666667 .4714045 14.14 0.000 5.639564 7.693769
_cons | 5 .2041241 24.49 0.000 4.555252 5.444748
------------------------------------------------------------------------------
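Each reverse Helmert coefficient is the mean of one group minus the mean of all preceding groups, which the group means 2, 3, 5 and 10 confirm. The coefficients from the manual orthogonal coding carry the same information on a different scale: dividing by the number of groups in the comparison and reversing the sign recovers -0.5, -0.8333 and -1.6667. The notation \bar{y}_j for the mean of group j, and b_2, b_3, b_4 for the coefficients of _Igrp_2, _Igrp_3 and _Igrp_4, is introduced here for illustration.

```latex
\[
\begin{aligned}
b_2 &= \bar{y}_2 - \bar{y}_1 = 3 - 2 = 1,
  & -\tfrac{b_2}{2} &= -0.5 \\
b_3 &= \bar{y}_3 - \tfrac{\bar{y}_1 + \bar{y}_2}{2} = 5 - 2.5 = 2.5,
  & -\tfrac{b_3}{3} &\approx -0.8333 \\
b_4 &= \bar{y}_4 - \tfrac{\bar{y}_1 + \bar{y}_2 + \bar{y}_3}{3} = 10 - \tfrac{10}{3} \approx 6.6667,
  & -\tfrac{b_4}{4} &\approx -1.6667
\end{aligned}
\]
```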
describe _Igrp_2 _Igrp_3 _Igrp_4
storage display value
variable name type format label variable label
-------------------------------------------------------------------------------
_Igrp_2 double %10.0g grp(2 vs. 1)
_Igrp_3 double %10.0g grp(3 vs. 2-)
_Igrp_4 double %10.0g grp(4 vs. 3-)
Simple Coding
Comparing each group with a reference group
Compare the results of this coding scheme with those of dummy coding.
input y grp
1 1
3 1
2 1
2 1
2 2
3 2
4 2
3 2
5 3
6 3
4 3
5 3
10 4
10 4
9 4
11 4
end
xi3 g.grp
g.grp _Igrp_1-4 (naturally coded; _Igrp_4 omitted)
describe _Igrp_1 _Igrp_2 _Igrp_3
storage display value
variable name type format label variable label
-------------------------------------------------------------------------------
_Igrp_1 double %10.0g grp(1 vs. 4)
_Igrp_2 double %10.0g grp(2 vs. 4)
_Igrp_3 double %10.0g grp(3 vs. 4)
list _Igrp_1 _Igrp_2 _Igrp_3
_Igrp_1 _Igrp_2 _Igrp_3
1. .75 -.25 -.25
2. .75 -.25 -.25
3. .75 -.25 -.25
4. .75 -.25 -.25
5. -.25 .75 -.25
6. -.25 .75 -.25
7. -.25 .75 -.25
8. -.25 .75 -.25
9. -.25 -.25 .75
10. -.25 -.25 .75
11. -.25 -.25 .75
12. -.25 -.25 .75
13. -.25 -.25 -.25
14. -.25 -.25 -.25
15. -.25 -.25 -.25
16. -.25 -.25 -.25
xi3: regress y g.grp
g.grp _Igrp_1-4 (naturally coded; _Igrp_4 omitted)
Source | SS df MS Number of obs = 16
-------------+------------------------------ F( 3, 12) = 76.00
Model | 152.00 3 50.6666667 Prob > F = 0.0000
Residual | 8.00 12 .666666667 R-squared = 0.9500
-------------+------------------------------ Adj R-squared = 0.9375
Total | 160.00 15 10.6666667 Root MSE = .8165
------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_Igrp_1 | -8 .5773503 -13.86 0.000 -9.257938 -6.742062
_Igrp_2 | -7 .5773503 -12.12 0.000 -8.257938 -5.742062
_Igrp_3 | -5 .5773503 -8.66 0.000 -6.257938 -3.742062
_cons | 5 .2041241 24.49 0.000 4.555252 5.444748
------------------------------------------------------------------------------
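The three slopes match the dummy-coded regression (-8, -7, -5), since both schemes compare each group with group 4. The difference is the constant: under simple coding it is the grand mean (5) rather than the mean of the reference group (10). A sketch that recovers group means from the simple-coded fit, assuming the xi3 regression above is the most recent estimation command and the _Igrp_* variables hold the values listed earlier:

```stata
* Run immediately after: xi3: regress y g.grp
* Mean of group 4 (codes -.25, -.25, -.25): 5 - .25*(-8 - 7 - 5) = 10
lincom _cons - .25*_Igrp_1 - .25*_Igrp_2 - .25*_Igrp_3
* Mean of group 1 (codes .75, -.25, -.25): 5 + .75*(-8) - .25*(-7 - 5) = 2
lincom _cons + .75*_Igrp_1 - .25*_Igrp_2 - .25*_Igrp_3
```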
User Defined Coding
char grp[user] (1,-1,0,0\0,0,1,-1\.5,.5,-.5,-.5)
xi3: regress y u.grp
u.grp _Igrp_1-4 (naturally coded; _Igrp_4 omitted)
Source | SS df MS Number of obs = 16
-------------+------------------------------ F( 3, 12) = 76.00
Model | 152.00 3 50.6666667 Prob > F = 0.0000
Residual | 8.00 12 .666666667 R-squared = 0.9500
-------------+------------------------------ Adj R-squared = 0.9375
Total | 160.00 15 10.6666667 Root MSE = .8165
------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_Igrp_1 | -1 .5773503 -1.73 0.109 -2.257938 .2579382
_Igrp_2 | -5 .5773503 -8.66 0.000 -6.257938 -3.742062
_Igrp_3 | -5 .4082483 -12.25 0.000 -5.889497 -4.110503
_cons | 5 .2041241 24.49 0.000 4.555252 5.444748
------------------------------------------------------------------------------
describe _Igrp_1 _Igrp_2 _Igrp_3
storage display value
variable name type format label variable label
-------------------------------------------------------------------------------
_Igrp_1 double %10.0g grp(1 -1 0 0)
_Igrp_2 double %10.0g grp(0 0 1 -1)
_Igrp_3 double %10.0g grp(.5 .5 -.5 -.5)
tablist grp _Igrp_1 _Igrp_2 _Igrp_3
grp _Igrp_1 _Igrp_2 _Igrp_3 Freq
1 .5 0 .5 4
2 -.5 0 .5 4
3 0 .5 -.5 4
4 0 -.5 -.5 4
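Each row of the user-defined matrix is a contrast among the group means: group 1 versus group 2 (2 - 3 = -1), group 3 versus group 4 (5 - 10 = -5), and the average of groups 1 and 2 versus the average of groups 3 and 4 (2.5 - 7.5 = -5). As a cross-check that does not rely on xi3, the same contrasts can be estimated from a cell-means regression. This sketch assumes Stata 11 or later for the factor-variable notation, which is not used in the original notes.

```stata
* Assumption: Stata 11 or later (factor-variable notation).
* With ibn.grp and noconstant, the four coefficients are the group means.
regress y ibn.grp, noconstant

* Row 1 of the matrix: group 1 vs. group 2
lincom 1.grp - 2.grp
* Row 2: group 3 vs. group 4
lincom 3.grp - 4.grp
* Row 3: average of groups 1 and 2 vs. average of groups 3 and 4
lincom 0.5*(1.grp + 2.grp) - 0.5*(3.grp + 4.grp)
```

The estimates and standard errors from lincom should agree with the xi3 output above. The overall F and R-squared of the no-constant fit are computed differently and are not comparable with the earlier tables.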
Linear Statistical Models Course
Phil Ender, 21Feb02, 17Mar98