Ed230B/C

Polynomial Regression


Polynomial regression can be used to fit a regression line to a curved set of points. Contrary to how it sounds, curvilinear regression uses a linear model to fit a curved line to data points: the model is linear in its coefficients, and the curvature comes from transformations of the predictor variables, such as squares and cubes. An example of a curvilinear model is

Y = b0 + b1 X1 + b2 X2 + e

where X2 = X1^2.

Curvilinear regression should not be confused with nonlinear regression (NL). Nonlinear regression fits functions that are nonlinear in their parameters, so the model cannot be written as a weighted sum of transformed predictors. An example of a nonlinear model is

Example 1

From Pedhazur (1997), a study looks at practice time (x) in minutes and the number of correct responses (y).

Stata Curvilinear Regression Program

use http://www.gseis.ucla.edu/courses/data/curve

scatter y x

Remarks

Inspection of the y vs x plot reveals a degree of curvilinearity.

Based upon the scatterplot we will try three models:
model 1 -- y = b0 + b1x + e -- linear
model 2 -- y = b0 + b1x + b2x^2 + e -- quadratic
model 3 -- y = b0 + b1x + b2x^2 + b3x^3 + e -- cubic

generate x2 = x^2
generate x3 = x^3

regress y x

  Source |       SS       df       MS                  Number of obs =      18
---------+------------------------------               F(  1,    16) =   32.72
   Model |  380.112798     1  380.112798               Prob > F      =  0.0000
Residual |  185.887202    16  11.6179501               R-squared     =  0.6716
---------+------------------------------               Adj R-squared =  0.6511
   Total |      566.00    17  33.2941176               Root MSE      =  3.4085

------------------------------------------------------------------------------
       y |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
       x |   1.284165   .2245067      5.720   0.000       .8082319    1.760098
   _cons |    4.89154    1.73176      2.825   0.012       1.220372    8.562708
------------------------------------------------------------------------------

regress y x x2

  Source |       SS       df       MS                  Number of obs =      18
---------+------------------------------               F(  2,    15) =   31.90
   Model |  458.245766     2  229.122883               Prob > F      =  0.0000
Residual |  107.754234    15  7.18361562               R-squared     =  0.8096
---------+------------------------------               Adj R-squared =  0.7842
   Total |      566.00    17  33.2941176               Root MSE      =  2.6802

------------------------------------------------------------------------------
       y |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
       x |   4.151667   .8872181      4.679   0.000       2.260607    6.042728
      x2 |   -.209529   .0635329     -3.298   0.005      -.3449462   -.0741119
   _cons |  -2.236083    2.55445     -0.875   0.395      -7.680764    3.208598
------------------------------------------------------------------------------

regress y x x2 x3

  Source |       SS       df       MS                  Number of obs =      18
---------+------------------------------               F(  3,    14) =   20.30
   Model |  460.224174     3  153.408058               Prob > F      =  0.0000
Residual |  105.775826    14  7.55541616               R-squared     =  0.8131
---------+------------------------------               Adj R-squared =  0.7731
   Total |      566.00    17  33.2941176               Root MSE      =  2.7487

------------------------------------------------------------------------------
       y |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
       x |   2.267499   3.792818      0.598   0.559      -5.867288    10.40229
      x2 |   .0975798   .6036817      0.162   0.874      -1.197189    1.392348
      x3 |  -.0144026   .0281457     -0.512   0.617      -.0747692     .045964
   _cons |   .7460164     6.3894      0.117   0.909      -12.95788    14.44992
------------------------------------------------------------------------------
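
In model 3 none of the individual coefficients is significant even though the overall F is, which is a common sign of collinearity among x, x2, and x3. The following is a minimal sketch for checking this, assuming the variables generated above are still in memory. The names xc, xc2, and xc3 are hypothetical and are not part of the original program. None of these commands replaces the active regression results, so the commands that follow still apply to model 3.

```stata
* Correlations among the powers of x; these are typically very high
correlate x x2 x3

* Variance inflation factors for the most recent regression (model 3)
estat vif

* Centering x before taking powers usually lowers the correlation
* between the linear and squared terms (xc, xc2, xc3 are made-up names)
summarize x, meanonly
generate xc  = x - r(mean)
generate xc2 = xc^2
generate xc3 = xc^3
correlate xc xc2 xc3
```

Centering changes the lower-order coefficients and their standard errors, but it leaves the coefficient of the highest power and the overall fit unchanged.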

test x x2

 ( 1)  x = 0.0
 ( 2)  x2 = 0.0

       F(  2,    14) =   15.21
            Prob > F =    0.0003
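
The joint test above asks whether x and x2 can both be dropped from the cubic model. For choosing among the three models, the more direct question is whether each added power improves on the model below it. Here is a sketch of that test of increments, using only the sums of squares and mean squares printed in the outputs above.

```stata
* F for adding x3 to the quadratic model:
* (model SS of model 3 - model SS of model 2) / 1 df,
* divided by the residual MS of model 3
display (460.224174 - 458.245766) / 7.55541616

* The same test taken directly from the active cubic model
test x3

* F for adding x2 to the linear model:
* (model SS of model 2 - model SS of model 1) / 1 df,
* divided by the residual MS of model 2
display (458.245766 - 380.112798) / 7.18361562
```

The first two results should both be about 0.26, which is the square of the t for x3 in model 3. The last is about 10.88, the square of the t for x2 in model 2. So the quadratic term adds significantly to the linear model (p = 0.005), while the cubic term adds very little (p = 0.617).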
            
regress y x x2

[output omitted]
            
predict p

scatter y p x, msym(o i) con(. l)
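
As an alternative to plotting the saved predictions, Stata's twoway graphs can overlay a quadratic fit directly. This is only a sketch of another route to a similar picture, not the plot produced by the original program.

```stata
* Scatterplot with a quadratic fit line overlaid; qfit estimates the
* regression of y on x and x^2 internally, so p is not needed here
twoway (scatter y x) (qfit y x)
```

The connected-line version above joins the predicted points in the order of the observations, while qfit draws a smooth curve across the range of x.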

Remarks

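From the above analysis, model 2 appears to be our best bet: the x2 term is significant in model 2 (p = 0.005), while the x3 term in model 3 is not (p = 0.617) and raises R-squared only from 0.8096 to 0.8131. The fitted model is
y = -2.236083 + 4.151667x - 0.209529x^2. A plot of y vs x with the predicted scores connected by a curved line is displayed above.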

Example 2

Here is another artificial example. This time we are looking at the relationship between test performance and anxiety.

input anxiety perform 
1  11  
1  13  
2  24  
2  20  
3  42  
3  36  
4  48  
4  42  
5  46  
5  38  
6  23  
6  19  
7   9  
7  11  
end

When plotted, these data form an inverted-U shape. Let's run a second-degree polynomial regression.

scatter perform anxiety

generate a2 = anxiety^2

regress perform anxiety a2

      Source |       SS       df       MS              Number of obs =      14
-------------+------------------------------           F(  2,    11) =   44.51
       Model |  2334.38095     2  1167.19048           Prob > F      =  0.0000
    Residual |   288.47619    11  26.2251082           R-squared     =  0.8900
-------------+------------------------------           Adj R-squared =  0.8700
       Total |  2622.85714    13  201.758242           Root MSE      =   5.121

------------------------------------------------------------------------------
     perform |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     anxiety |   29.63095    3.23401     9.16   0.000     22.51294    36.74896
          a2 |   -3.72619   .3950972    -9.43   0.000    -4.595794   -2.856587
       _cons |  -16.71429   5.643117    -2.96   0.013     -29.1347   -4.293868
------------------------------------------------------------------------------
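
A fitted quadratic y = b0 + b1x + b2x^2 has slope b1 + 2*b2*x, which is zero at x = -b1/(2*b2). Because the coefficient for a2 is negative, that point is a maximum. The sketch below computes it from the active regression results. It assumes the regress command above was the last estimation command run.

```stata
* Anxiety level at which predicted performance peaks: -b1 / (2*b2)
display -_b[anxiety] / (2 * _b[a2])

* Predicted performance at that anxiety level: b0 - b1^2 / (4*b2)
display _b[_cons] - _b[anxiety]^2 / (4 * _b[a2])
```

With the coefficients shown above, these come to roughly 3.98 and 42.2, so the predicted peak falls near the middle of the 1 to 7 anxiety range.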

predict p

scatter perform p anxiety, msym(o i) con(. l)
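
In Stata 11 and later, the squared term does not have to be generated by hand. The following sketch uses factor-variable notation to fit the same quadratic model and then plots the predictions with margins. It is an alternative route, not part of the original program, and marginsplot requires Stata 12 or later.

```stata
* Same quadratic model without creating a2; the row labeled
* c.anxiety#c.anxiety corresponds to the a2 coefficient above
regress perform c.anxiety##c.anxiety

* Predicted performance at each anxiety level from 1 to 7, then a plot
margins, at(anxiety=(1(1)7))
marginsplot
```

Because Stata knows that the squared term is built from anxiety, margins varies both terms together when it computes the predictions.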


In social psychology, this inverted-U curve is called the Yerkes-Dodson curve.

Example 3

Let's try this using the hsb2 dataset.


UCLA Department of Education

Phil Ender, 21Jun99