Ed230B/C

Weighted Least Squares (WLS)


Ordinary least squares (OLS) is the type of regression estimation that we have covered so far in class. OLS, while generally robust, can produce unacceptably high standard errors when the homogeneity of variance assumption is violated, and the standard errors it reports can no longer be trusted. Weighted least squares (WLS) encompasses various schemes for weighting observations in order to reduce the effects of heteroscedasticity.

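In WLS the goal is to minimize the following sum of squares:

          Σ wi(yi - b0 - b1xi)^2,   summed over the observations i = 1, ..., n

where wi is the weight given to observation i.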

The trick, of course, is determining the values for wi. According to Miller (1986), "If God or the Devil were willing to tell us the values of wi...", then the task would be easy.
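
For the simple regression model used in these notes, the minimizing values have a closed form. The block below is a standard result written in LaTeX, not something taken from the handout; the weighted means \bar{x}_w and \bar{y}_w are notation introduced here for illustration. The last line records the usual answer to the question of what the weights should be: if the error variance \sigma_i^2 of each observation were known, weights proportional to 1/\sigma_i^2 would give the most efficient estimates.

```latex
\begin{aligned}
S(b_0, b_1) &= \sum_{i=1}^{n} w_i \,(y_i - b_0 - b_1 x_i)^2, \\
\bar{x}_w &= \frac{\sum_i w_i x_i}{\sum_i w_i}, \qquad
\bar{y}_w = \frac{\sum_i w_i y_i}{\sum_i w_i}, \\
\hat{b}_1 &= \frac{\sum_i w_i (x_i - \bar{x}_w)(y_i - \bar{y}_w)}
                  {\sum_i w_i (x_i - \bar{x}_w)^2}, \qquad
\hat{b}_0 = \bar{y}_w - \hat{b}_1 \bar{x}_w, \\
w_i &\propto \frac{1}{\sigma_i^2}
  \quad \text{(ideal weights when } \operatorname{Var}(e_i) = \sigma_i^2 \text{ is known).}
\end{aligned}
```

With all wi equal, these expressions reduce to the ordinary least squares estimates.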

A Simple Example

Chatterjee & Price (1977) describe a study of 27 industrial companies of varying size that recorded the number of workers (x) and the number of supervisors (y). The model y = b0 + b1x + e was examined.

OLS Analysis

use http://www.gseis.ucla.edu/courses/data/wls

graph y x

regress y x

  Source |       SS       df       MS                  Number of obs =      27
---------+------------------------------               F(  1,    25) =   86.54
   Model |  40862.6027     1  40862.6027               Prob > F      =  0.0000
Residual |   11804.064    25   472.16256               R-squared     =  0.7759
---------+------------------------------               Adj R-squared =  0.7669
   Total |  52666.6667    26  2025.64103               Root MSE      =  21.729

------------------------------------------------------------------------------
       y |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
       x |   .1053611   .0113256      9.303   0.000       .0820355    .1286867
   _cons |   14.44806   9.562012      1.511   0.143      -5.245273    34.14139
------------------------------------------------------------------------------

rvfplot, yline(0)
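
The visual impression from the residual plot can be supplemented with a plot of the residuals against x and a formal test. The following is a minimal sketch, assuming a reasonably recent Stata in which scatter and estat hettest (the Breusch-Pagan / Cook-Weisberg test) are available. The variable name ehat is made up for this illustration, and no output is shown because these commands were not part of the original analysis.

```stata
* Refit the OLS model so the postestimation commands refer to it
regress y x

* Save the raw residuals (ehat is a hypothetical variable name)
predict double ehat, residuals

* Residuals against the predictor; a fan shape suggests heteroscedasticity
scatter ehat x, yline(0)

* Breusch-Pagan / Cook-Weisberg test, using x as the variable
* thought to drive the error variance
estat hettest x
```

A small p-value from the test would be consistent with an error variance that changes with x.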

Remarks

  • There is some evidence of heteroscedasticity in the plot of y against x, while the evidence is much stronger in the residual plot.
  • Since the variance appears to increase as x increases, we will try weighting the observations by dividing by x.
  • Here are the transformations:
          yt = y/x
          xt = 1/x
  • Now the model becomes yt = b0 + b1xt + e, where b0 and b1 are the intercept and slope of the transformed regression.
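
The reasoning behind the transformation can be written out as follows. This is a sketch under the assumption, suggested by the plots but not tested here, that the error standard deviation is proportional to x, that is, Var(e_i) = \sigma^2 x_i^2. The symbol e_i^* for the transformed error is introduced here for illustration, and b_0 and b_1 denote the coefficients of the original model.

```latex
\begin{aligned}
y_i &= b_0 + b_1 x_i + e_i, \qquad \operatorname{Var}(e_i) = \sigma^2 x_i^2 \\
\frac{y_i}{x_i} &= b_0 \cdot \frac{1}{x_i} + b_1 + \frac{e_i}{x_i} \\
y_{t,i} &= b_1 + b_0\, x_{t,i} + e_i^{*}, \qquad
\operatorname{Var}(e_i^{*}) = \frac{\operatorname{Var}(e_i)}{x_i^2} = \sigma^2
\end{aligned}
```

Under that assumption the transformed errors have constant variance, so OLS on yt and xt is appropriate. In terms of the original coefficients, the intercept of the transformed regression estimates the original slope, and the coefficient on xt estimates the original intercept.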

    WLS Analysis

    generate yt = y/x
    
    generate xt = 1/x
    
    regress yt xt
    
      Source |       SS       df       MS                  Number of obs =      27
    ---------+------------------------------               F(  1,    25) =    0.69
       Model |  .000355828     1  .000355828               Prob > F      =  0.4131
    Residual |  .012842316    25  .000513693               R-squared     =  0.0270
    ---------+------------------------------               Adj R-squared = -0.0120
       Total |  .013198144    26  .000507621               Root MSE      =  .02266
    
    ------------------------------------------------------------------------------
          yt |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
    ---------+--------------------------------------------------------------------
          xt |   3.803296   4.569745      0.832   0.413       -5.60827    13.21486
       _cons |   .1209903   .0089986     13.445   0.000       .1024573    .1395233
    ------------------------------------------------------------------------------

    Remarks

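  • The b0 and b1 in the WLS are reversed from b0 and b1 in the OLS: the coefficient on xt (3.803296) corresponds to the OLS intercept, and _cons (.1209903) corresponds to the OLS slope.
  • Note that the standard errors of the coefficients are smaller for WLS than for OLS (.0090 vs. .0113 for the slope, and 4.57 vs. 9.56 for the intercept).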

    generate p1 = 3.803296 + .1209903*x
    
    corr p1 y
    (obs=27)
    
                 |       p1        y
    -------------+------------------
              p1 |   1.0000
               y |   0.8808   1.0000
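
Typing the coefficients by hand invites transcription errors. A minimal sketch of an alternative follows, assuming it is run immediately after regress yt xt so that Stata's stored coefficients _b[xt] and _b[_cons] still refer to the transformed regression. The variable name p1b is made up for this illustration.

```stata
* Must follow "regress yt xt" directly.
* _b[xt] estimates the original intercept, _b[_cons] the original slope.
generate double p1b = _b[xt] + _b[_cons]*x

* Correlation between the predicted and observed number of supervisors
corr p1b y
```

Because the stored coefficients carry full precision, the correlation should agree with the value above up to rounding.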
    
    
    rvfplot, yline(0)

    Remarks

    Inspection of the residual vs fitted (predicted) plot shows improvement in terms of heteroscedasticity.

    WLS the Easy Way

    Stata allows us to do WLS through the use of analytic weights, which can be included as part of the regress command.

    regress y x [aw = 1/x^2]
    (sum of wgt is  1.0470e-004)
    
      Source |       SS       df       MS                  Number of obs =      27
    ---------+------------------------------               F(  1,    25) =  180.78
       Model |  23948.6837     1  23948.6837               Prob > F      =  0.0000
    Residual |  3311.87511    25  132.475004               R-squared     =  0.8785
    ---------+------------------------------               Adj R-squared =  0.8737
       Total |  27260.5588    26  1048.48303               Root MSE      =   11.51
    
    ------------------------------------------------------------------------------
           y |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
    ---------+--------------------------------------------------------------------
           x |   .1209903   .0089986     13.445   0.000       .1024573    .1395233
       _cons |   3.803296   4.569745      0.832   0.413      -5.608271    13.21486
    ------------------------------------------------------------------------------
    

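    The coefficients, standard errors, and t statistics are the same as those obtained by transforming each of the variables, and here they appear in their usual positions. The F, R-squared, and Root MSE values differ because they refer to y rather than to yt.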
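
The agreement follows because dividing each residual by x before squaring is the same as weighting the squared residual by 1/x^2. A side-by-side comparison of the OLS and WLS fits can be sketched with estimates store and estimates table, assuming a Stata version that provides them. The names ols and wls are arbitrary labels chosen here, and the display formats are only a suggestion.

```stata
* Fit and store the unweighted model
quietly regress y x
estimates store ols

* Fit and store the weighted model (same analytic weights as above)
quietly regress y x [aweight = 1/x^2]
estimates store wls

* Coefficients with standard errors, one column per model
estimates table ols wls, b(%9.4f) se(%9.4f)
```

The table makes it easy to see how much the intercept and its standard error change once the observations are weighted.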

    More About WLS

    There are many other possible weighting schemes:

    Bandaid Weighted Least Squares

    The method of bandaid weighted least squares is, at the moment, an untested approach to weighted least squares.

    bwls y x, groups(2) graph
    
    Bandaid WLS regression with 2 groups
    
    (sum of wgt is   6.1768e+01)
    
      Source |       SS       df       MS                  Number of obs =      27
    ---------+------------------------------               F(  1,    25) =  118.44
       Model |  23110.9331     1  23110.9331               Prob > F      =  0.0000
    Residual |   4878.1613    25  195.126452               R-squared     =  0.8257
    ---------+------------------------------               Adj R-squared =  0.8187
       Total |  27989.0944    26  1076.50363               Root MSE      =  13.969
    
    ------------------------------------------------------------------------------
           y |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
    ---------+--------------------------------------------------------------------
           x |   .1143247   .0105048     10.883   0.000       .0926896    .1359599
       _cons |   7.908293    6.25667      1.264   0.218      -4.977559    20.79415
    ------------------------------------------------------------------------------
    (27 real changes made)
    
    

    Weighted Least Squares -- A More Traditional Approach

    Weighted least squares using the wls0 command is a more traditional approach to WLS. With wls0 you can use any of the following weighting schemes: 1) abse - absolute value of residual, 2) e2 - residual squared, 3) loge2 - log residual squared, and 4) xb2 - fitted value squared.

    wls0 y x, wvar(x) type(abse)
    
    WLS regression -  type: proportional to abs(e)
    
    (sum of wgt is   3.0125e+00)
    
          Source |       SS       df       MS              Number of obs =      27
    -------------+------------------------------           F(  1,    25) =  169.18
           Model |  32092.5961     1  32092.5961           Prob > F      =  0.0000
        Residual |  4742.36886    25  189.694754           R-squared     =  0.8713
    -------------+------------------------------           Adj R-squared =  0.8661
           Total |   36834.965    26  1416.72942           Root MSE      =  13.773
    
    ------------------------------------------------------------------------------
               y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
               x |    .116966   .0089926    13.01   0.000     .0984454    .1354866
           _cons |   5.636913   5.184571     1.09   0.287    -5.040912    16.31474
    ------------------------------------------------------------------------------
    
    graph _wls_res y, yline(0)
    
    
    
    wls0 y x, wvar(x) type(e2)
    
    WLS regression -  type: proportional to e^2
    
    (sum of wgt is   1.4034e-01)
    
          Source |       SS       df       MS              Number of obs =      20
    -------------+------------------------------           F(  1,    18) =   89.58
           Model |  10656.2858     1  10656.2858           Prob > F      =  0.0000
        Residual |  2141.25367    18  118.958537           R-squared     =  0.8327
    -------------+------------------------------           Adj R-squared =  0.8234
           Total |  12797.5394    19  673.554707           Root MSE      =  10.907
    
    ------------------------------------------------------------------------------
               y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
               x |   .1193353   .0126085     9.46   0.000     .0928458    .1458249
           _cons |   3.781573   7.300479     0.52   0.611    -11.55617    19.11931
    ------------------------------------------------------------------------------
    
    graph _wls_res y, yline(0)
    
    
    
    wls0 y x, wvar(x) type(loge2)
    
    WLS regression -  type: proportional to log(e^2) 
    
    (sum of wgt is   3.4628e-01)
    
          Source |       SS       df       MS              Number of obs =      27
    -------------+------------------------------           F(  1,    25) =  171.57
           Model |   23622.504     1   23622.504           Prob > F      =  0.0000
        Residual |  3442.19706    25  137.687882           R-squared     =  0.8728
    -------------+------------------------------           Adj R-squared =  0.8677
           Total |   27064.701    26  1040.95004           Root MSE      =  11.734
    
    ------------------------------------------------------------------------------
               y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
               x |   .1237596   .0094485    13.10   0.000        .1043    .1432192
           _cons |   3.125702   5.074059     0.62   0.543    -7.324518    13.57592
    ------------------------------------------------------------------------------
    
    graph _wls_res y, yline(0)
    
    
    
    wls0 y x, wvar(x) type(xb2)
    
    WLS regression -  type: proportional to xb^2 
    
    (sum of wgt is   5.1285e-03)
    
          Source |       SS       df       MS              Number of obs =      27
    -------------+------------------------------           F(  1,    25) =  166.10
           Model |  28642.2361     1  28642.2361           Prob > F      =  0.0000
        Residual |  4310.96441    25  172.438576           R-squared     =  0.8692
    -------------+------------------------------           Adj R-squared =  0.8639
           Total |  32953.2005    26  1267.43079           Root MSE      =  13.132
    
    ------------------------------------------------------------------------------
               y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
               x |   .1189992   .0092333    12.89   0.000     .0999828    .1380156
           _cons |   4.993048   5.221103     0.96   0.348    -5.760015    15.74611
    ------------------------------------------------------------------------------
    
    graph _wls_res y, yline(0)
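
For readers who want to see what a residual-based weighting scheme involves, the abse idea can be approximated by hand with official Stata commands. This is a sketch of a common textbook two-step procedure, not a description of what wls0 does internally, so its results need not match the output above. The variable names res0, absres, sdhat, and w are made up for this illustration.

```stata
* Step 1: OLS fit and its residuals
regress y x
predict double res0, residuals

* Step 2: model the size of the residuals as a linear function of x
generate double absres = abs(res0)
regress absres x
predict double sdhat, xb

* Step 3: weight each observation by the inverse of its estimated variance.
* Observations with a non-positive fitted value get a missing weight
* and are dropped from the weighted regression.
generate double w = 1/sdhat^2 if sdhat > 0
regress y x [aweight = w]
```

Any scheme of this kind can produce fitted values that are not usable as weights for some observations. That may be why the e2 run above reports only 20 observations, although the handout does not say.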
    
    
    
    


    UCLA Department of Education

    Phil Ender, 20Jun99