Ed230B/C

Data Transformation


Purpose of Transformations

  1. To linearize the regression model.
  2. To stabilize variance (reduce heterogeneity of variance, "heteroscedasticity").
  3. To normalize variables.
  • Some transformations will serve more than one purpose. For example, a transformation that linearizes a variable may also help to normalize it.

    Transformations May be Necessary Due to:

    Variables to be Transformed

    Major Drawbacks

    Log Transformation

    1. To linearize a regression model with a consistently increasing slope.

    2. To stabilize variance when the variance of the residuals increases markedly with increasing Y.

    3. To normalize Y when the distribution of the residuals is positively skewed.
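
    As a brief illustration of the first purpose, consider a curve whose slope keeps increasing because Y grows exponentially in X. This is only a sketch under an assumed model with multiplicative error; the symbols a, b and \varepsilon are introduced here for illustration and do not appear in the original notes.

    ```latex
    Y = a\,e^{bX}\,\varepsilon
    \quad\Longrightarrow\quad
    \ln Y = \ln a + bX + \ln\varepsilon
    ```

    On the log scale the relationship is a straight line with intercept \ln a and slope b, and the multiplicative error becomes additive. In Stata, log() is the natural logarithm, so it corresponds to \ln here.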

    Stata Example

    
    use http://www.gseis.ucla.edu/courses/data/lntrans, clear
    
    scatter y x
    
    generate z = log(y)
    
    scatter z x
    
    regress z x
    
      Source |       SS       df       MS                  Number of obs =      50
    ---------+------------------------------               F(  1,    48) = 2916.35
       Model |  365.874096     1  365.874096               Prob > F      =  0.0000
    Residual |  6.02190025    48  .125456255               R-squared     =  0.9838
    ---------+------------------------------               Adj R-squared =  0.9835
       Total |  371.895996    49  7.58971421               Root MSE      =   .3542
    
    ------------------------------------------------------------------------------
           z |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
    ---------+--------------------------------------------------------------------
           x |   .9417895   .0174395     54.003   0.000        .906725     .976854
       _cons |    .906511   .1082093      8.377   0.000       .6889417     1.12408
    ------------------------------------------------------------------------------
    
    
    predict p
    
    scatter z p x, msym(O i) con(. l) sort
    
    generate p2 = exp(p)
    
    scatter y p2 x, msym(O i) con(. l) sort
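    
    /* added sketch, not part of the original session: the slope of   */
    /* regress z x is on the log scale, so exp(_b[x]) is the estimated */
    /* multiplicative change in y for a one-unit increase in x.        */
    /* This assumes regress z x is still the most recent estimation.   */
    display exp(_b[x])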
    
    /* now transform x instead of y */
    
    generate xt = exp(x)
    
    scatter y xt
    
    regress y xt
    
      Source |       SS       df       MS                  Number of obs =      50
    ---------+------------------------------               F(  1,    48) =  650.09
       Model |  4.3685e+09     1  4.3685e+09               Prob > F      =  0.0000
    Residual |   322552812    48  6719850.24               R-squared     =  0.9312
    ---------+------------------------------               Adj R-squared =  0.9298
       Total |  4.6911e+09    49  95736235.2               Root MSE      =  2592.3
    
    ------------------------------------------------------------------------------
           y |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
    ---------+--------------------------------------------------------------------
          xt |   1.409637   .0552866     25.497   0.000       1.298476    1.520799
       _cons |   493.3881    414.134      1.191   0.239       -339.284     1326.06
    ------------------------------------------------------------------------------
    
    rvfplot, yline(0)
    
    Square Root (SQRT) Transformation

    Used to stabilize variance when it is proportional to the mean of Y, especially when Y approximates a Poisson distribution.
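
    The Poisson case can be checked with a small simulation. The following is a minimal sketch, not taken from the original notes: the variable names grp, y and ys, the seed, and the group means are all hypothetical, and it assumes a Stata version that provides runiform() and rpoisson().

    ```stata
    /* simulate Poisson counts whose mean (and so variance) differs by group */
    clear
    set seed 12345
    set obs 2000
    generate grp = ceil(4*runiform())   /* hypothetical groups 1..4 */
    generate y   = rpoisson(5*grp)      /* Poisson means 5, 10, 15, 20 */
    generate ys  = sqrt(y)              /* square root transformation */

    /* compare means and variances of raw and transformed y within each group */
    tabstat y ys, by(grp) statistics(mean variance)
    ```

    If the simulation behaves as intended, the variance of y should rise with its group mean, while the variance of ys should stay roughly the same across groups.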

    Reciprocal Transformation

    To stabilize variance when it is proportional to the 4th power of the mean of Y, i.e., when there is a huge increase in variance above some threshold of Y. The purpose is to minimize the effect of large values of Y. Large Ys are transformed to values close to zero, so large increases in Y result in only trivial decreases in Y' = 1/Y.
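
    The variance conditions quoted for the square root and reciprocal transformations can be motivated by a first-order (delta-method) approximation. This is a sketch rather than part of the original notes: g is the transformation, \mu is the mean of Y, and c is an arbitrary constant. The log case is added for comparison.

    ```latex
    \begin{align*}
    \operatorname{Var}[g(Y)] &\approx [g'(\mu)]^{2}\,\operatorname{Var}(Y) \\
    g(Y)=\sqrt{Y}:\quad g'(\mu)&=\tfrac{1}{2}\mu^{-1/2},
      & \operatorname{Var}(Y)=c\,\mu &\;\Rightarrow\; \operatorname{Var}[g(Y)]\approx \tfrac{c}{4} \\
    g(Y)=\ln Y:\quad g'(\mu)&=\mu^{-1},
      & \operatorname{Var}(Y)=c\,\mu^{2} &\;\Rightarrow\; \operatorname{Var}[g(Y)]\approx c \\
    g(Y)=1/Y:\quad g'(\mu)&=-\mu^{-2},
      & \operatorname{Var}(Y)=c\,\mu^{4} &\;\Rightarrow\; \operatorname{Var}[g(Y)]\approx c
    \end{align*}
    ```

    In each case the factor [g'(\mu)]^{2} cancels the dependence of the variance on \mu, which is why the stronger the growth of the variance with the mean, the more severe the transformation that is needed.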

    Square Transformation

    1. To linearize when the plot of Y against X is curvilinear downward, i.e., the slope decreases as X increases.

    2. To stabilize variance when it decreases with the mean of Y.

    3. To normalize Y when the distribution of residuals is negatively skewed.
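
    The linearizing use of the square transformation can be tried on simulated data. This is a minimal sketch under stated assumptions, not an example from the original notes: the variable names x, y and ysq, the seed, and the generating model (y as the square root of a linear function of x plus small normal noise) are hypothetical, and runiform() and rnormal() are assumed to be available in the Stata version used.

    ```stata
    /* simulate a relationship whose slope decreases as x increases */
    clear
    set seed 2024
    set obs 200
    generate x   = 10*runiform()
    generate y   = sqrt(2 + 3*x) + rnormal(0, 0.1)
    generate ysq = y^2                  /* square transformation */

    /* untransformed fit: residuals should show a curved pattern */
    regress y x
    rvfplot, yline(0)

    /* transformed fit: the relationship should be close to linear */
    regress ysq x
    rvfplot, yline(0)
    ```

    Squaring undoes the square root in the assumed model, so the second residual plot should show less curvature than the first. The same caution applies here as with any transformation: the coefficients are now on the scale of the squared response.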



    UCLA Department of Education

    Phil Ender, 18dec99