Ed230B/C

Regression Diagnostics


Descriptive Statistics and Exploratory Data Analysis

Any data analysis should begin with descriptive statistics and exploratory data analysis. These analyses should include distribution plots (histograms, stem-and-leaf plots, or kernel density (kdensity) plots).
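As a rough illustration of this first step, the sketch below computes a few descriptive statistics and prints a text stem-and-leaf plot using only the Python standard library. The scores are made up for the example, and the names scores, summary, and stems are hypothetical.

```python
import statistics

# Hypothetical test scores, used only for illustration.
scores = [52, 55, 61, 64, 64, 68, 70, 71, 73, 75, 78, 80, 84, 91, 97]

# Basic descriptive statistics.
summary = {
    "n": len(scores),
    "mean": statistics.mean(scores),
    "median": statistics.median(scores),
    "sd": statistics.stdev(scores),  # sample standard deviation (n - 1)
    "min": min(scores),
    "max": max(scores),
}
for name, value in summary.items():
    print(f"{name:>6}: {value:.2f}")

# Stem-and-leaf plot: the tens digit is the stem, the ones digit is the leaf.
stems = {}
for score in sorted(scores):
    stems.setdefault(score // 10, []).append(score % 10)
for stem in sorted(stems):
    print(f"{stem} | {''.join(str(leaf) for leaf in stems[stem])}")
```

A graphical histogram or kernel density plot would serve the same purpose; the text plot is used here only to keep the example free of plotting libraries.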

Plot Dependent Variable vs Predicted Values

One visual check of the goodness-of-fit of the model is to plot the observed values of the dependent variable against the predicted values. When prediction is perfect, every point falls on the diagonal line where the observed value equals the predicted value.
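The following is a minimal sketch of how the points for such a plot could be produced, assuming a simple regression with one predictor. The data and the helper fit_line are hypothetical; the second part checks the perfect-prediction case described above.

```python
def fit_line(x, y):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical data for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

intercept, slope = fit_line(x, y)
predicted = [intercept + slope * xi for xi in x]

# Each (predicted, observed) pair is one point on the plot.
for p, o in zip(predicted, y):
    print(f"predicted={p:6.2f}  observed={o:6.2f}")

# Perfect prediction: when y is an exact linear function of x,
# every point lies on the diagonal observed = predicted.
y_exact = [1.0 + 2.0 * xi for xi in x]
a, b = fit_line(x, y_exact)
pred_exact = [a + b * xi for xi in x]
on_diagonal = all(abs(p - o) < 1e-9 for p, o in zip(pred_exact, y_exact))
print("all points on the diagonal:", on_diagonal)
```

The further the points scatter away from the diagonal, the poorer the fit.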

Residual Analysis

A residual is the difference between the observed score and the predicted score: e = Y - Ŷ, where Ŷ is the predicted score.

Residuals come in three varieties:

  1. Raw Residuals: The difference between the raw observed score and the predicted score, as defined above. Often denoted e or resid.
  2. Standardized Residuals: The raw residuals divided by the standard error of estimate. Often denoted rstan or zresid.
  3. Studentized Residuals: The raw residuals divided by the standard error of the residual, computed with that case deleted. These are sometimes called studentized deleted residuals or studentized jackknifed residuals. Often denoted rstu.

Outliers

Outliers are cases with large residuals, that is, cases whose observed scores lie far from the scores the model predicts.
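The three kinds of residuals, and their use in spotting outliers, can be sketched as follows for a simple regression with one predictor. This is an illustration under stated assumptions, not the routine any particular package uses: the data are made up, the function residual_diagnostics is hypothetical, the leverage formula applies only to the one-predictor case, and the cutoff of 2 for flagging a case is a common rule of thumb rather than a fixed standard.

```python
import math

def residual_diagnostics(x, y):
    """Return raw, standardized, and studentized (deleted) residuals
    for a least squares line with one predictor."""
    n = len(x)
    p = 2  # parameters estimated: intercept and slope
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # 1. Raw residuals: observed minus predicted.
    raw = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

    # 2. Standardized residuals: raw residuals divided by the
    #    standard error of estimate.
    sse = sum(e * e for e in raw)
    s = math.sqrt(sse / (n - p))
    standardized = [e / s for e in raw]

    # 3. Studentized residuals: each raw residual divided by its standard
    #    error, with the error variance estimated with that case deleted.
    studentized = []
    for xi, e in zip(x, raw):
        h = 1.0 / n + (xi - mean_x) ** 2 / sxx           # leverage of the case
        s_del_sq = (sse - e * e / (1.0 - h)) / (n - p - 1)
        studentized.append(e / math.sqrt(s_del_sq * (1.0 - h)))
    return raw, standardized, studentized

# Hypothetical data: roughly y = 2x, except the seventh case.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1, 20.0, 16.1, 17.9, 20.2]

raw, standardized, studentized = residual_diagnostics(x, y)

# Flag cases whose studentized residual exceeds 2 in absolute value.
flagged = [i for i, t in enumerate(studentized) if abs(t) > 2.0]
for i in flagged:
    print(f"case {i + 1}: raw={raw[i]:.2f}  rstan={standardized[i]:.2f}  rstu={studentized[i]:.2f}")
```

Because the studentized residual is computed with the case deleted, an outlier cannot inflate its own standard error, which is why it stands out more sharply there than in the standardized residual.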

Plotting Residuals

In General: Residual Plots

The picture should look something like this: a horizontal band of points scattered evenly around zero, with no systematic pattern.

DV vs Predictors

Overall Plot of Residuals

Index Plot -- Plot of Residuals by Case

Time Sequence Plot

1. A band that widens over time: watch out for situations in which variance increases with time; try weighted least squares (W.L.S.).

2. A band that slopes steadily upward or downward: this pattern could indicate that a linear term in time is missing from the model.

3. A curved band: this pattern could indicate that both a linear and a quadratic term in time are missing from the model.

Plot Residuals versus Predicted (Fitted) Scores

1. A band that widens or narrows: watch out for situations in which variance is not constant as assumed (may need W.L.S. or a transformation of Y).

2. A band that slopes steadily upward or downward: this pattern could indicate that a variable is missing from the model (it can also be caused by wrongly omitting the intercept term).

3. A curved band: an additional term is needed in the model, such as the square of a variable or an interaction (or, again, perhaps a transformation of Y).
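The first of these problems, non-constant variance, can be examined informally as sketched below. The example compares the average squared residual in the lower and upper halves of the fitted scores, then refits the line by weighted least squares. Everything here is an assumption made for illustration: the data are constructed so that the errors grow in proportion to X, the weights 1/X² follow from that construction, the helper fit_line is hypothetical, and the split-half comparison is an informal check rather than a formal test.

```python
def fit_line(x, y, w=None):
    """Least squares for one predictor; returns (intercept, slope).
    Passing weights w gives weighted least squares (W.L.S.)."""
    if w is None:
        w = [1.0] * len(x)
    total_w = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total_w
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical data: y = 2x plus errors that alternate in sign
# and grow in proportion to x.
x = [float(i) for i in range(1, 13)]
y = [2.0 * xi + (0.2 * xi if i % 2 == 0 else -0.2 * xi) for i, xi in enumerate(x)]

# Ordinary least squares fit, fitted scores, and residuals.
intercept, slope = fit_line(x, y)
fitted = [intercept + slope * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# Order the cases by fitted score and compare the spread of the
# residuals in the lower half with the spread in the upper half.
pairs = sorted(zip(fitted, residuals))
half = len(pairs) // 2
low = [e for _, e in pairs[:half]]
high = [e for _, e in pairs[half:]]
spread_ratio = (sum(e * e for e in high) / len(high)) / (sum(e * e for e in low) / len(low))
print(f"upper-half / lower-half mean squared residual: {spread_ratio:.2f}")

# W.L.S. refit, weighting each case by 1 / x**2 (assumed error variance
# proportional to x**2).
weights = [1.0 / xi ** 2 for xi in x]
wls_intercept, wls_slope = fit_line(x, y, weights)
print(f"O.L.S. slope: {slope:.3f}   W.L.S. slope: {wls_slope:.3f}")
```

A ratio well above 1 suggests that the residuals fan out as the fitted scores increase. In practice the weights have to be justified from the data or from knowledge of the measurement process.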

Residual Plot versus Predictors

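1. A band that widens or narrows: may need W.L.S. or a transformation of Y.

2. A band that slopes steadily upward or downward: perhaps errors in calculation?

3. A curved band: need an additional term in X (such as X²) or a transformation of Y.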
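The need for an additional term in X can also be checked numerically. The sketch below fits a straight line to data that actually follow a quadratic, then correlates the residuals with the centered square of X. The data, the helper names fit_line and correlation, and the choice of the centered square as the candidate term are all assumptions made for illustration.

```python
import math

def fit_line(x, y):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def correlation(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical data that follow a quadratic in x.
x = [float(i) for i in range(1, 10)]
y = [1.0 + 2.0 * xi + 0.5 * xi ** 2 for xi in x]

# Fit a straight line only, omitting the squared term.
intercept, slope = fit_line(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Residuals plotted against x are positive at both ends and negative in the middle.
for xi, e in zip(x, residuals):
    print(f"x={xi:4.1f}  residual={e:7.3f}")

# Correlate the residuals with the centered square of x.
mean_x = sum(x) / len(x)
x_sq_centered = [(xi - mean_x) ** 2 for xi in x]
r = correlation(residuals, x_sq_centered)
print(f"correlation of residuals with centered x squared: {r:.3f}")
```

A correlation near 1 or -1 suggests that adding the squared term would remove the curvature. With real data the correlation will be weaker, and a transformation of Y may be the better remedy.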



UCLA Department of Education

Phil Ender, 15Jun98