Ed230B/C

Ed230B Assignment #2 - Multiple Regression

Part I: Multiple Regression

You should use either one of the datasets provided by the instructor or a dataset of your own choosing (with the instructor's approval). For Part I of this assignment, use one half of the cases in the dataset (ideally selected at random). For Part II, use the other half of the cases.
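
One way to draw the two random halves is sketched below in Python. The assignment does not prescribe any software. The function name split_half and the seed value are hypothetical. Fixing the seed makes the split reproducible, so the held-out half can be recovered exactly for Part II.

```python
import random

def split_half(cases, seed=230):
    """Randomly split cases into a Part I half and a Part II half.

    The seed (230) is an arbitrary, hypothetical choice; any fixed value
    makes the split reproducible. With an odd number of cases, Part II
    receives the extra case.
    """
    rng = random.Random(seed)
    shuffled = list(cases)   # copy so the caller's ordering is untouched
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]

# Usage with hypothetical case IDs 1..200
part1, part2 = split_half(range(1, 201))
print(len(part1), len(part2))  # 100 100
```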

You should choose a few variables to focus on for this assignment. Your report of the analysis should be written in the style of a technical report; that is, the write-up should be chiefly concerned with the substantive meaning of the results. You should, however, include the details of the necessary assumptions, calculations, and statistical procedures. The report should include the following points:

  1. Briefly describe the study, the variables measured, and any special problems, such as missing data. Include your descriptive statistics: means, standard deviations, variances, and the correlation matrix.
  2. Discuss the reason for investigating the variables you have chosen.
  3. Do a multiple regression with the variables you have chosen. Provide the necessary tables and figures to report your results and to back up the decisions you have made. Include your examination of residuals for evidence of normality, linearity, equality of variance, outliers, and adequacy of model. Redo the analysis as many times as necessary to account for transformations or deletion of outliers.
  4. Attach annotated printouts showing that you understand and have examined the entire output.
  5. Explain the assumptions underlying multiple regression. Discuss the degree to which your data satisfied these assumptions. Discuss how you tested for or investigated multicollinearity. Would some approach other than regression analysis have been useful in analyzing these data?
  6. Discuss the results and draw conclusions about the basic questions you set out to answer. Discuss the meaning (both statistical and practical) of each of the coefficients (both raw and standardized) in your model.
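
The Part I computations in items 1, 3, 5, and 6 can be sketched in plain Python as below. The sketch covers descriptive statistics, a least-squares fit, residuals, multiple R, standardized coefficients, and variance inflation factors (VIF). It is a minimal illustration under stated assumptions, not a replacement for the statistical package used in the course. All function names are hypothetical, and the coefficients are obtained by solving the normal equations directly. The VIF is one common multicollinearity check, but the assignment does not require it. The toy data are invented.

```python
import statistics

def pearson(a, b):
    """Pearson product-moment correlation between two equal-length lists."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5

def describe(columns):
    """Mean, SD, and variance (n - 1 denominators) per variable, plus the correlation matrix."""
    stats = {name: (statistics.mean(v), statistics.stdev(v), statistics.variance(v))
             for name, v in columns.items()}
    names = list(columns)
    corr = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    return stats, corr

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (a must be nonsingular)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols(xs, y):
    """Least-squares coefficients [b0, b1, ..., bk] of y on the predictor columns xs."""
    rows = [[1.0] + [col[i] for col in xs] for i in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)

def predict(b, xs):
    """Predicted scores b0 + b1*x1 + ... + bk*xk for every case."""
    return [b[0] + sum(bj * col[i] for bj, col in zip(b[1:], xs))
            for i in range(len(xs[0]))]

def standardized(b, xs, y):
    """Standardized (beta) weights: b_j * s_xj / s_y."""
    sy = statistics.stdev(y)
    return [bj * statistics.stdev(col) / sy for bj, col in zip(b[1:], xs)]

def vif(xs):
    """VIF_j = 1 / (1 - R_j^2), regressing each predictor on the others (needs 2+ predictors)."""
    out = []
    for j, xj in enumerate(xs):
        others = xs[:j] + xs[j + 1:]
        r = pearson(xj, predict(ols(others, xj), others))
        out.append(1.0 / (1.0 - r ** 2))
    return out

# Hypothetical toy data, invented for illustration only
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [3, 8, 6, 11, 12, 14]

stats, corr = describe({"y": y, "x1": x1, "x2": x2})
b = ols([x1, x2], y)                                  # raw coefficients [b0, b1, b2]
yhat = predict(b, [x1, x2])
residuals = [obs - fit for obs, fit in zip(y, yhat)]  # inspect these for item 3
R = pearson(y, yhat)                                  # multiple correlation coefficient
betas = standardized(b, [x1, x2], y)
vifs = vif([x1, x2])
```

Plotting the residuals against the predicted scores, and against each predictor, is a usual starting point for the checks in item 3. A funnel shape suggests unequal variance, and curvature suggests nonlinearity. A VIF well above 1 indicates that a predictor is largely explained by the other predictors.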

Part II: Cross Validation

  1. Using the regression coefficients derived in Part I, compute predicted scores for each case in the second dataset.
  2. Compute the correlation between the observed outcome scores and the predicted scores. Compare this value with the multiple correlation coefficient obtained in Part I. Explain the difference, if any.
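
Both Part II steps are sketched below, assuming the Part I coefficients are stored as a list [b0, b1, ..., bk]. The function name cross_validate and all numbers are hypothetical.

```python
import statistics

def pearson(a, b):
    """Pearson product-moment correlation between two equal-length lists."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5

def cross_validate(b, xs2, y2):
    """Apply Part I coefficients b = [b0, b1, ..., bk] to the Part II cases.

    xs2 holds the Part II predictor columns and y2 the observed outcomes.
    Returns the predicted scores and their correlation with y2.
    """
    yhat2 = [b[0] + sum(bj * col[i] for bj, col in zip(b[1:], xs2))
             for i in range(len(y2))]
    return yhat2, pearson(y2, yhat2)

# Hypothetical values, for illustration only
b_part1 = [2.0, 3.0, -1.0]   # coefficients estimated in Part I
x1_new = [2, 4, 5, 7, 8]     # Part II predictor scores
x2_new = [1, 3, 2, 6, 4]
y_new = [6, 10, 15, 14, 23]  # Part II observed outcomes
yhat_new, r_cv = cross_validate(b_part1, [x1_new, x2_new], y_new)
print(f"cross-validated r = {r_cv:.3f}")  # compare with the multiple R from Part I
```

The cross-validated correlation is usually somewhat smaller than the Part I multiple R. Least squares chooses the weights that fit the Part I sample as closely as possible, including that sample's chance fluctuations. This drop is often called shrinkage.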


Graduate School of Education & Information Studies

Phil Ender, 4 Jan 99

updated