
Measurement in Education: Underlying Theory
Education 211A
Spring 2004
Wednesdays 9 - 1
GSE&IS Building, Room 325
Professor Noreen Webb
2019C Moore Hall
310-825-1897
email: Webb@ucla.edu
Teaching Assistant:
Felipe Martinez
Office Hours: Tuesday 11-1
email: jfmtz@ucla.edu
The purposes of this course are to teach (a) the concepts of reliability and validity, (b) the mathematical models underlying these concepts, and (c) the application of these concepts to problems in measurement. This course combines theory and practice. The objectives for all students are to understand concepts in reliability and validity, to know how to apply specific formulas and approaches, and to know when to apply them.
The first part of the course focuses primarily on classical test theory, covering both the theory and its implications for test construction. The second part of the course introduces generalizability theory and its application to educational testing. The final part covers topics in validity of measurements.
You are assumed to have had (a) statistics through simple regression and two-way analysis of variance, (b) ninth grade algebra, and (c) a course in psychological testing. (See the instructor if you lack one or more of these.)
*Allen, M. J., & Yen, W. M. (1979). Introduction to measurement theory. Monterey, Calif.: Brooks/Cole.
Brennan, R. L. (1992). Elements of generalizability theory. Iowa City: ACT Publications.
Crocker, L., & Algina, J. (1986). Introduction to classical and modern test theory. New York: Holt, Rinehart, and Winston.
Magnusson, D. (1967). Test theory. Reading, Mass.: Addison-Wesley.
*Shavelson, R. J., & Webb, N. M. (1991). A primer on generalizability theory. Sage Publications.
Thorndike, R. L. (1982). Applied psychometrics. Boston, Mass.: Houghton-Mifflin.
(Books marked with an "*" were ordered for the course. Nonstarred books are alternative or supplemental sources.)
Class Schedule
Codes
A&Y Allen & Yen
C&A Crocker & Algina
S&W Shavelson & Webb (1991)
Note: Multiple readings are provided below as alternatives. Most cover the same or similar material.
Date Topics and Readings (Optional readings in parentheses)
Week 1 - Week 3 Introduction to Classical Test Theory:
Development of Theory and Reliability Coefficient
A&Y Ch. 1, 3-4
C&A Ch. 1, 6
Spearman-Brown Prophecy Formula
Types of Reliability Coefficients
A&Y Ch. 3-4
C&A Ch. 6, 7
ANOVA Approach to Classical Reliability
Computer data analysis assignment #1
Week 4 - Week 7 Introduction to G Theory
Two-Facet Designs
Mixed Models
Decision Studies
C&A Ch. 8, 9
S&W Ch. 1-9
Week 8 - Week 9 Introduction to Validity Theory
A&Y Ch. 5
C&A Ch. 10
Criterion Related Validity
Construct Validity
Week 10 Special Topics (Time permitting):
Examining Test Items
A&Y Ch. 6
C&A Ch. 16
Scaling
A&Y Ch. 9
C&A Ch. 3
Test Equating
C&A Ch. 20
There will be three types of course assignments: computer data analysis assignments, a midterm exercise, and a final exercise.
1. Computer data analysis assignments. There will be three computer data analysis assignments. Instructions for these assignments appear later in this syllabus. Students are encouraged to use their own data for the assignments (please check with the instructor to make sure they are suitable). For students without access to their own data, datasets will be made available.
Due dates for assignments are: 1 (week 4), 2 (week 7), 3 (week 10). Any changes in these due dates will be announced in class. Students are encouraged to work cooperatively in small groups on these assignments. The names of all the participants on a given assignment should appear on the report.
2. Take-home midterm exercise. The midterm exercise will cover, in survey fashion, the topics discussed in class before the exercise. Students will work individually. This exercise is open-book: students may consult any material they wish (class notes, readings, etc.).
3. Take-home final exercise. The final exercise will cover, in survey fashion, topics from the whole course. Students will work individually. This exercise is also open-book (any materials may be consulted).
We provide multiple datasets for you to use for the computer data analysis assignments. However, we encourage you to use data of your own. If you have alternative datasets that you would like to use for one or more of the assignments, please consult the instructor on the appropriateness and feasibility of using your data for the assignments.
Self-concept and mathematics achievement dataset (Assignments 1 and 3). The “Bolus” data were collected in February 1980 as part of Roger Bolus' dissertation. Bolus collected measures of general academic self-concept and mathematics achievement.
Creativity dataset (Assignments 1 and 3). The “Creativity” data were collected as part of a validation study for Abedi’s Test of Creativity (ATC). For this study, measures of general academic achievement (Basque language, English language, Spanish language, math, natural sciences, and social sciences), family background (SES, number of siblings, parent education), and several measures of creativity (Abedi’s Test of Creativity, ATC; Villa and Auzmendi’s test, VT; Torrance Test of Creativity, TTC; and teachers’ ratings of creativity) were collected. For ATC and VT, item-level data are available. This study is discussed in an article by Auzmendi, Villa, & Abedi (1996).
Science achievement test (Assignments 2 and 3). Data on science achievement come from a study by Webb, Schlackman, & Sugrue (2000). In this study, the authors investigated the importance of occasion as a hidden source of error variance in (a) estimates of the dependability (generalizability) of science assessment scores and (b) the interchangeability of science test formats. Two science tests were developed to measure eighth-grade students’ knowledge of concepts related to electricity and electric circuits: a hands-on assessment, which provided students with equipment to manipulate, and an analogous paper-and-pencil version. Students were administered both tests on two occasions, approximately one month apart. Student responses were scored by two raters. Results of the univariate generalizability analysis showed that explicitly recognizing occasion as a facet of error variance altered the interpretation of the substantial sources of error in the measurement and gave lower estimates of the dependability of science scores.
Military on-the-job performance measurement (Assignments 2 and 3). The Armed Services, in cooperation with the Department of Defense, have undertaken a research and development effort to investigate the feasibility of (1) measuring on-the-job performance and (2) using various criterion measures in establishing military enlistment standards. The Navy collected several kinds of information about job performance: a hands-on performance test for Machinist's Mates, a paper-and-pencil simulation test, and job task performance ratings.
You may use the same dataset for multiple assignments, or you may use a different dataset for each assignment.
The variables and the format of the datasets will be provided in separate attachments.
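The science achievement dataset description above notes that treating occasion as an explicit facet lowers the estimated dependability of scores. The following is a minimal sketch of why that can happen in a persons x raters x occasions design. The variance components are hypothetical values chosen for illustration, not estimates from any of the course datasets, and the function names are made up for this example. When all data come from a single occasion, the person-by-occasion component cannot be separated from the person component, so it is counted as universe-score variance; when occasion is modeled explicitly, it is counted as error.

```python
# Hypothetical variance components (illustrative only, not from the course data).
VAR_P = 0.30      # persons (universe-score variance)
VAR_PO = 0.20     # person x occasion interaction
VAR_PR = 0.02     # person x rater interaction
VAR_PRO_E = 0.10  # person x rater x occasion interaction, confounded with residual


def g_hidden_occasion(var_p, var_po, var_pr, var_pro_e, n_r):
    """Relative G coefficient when occasion is a hidden facet (one occasion only).

    The person x occasion component is confounded with the person component,
    so it inflates the apparent universe-score variance.
    """
    apparent_universe = var_p + var_po
    relative_error = (var_pr + var_pro_e) / n_r
    return apparent_universe / (apparent_universe + relative_error)


def g_explicit_occasion(var_p, var_po, var_pr, var_pro_e, n_r, n_o):
    """Relative G coefficient when occasion is an explicit random facet."""
    relative_error = var_po / n_o + var_pr / n_r + var_pro_e / (n_r * n_o)
    return var_p / (var_p + relative_error)


# Two raters and a single occasion in both cases.
hidden = g_hidden_occasion(VAR_P, VAR_PO, VAR_PR, VAR_PRO_E, n_r=2)
explicit = g_explicit_occasion(VAR_P, VAR_PO, VAR_PR, VAR_PRO_E, n_r=2, n_o=1)

print(f"Occasion hidden:   {hidden:.3f}")
print(f"Occasion explicit: {explicit:.3f}")
```

With these made-up components the explicit-occasion coefficient is noticeably lower, because the person-by-occasion variance moves from the numerator into the error term. Increasing `n_o` in the second function shows how averaging over more occasions would recover some of the dependability.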
Alwin, D. F. Approaches to the interpretation of relationships in the multitrait-multimethod matrix. In H. L. Costner (Ed.), Sociological methodology 1973-74. San Francisco: Jossey-Bass, 1974, 79-105. (MTMM validity)
Althauser, R. P. Inferring validity from a multitrait-multimethod matrix: Another assessment. In H. L. Costner (Ed.), Sociological methodology 1973-74. San Francisco: Jossey-Bass, 1974, 105-127. (MTMM validity)
Bentler, P. The interdependence of theory, methodology and empirical data: Causal modeling as an approach to construct validity. In D. B. Kandel (Ed.), Longitudinal research on drug use: Empirical findings and methodological issues. 1978.
Boruch, R. F., Larkin, J. D., Wolins, L., & Mackinney, A. C. Alternative methods of analysis: Multitrait-multimethod data. Educational and Psychological Measurement, 1970, 30, 833-853. (MTMM)
Campbell, D. T., & Fiske, D. W. Validation by the multitrait-multimethod matrix. Psychological Bulletin, 1959, 56, 81-105. (Also in Mehrens & Ebel and in Payne & McMorris readers.) (MTMM validity)
Cardinet, J., Tourneur, Y., & Allan, L. The symmetry of generalizability theory: Applications to educational measurement. Journal of Educational Measurement, 1976, 13, 119-136.
Cleary, T. A. Test bias: Prediction of grades of Negro and white students in integrated colleges. JEM, 1968, 5, 115-124. (Selection bias)
Cronbach, L. J. Essentials of psychological testing. New York: Harper & Row, 1970.
de Gruijter, D. N. M., & Van der Kamp, L. J. T. (Eds.). Advances in psychological and educational measurement. New York: Wiley, 1976.
Erlich, O., & Shavelson, R. J. The application of generalizability theory to the study of teaching.
Erlich, O., & Shavelson, R. J. The search for correlations between measures of teacher behavior and student achievement: Measurement problem, conceptualization problem, or both? JEM, 1978, 15, 77-89.
Feldt, L. S., & Brennan, R. L. (1989). Reliability. In R. L. Linn (Ed.), Educational measurement, 3rd edition. (pp. 105-146). Phoenix, AZ: American Council on Education/Macmillan Publishing.
Glaser, R. Instructional technology and the measurement of learning outcomes. American Psychologist, 1963, 18, 519-521.
Hambleton, R. K. (1989). Principles and selected applications of item response theory. In R. L. Linn (Ed.), Educational measurement, 3rd edition. (pp. 147-200). Phoenix, AZ: American Council on Education/Macmillan Publishing.
Hambleton, R. K., & Novick, M. R. Toward an integration of theory and method for criterion-referenced tests. JEM, 1973, 10, 159-170. (CRM)
Hambleton, R. K., Swaminathan, H., Algina, J., & Coulson, D. B. Criterion-referenced testing and measurement: A review of technical issues and developments. Review of Educational Research, 1978, 48, 1-48.
Hammond, K. R., Hamm, R. M., & Grassia, J. Generalizing over conditions by combining the multitrait-multimethod matrix and the representative design of experiments. Psychological Bulletin, 1986, 100, 257-269.
Hively, N., Patterson, H. L., & Page, S. H. A "Universe-Defined" system of arithmetic achievement tests. JEM, 1968, 5.
Kane, M. T., Gillmore, G. M., & Crooks, T. J. Student evaluations of teaching: The generalizability of class means. JEM, 1976, 13, 171-183. (Generalizability)
Lee, R., Malone, M., & Greco, S. Multitrait-multimethod-multirater analysis of performance ratings for law enforcement personnel. Journal of Applied Psychology, 1981, 66, 625-632.
Livingston, S. Criterion-referenced applications of classical test theory. JEM, 1972, 9, 13-25. (CRM)
Lord, F. M., & Novick, M. Statistical theories of mental test scores. Reading, Mass.: Addison-Wesley, 1968.
McMorris, R. F. Evidence of several approximations for commonly used measurement statistics. JEM, 1972, 9, 113-122. (Methods of reliability)
Mehrens, W., & Ebel, R. (Eds.). Principles of educational and psychological measurements. New York: Rand McNally, 1967. (Out of print)
Meskauskas, J. A. Evaluation models for criterion-referenced testing. Review of Educational Research, 1976, 46, 133-158. (CRM)
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement, 3rd edition. (pp. 13-104). Phoenix, AZ: American Council on Education/Macmillan Publishing.
Schmitt, N., & Stults, D. M. Methodology review: Analysis of multitrait-multimethod matrices. Applied Psychological Measurement, 1986, 10, 1-22.
Shavelson, R. J., Block, J., & Ravitch, M. Criterion-referenced testing: Comments on reliability. JEM, 1972, 9, 133-140. (CRM)
Shavelson, R. J., Hubner, J. J., & Stanton, G. C. Self-concept: Validation of construct interpretations. Review of Educational Research, 1976, 46, 407-441.
Shavelson, R. J., & Stanton, G. C. Construct validation: Methodology and application to three measures of cognitive structure. JEM, 1975, 12, 67-86. (Multitrait-multimethod validation)
Shavelson, R. J., Webb, N. M., & Burstein, L. The measurement of teaching. In M. C. Wittrock (Ed.), Handbook of research on teaching (3rd ed.). Macmillan, 1985.
Shavelson, R. J., & Webb, N. M. (1981). Generalizability Theory: 1973-1980. British Journal of Mathematical and Statistical Psychology, 34, 133-166.
Subkoviak, M. J. Estimating reliability from a single administration of a mastery test. JEM, 1976, 13, 265-276. (CRM)
Subkoviak, M. J. The reliability of mastery classification decisions. Paper presented at the annual meeting of the American Educational Research Association, San Francisco, April 1979.
Swaminathan, H., Hambleton, R. K., & Algina, J. A Bayesian decision-theoretic procedure for use with criterion-referenced tests. JEM, 1975, 12, 87-98. (CRM)
Travers, K. J. Correction for attenuation: A generalizability approach using components of variance.
Warm, T. A Primer of Item Response Theory.
Additional Books and Readings
Ahmann, J. S., & Glock, M. D. Evaluating student progress: Principles of tests and measurements (6th ed.). Boston, Mass.: Allyn & Bacon, 1981.
Gronlund, N. E. Measurement and evaluation in teaching. New York: Macmillan Publishing Co., 1981.
Sax, G. Principles of educational and psychological measurement and evaluation. Belmont, Calif.: Wadsworth Publishing Co., 1980.
Thorndike, R. L. Applied psychometrics. Boston, Mass.: Houghton-Mifflin, 1982.
Thorndike, R. L., & Hagen, E. P. Measurement and evaluation in psychology and education (4th ed.). New York: Wiley, 1977.