A Design for Comparing CTT and IRT in Test Assembly, Scoring, and Argumentation: Differences Among Reliability, Information, and Validation

Abdulelah Mohammed Alqarni*
Department of Psychology and Counselling, Faculty of Educational Graduate Studies, King Abdulaziz University, Jeddah, Saudi Arabia.
Periodicity: August – October 2019
DOI: https://doi.org/10.26634/jpsy.13.2.16084

Abstract

This study compared the psychometric properties of reliability in Classical Test Theory (CTT), item information in Item Response Theory (IRT), and validation from the perspective of modern validity theory, with the purpose of drawing attention to potential issues that may arise when testing organizations use both test theories in the same testing administration. It was found that CTT item statistics, based on corrected item-total test score correlations, and IRT item information functions are only grossly similar, and that their conjoint use should be compartmentalized across the processes of test assembly, pre-testing, and scoring. For validity, only minor differences attributable to scoring processes are conceivable; the main problem is that too much subjectivity, by way of different arguments being constructed from the same test and test scores, engenders a lack of consensus on the meaning of those arguments. A checklist is presented for test validators that promotes greater consensus among arguments by improving their comprehensiveness.
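As a rough illustration of the quantities the abstract compares (not the author's own analysis or data), the sketch below computes the CTT side (Cronbach's alpha and corrected item-total correlations) and the IRT side (a 2PL item information function) for a small hypothetical set of dichotomous responses. The response matrix, item parameters, and function names are all illustrative assumptions.

```python
import numpy as np

# Hypothetical dichotomous responses: 6 examinees x 4 items (illustrative only).
X = np.array([
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 1, 0],
])

def cronbach_alpha(X):
    """CTT internal-consistency reliability (Cronbach, 1951)."""
    k = X.shape[1]
    item_vars = X.var(axis=0, ddof=1)          # variance of each item
    total_var = X.sum(axis=1).var(ddof=1)      # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def corrected_item_total(X):
    """CTT discrimination: each item correlated with the total score
    excluding that item (the 'corrected' item-total correlation)."""
    totals = X.sum(axis=1)
    return np.array([np.corrcoef(X[:, j], totals - X[:, j])[0, 1]
                     for j in range(X.shape[1])])

def item_information_2pl(theta, a, b):
    """IRT 2PL item information: I(theta) = a^2 * P(theta) * (1 - P(theta)),
    which peaks at theta = b with maximum a^2 / 4."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a ** 2 * p * (1 - p)
```

The contrast the abstract draws is visible in the shapes of these quantities: alpha and the item-total correlations are single test-level or item-level numbers, whereas item information is a function of ability theta, so the two frameworks summarize item quality in structurally different ways.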

Keywords

Classical Test Theory (CTT), Item Response Theory (IRT), Scoring, Reliability, Information, Item Information Functions (IIFs), Item Characteristic Curves (ICCs), Validation

How to Cite this Article?

Alqarni, A. M. (2019). A Design for Comparing CTT and IRT in Test Assembly, Scoring, and Argumentation: Differences Among Reliability, Information, and Validation. i-manager’s Journal on Educational Psychology, 13(2), 1-9. https://doi.org/10.26634/jpsy.13.2.16084
