Measurement theory in language testing: Past traditions and current trends

Mohammad Ali Salmani Nodoushan*
Adjunct Assistant Professor, University of Tehran, Kish International Campus, Iran.
Periodicity: August - October 2009
DOI: https://doi.org/10.26634/jpsy.3.2.1023

Abstract

A good test has at least three qualities: reliability, the precision with which the test measures what it is supposed to measure; validity, whether the test really measures what it is supposed to measure; and practicality, whether the test, however sound theoretically, is practicable in reality. These are the sine qua non of any test, including tests of language proficiency. Over the past fifty years, language testing has witnessed three major measurement trends: Classical Test Theory (CTT), Generalizability Theory (G-Theory), and Item Response Theory (IRT). This paper provides a brief overview of these trends. It then moves on to a brief consideration of the most recent notion, Differential Item Functioning (DIF). It concludes that the material discussed here is applicable not only to language tests but also to tests in other fields of science.

Keywords

Testing, Reliability, Validity, Generalizability Theory (G-Theory), Item Response Theory (IRT), Classical Test Theory (CTT), Differential Item Functioning (DIF).

How to Cite this Article?

Dr. Mohammad Ali Salmani Nodoushan (2009). Measurement theory in language testing: Past traditions and current trends. i-manager’s Journal on Educational Psychology, 3(2), 1-12. https://doi.org/10.26634/jpsy.3.2.1023
