|Title:||An application of item response theory to language testing: Model-data fit studies|
|Doctoral Committee Chair(s):||Cziko, Gary A.|
|Department / Program:||Education|
|Degree Granting Institution:||University of Illinois at Urbana-Champaign|
|Subject(s):||Education, Tests and Measurements|
|Abstract:||Even though the application of IRT to language testing has recently attracted much attention, no model-data fit research has been conducted to explore the appropriateness of IRT modeling in language testing. The tenability of the strong assumption of unidimensionality has not been studied systematically, and little is known about the effects of departures from unidimensionality on parameter estimation and model fit. Furthermore, no study has examined the adequacy of the Rasch model, which has been predominant in language testing.
The present study investigated the dimensionality of the reading and vocabulary sections of two widely used English-as-a-foreign-language proficiency tests, the University of Cambridge First Certificate in English (FCE) and the Test of English as a Foreign Language (TOEFL). It also compared the relative fit of three IRT models: the one-, two-, and three-parameter models. Dimensionality of the tests was investigated using Stout's method, factor analyses, and Bejar's method. Second, employing fit statistics, invariance checks, and residual analyses, the study investigated the adequacy of the Rasch model and the effects of multidimensionality on parameter estimation and model fit.
The results of this study suggest the following: (1) Even the TOEFL reading subtest, developed using the three-parameter IRT model, was multidimensional; this appears to be due to underlying factors associated with the reading passages. (2) The FCE reading and vocabulary subtest, based on the traditional British examination system, was found to be essentially unidimensional. (3) Bejar's approach to checking dimensionality appears to be inadequate, in that its results differ across the one-, two-, and three-parameter models. (4) The finding that the Rasch model clearly fails to provide an adequate fit for these data suggests that the prevailing use of the Rasch model in language testing needs to be re-evaluated. (5) The three-parameter model fit the data only marginally better than the two-parameter model, suggesting that for language tests the discrimination parameter is more important than the guessing parameter. (6) A moderate departure from unidimensionality does not appear to invalidate IRT modeling with these data, suggesting that a more defensible application of IRT modeling in language testing is possible.
|Rights Information:||Copyright 1989 Choi, Inn-Chull|
|Date Available in IDEALS:||2011-05-07|
|Identifier in Online Catalog:||AAI9010829|
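The one-, two-, and three-parameter models compared in the abstract are nested logistic item response functions. As a minimal illustrative sketch (parameter names and values are assumptions for illustration, not taken from the dissertation), the three cases can be written as one function:

```python
import math

def irt_prob(theta, b, a=1.0, c=0.0):
    """Probability of a correct response under the three-parameter
    logistic (3PL) IRT model:
        P(theta) = c + (1 - c) / (1 + exp(-a * (theta - b)))
    theta : examinee ability
    b     : item difficulty
    a     : item discrimination (fixed at 1.0 in the Rasch/1PL model)
    c     : pseudo-guessing lower asymptote (0.0 in the 1PL and 2PL models)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# The three models are nested special cases; at theta == b:
p1 = irt_prob(0.0, b=0.0)                  # 1PL (Rasch): a=1, c=0 -> 0.5
p2 = irt_prob(0.0, b=0.0, a=2.0)           # 2PL: free discrimination -> 0.5
p3 = irt_prob(0.0, b=0.0, a=2.0, c=0.2)    # 3PL: guessing floor -> 0.6
```

Because the models are nested (the 2PL adds a free discrimination parameter to the 1PL, and the 3PL adds a guessing parameter to the 2PL), their relative fit can be compared directly, which is the comparison the study performs.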