|Title:||Assessing the Interpretive Component of Criterion-Referenced Test Item Validity|
|Department / Program:||Education|
|Degree Granting Institution:||University of Illinois at Urbana-Champaign|
|Subject(s):||Education, Tests and Measurements|
|Abstract:||The usefulness of an innovative testing technique for empirically detecting ambiguously worded or structurally deficient test items was explored. In addition to answering each item, eighty-nine undergraduate students in the architecture curriculum at the University of Illinois at Urbana-Champaign were asked to classify each test item on an electricity examination as having been generated from one or more of the topics representing the sub-unit headings from their text. These judgments were compared to their professor's "standard" judgments. For these data, first, an index of item-domain divergence in perceived item meaning between examinees on the average and the professor as content specialist was computed. Second, the amount of variation in response data not accounted for by classification data was computed for each "standard" domain label for each item. Third, the mean, variance, and n of total test scores were reported for each of the following examinee groups for each item: responded incorrectly, same classification; responded correctly, same classification; responded incorrectly, different classification; and responded correctly, different classification. Finally, z scores were computed for the difference between the mean total test score in each of the above cells and the mean total test score for all examinees, for each "standard" domain for each item.
Examinee perceptions of the topic(s) that generated items, considered in relation to the professor's judgment and coupled with responses to the items, provided estimates of the extent to which responses reflected partial knowledge or careless errors (responded incorrectly, same classification), were valid (responded correctly, same classification), reflected misinterpretation (responded incorrectly, different classification), or reflected testwiseness (responded correctly, different classification). These new measures provide a means for detecting ambiguous items not otherwise detectable using biserial correlations based on response data alone. Items deemed ambiguous by these exploratory procedures were compared with items identified as ambiguous by examinees in taped interviews during the week following the examination. Limitations of the study, such as the use of pre-existing sub-unit headings as domain labels (which affected the divergence index), and avenues for future research are discussed.
Thesis (Ph.D.)--University of Illinois at Urbana-Champaign, 1980.
|Date Available in IDEALS:||2014-12-12|
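The four-cell grouping and z-score comparison described in the abstract can be sketched as follows. This is an illustrative sketch only: the data are invented, and the z statistic shown (cell mean versus grand mean, with a standard error based on the overall standard deviation) is one plausible form, not necessarily the exact formula used in the thesis.

```python
import math
from statistics import mean, pstdev

# Each record: (total_test_score, answered_correctly, same_classification_as_professor).
# Illustrative data only; the thesis's actual data are not reproduced here.
records = [
    (72, True,  True), (65, True,  True), (58, False, True), (61, False, True),
    (55, False, False), (49, False, False), (68, True, False), (70, True, False),
    (63, True,  True), (52, False, False),
]

scores = [score for score, _, _ in records]
grand_mean, grand_sd = mean(scores), pstdev(scores)

# The four examinee groups named in the abstract.
cells = {
    "incorrect/same":      [s for s, correct, same in records if not correct and same],
    "correct/same":        [s for s, correct, same in records if correct and same],
    "incorrect/different": [s for s, correct, same in records if not correct and not same],
    "correct/different":   [s for s, correct, same in records if correct and not same],
}

# z score comparing each cell's mean total test score to the grand mean.
for label, cell in cells.items():
    if not cell:
        continue  # skip empty cells
    n = len(cell)
    z = (mean(cell) - grand_mean) / (grand_sd / math.sqrt(n))
    print(f"{label:20s} n={n}  mean={mean(cell):5.1f}  z={z:+.2f}")
```

Under the abstract's interpretation, a markedly low mean in the "incorrect/different" cell, for example, would suggest the item was misinterpreted rather than simply difficult.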