Files in this item

CHOE-DISSERTATION-2017.pdf (application/pdf, 3MB)

Title: Advancements in test security: preventive test assembly methods and change-point detection of compromised items in adaptive testing
Author(s): Choe, Edison M
Director of Research: Chang, Hua-Hua
Doctoral Committee Chair(s): Chang, Hua-Hua
Doctoral Committee Member(s): Zhang, Jinming; Köhn, Hans-Friedrich; Anderson, Carolyn J.; Culpepper, Steven A.
Department / Program: Psychology
Degree Granting Institution: University of Illinois at Urbana-Champaign
Subject(s): computerized adaptive testing; multistage testing; test security; test validity; automatic test assembly; item selection; item exposure; test overlap; response time; change-point detection; item compromise; item preknowledge
Abstract:

Chapter 1: Test security is as longstanding as testing itself. The broad issues are introduced briefly, followed by the specific matters pertaining to computerized adaptive testing (CAT). The framework of CAT is established in detail, and the vital security concerns of item exposure and test overlap are given a rigorous theoretical treatment. Ultimately, the asymptotic distribution of the mean test overlap rate under random item selection is proven.

Chapter 2: The assembly of linear test forms has traditionally been performed manually by test development specialists. However, manual test assembly (MTA) is a labor- and time-intensive process that generally produces suboptimal forms, which is why the task has increasingly been delegated to computers running automated algorithms. The standard paradigm of automatic test assembly (ATA) is mixed-integer linear programming (MILP), a mathematical optimization technique in which the desired test characteristics are specified as a system of linear inequalities to be solved computationally. MILP with the conventional branch-and-bound algorithm guarantees an exact solution whenever one is feasible, but infeasibility and long computation times are two common difficulties, especially with large item pools and complex constraints such as selecting item sets or controlling test overlap. To mitigate these complications, this chapter proposes a modified ATA procedure called common block assembly (CBA), which uses a stratified shadow-test approach to construct item blocks that are subsequently pieced together into full forms. Applied to a previously operational item pool with an extended set of constraints, CBA effortlessly obtains optimal solutions that outperform MTA in overall test quality.

Chapter 3: Whereas multistage testing (MST) typically routes examinees to a preassembled module, on-the-fly MST (OMST) adaptively assembles a module at each stage in real time.
Although OMST produces more individualized forms with finer measurement precision, imposing exposure control and nonstatistical constraints remains a challenge. The script method is introduced as a simple yet effective way to overcome these issues.

Chapter 4: Although the measurement efficiency of CAT is commonly operationalized as the number of items administered, it should also be assessed in terms of the time it takes to complete the test. To this end, a recent study introduced a novel item selection criterion that maximizes Fisher information per unit of expected response time, which was shown to effectively reduce the average completion time for a fixed-length test with minimal decrease in the accuracy of ability estimation. However, because this method also resulted in extremely unbalanced item exposure, a-stratification with b-blocking was recommended as a counterbalance. Although exceptionally effective in this regard, it comes at the substantial cost of attenuating the reduction in average testing time, increasing the variance of testing times, and further decreasing estimation accuracy. Therefore, this chapter investigates several alternative methods for item exposure control, the most promising of which is a simple modification: maximizing Fisher information per unit of centered expected response time. The key advantage of the proposed method is the flexibility in choosing a centering value according to a desired distribution of testing times and level of exposure control. Moreover, the centered expected response time can be exponentially weighted to calibrate the degree of measurement precision. The results of extensive simulations, with item pools and examinees that are both simulated and real, demonstrate that optimally chosen centering and weighting values can markedly reduce the mean and variance of both testing times and test overlap, all without much compromise in estimation accuracy.
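The Chapter 1 quantity, the mean test overlap rate, is the average over all pairs of examinees of (number of shared items) / (test length). Because each item seen by c examinees contributes C(c, 2) shared items, the rate can be computed from exposure counts alone, and under random selection of L items from a pool of N its expectation is L / N. The sketch below is only a simulation check of that expectation, not the asymptotic distribution proven in the dissertation; the function names, pool size, test length, and number of examinees are hypothetical.

```python
import random
from math import comb


def mean_overlap_rate(tests, pool_size, test_length):
    """Mean pairwise overlap rate, computed from item exposure counts.

    An item administered to c examinees is shared by C(c, 2) pairs,
    so summing C(c, 2) over items gives the total pairwise overlap.
    """
    n = len(tests)
    counts = [0] * pool_size
    for items in tests:
        for k in items:
            counts[k] += 1
    shared = sum(comb(c, 2) for c in counts)
    return shared / (comb(n, 2) * test_length)


def simulate_random_selection(n_examinees, pool_size, test_length, seed=0):
    """Each examinee receives test_length distinct items drawn uniformly at random."""
    rng = random.Random(seed)
    return [rng.sample(range(pool_size), test_length) for _ in range(n_examinees)]


if __name__ == "__main__":
    N, L, n = 500, 40, 2000  # hypothetical pool size, test length, examinee count
    tests = simulate_random_selection(n, N, L)
    print(f"simulated mean overlap rate: {mean_overlap_rate(tests, N, L):.4f}")
    print(f"L / N:                       {L / N:.4f}")
```

With these settings the simulated rate should sit very close to L / N = 0.08, which is the baseline that exposure-controlled adaptive selection can only exceed.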
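For the MILP paradigm described in Chapter 2, a generic maximum-information test assembly model helps make "a system of linear inequalities" concrete. The formulation below is the familiar textbook form, not CBA's stratified shadow-test model; the symbols are notation introduced here: $x_i \in \{0,1\}$ indicates whether item $i$ of an $N$-item pool is selected, $I_i(\theta_0)$ is its Fisher information at a target ability $\theta_0$, $L$ is the test length, $V_c$ is the set of items in content category $c$, and $n_c$ is that category's minimum count.

```latex
\begin{aligned}
\max_{x} \quad & \sum_{i=1}^{N} I_i(\theta_0)\, x_i \\
\text{subject to} \quad & \sum_{i=1}^{N} x_i = L, \\
& \sum_{i \in V_c} x_i \ge n_c, \qquad c = 1, \dots, C, \\
& x_i \in \{0, 1\}, \qquad i = 1, \dots, N.
\end{aligned}
```

Item-set selection and overlap control add further binary variables and inequalities to this model, which is where the infeasibility and long branch-and-bound run times mentioned above tend to arise.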
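The Chapter 4 selection criteria can be sketched as follows, under stated assumptions. The sketch assumes a 2PL response model (without the 1.7 scaling constant) and a lognormal response-time model in which $\log T \sim N(\beta_j - \tau, \alpha_j^{-2})$, so that $E[T_j] = \exp(\beta_j - \tau + 1/(2\alpha_j^2))$. `info_per_time` is information divided by expected response time. The centered variant is a hypothetical form, information divided by $(|E[T_j] - c| + \varepsilon)^w$, chosen only to illustrate a centering value $c$ and an exponent weight $w$; the dissertation's exact criterion may differ. All names (`Item`, `select_item`, and so on) are illustrative.

```python
import math
from dataclasses import dataclass


@dataclass
class Item:
    a: float      # 2PL discrimination (assumed IRT model)
    b: float      # 2PL difficulty
    alpha: float  # lognormal response-time discrimination (assumed RT model)
    beta: float   # lognormal response-time intensity


def fisher_info_2pl(item, theta):
    """Fisher information of a 2PL item (no 1.7 scaling constant) at ability theta."""
    p = 1.0 / (1.0 + math.exp(-item.a * (theta - item.b)))
    return item.a ** 2 * p * (1.0 - p)


def expected_rt(item, tau):
    """E[T] when log T ~ Normal(beta - tau, 1 / alpha**2)."""
    return math.exp(item.beta - tau + 1.0 / (2.0 * item.alpha ** 2))


def info_per_time(item, theta, tau):
    """Fisher information per unit of expected response time."""
    return fisher_info_2pl(item, theta) / expected_rt(item, tau)


def info_per_centered_time(item, theta, tau, center, weight=1.0, eps=1e-6):
    """HYPOTHETICAL centered form: information / (|E[T] - center| + eps) ** weight.

    eps keeps the ratio finite when E[T] equals center; weight = 0 reduces
    the criterion to plain maximum information.
    """
    gap = abs(expected_rt(item, tau) - center) + eps
    return fisher_info_2pl(item, theta) / gap ** weight


def select_item(pool, available, criterion):
    """Index of the available item that maximizes the given criterion."""
    return max(available, key=lambda j: criterion(pool[j]))
```

For example, with two items of equal information, `info_per_time` picks the faster one. Setting `center` to the slower item's expected time makes the centered criterion pick the slower one instead. This is how a centering value can steer selection away from always favoring quick items.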
Chapter 5: Item compromise persists in undermining the integrity of testing, even in secure CAT administrations with sophisticated item exposure controls. In a novel approach to this perennial test security issue, a recent article introduced a sequential procedure for detecting compromised items, in which each item in the pool is statistically monitored after each exposure for a significant increase in its proportion of correct responses. In addition to the responses themselves, response times carry valuable information with tremendous potential to reveal items that may have been leaked: examinees who have preknowledge of an item would presumably respond to it more quickly than those who do not. Therefore, this chapter proposes several augmented methods for detecting compromised items, all involving simultaneous monitoring of changes in both the proportion correct and the average response time for each operational item in the pool. Simulation results indicate that considering response times can afford marked improvements over analyzing responses alone.

Chapter 6: In direct continuation of Chapter 5, three additional methods of item compromise detection are examined: (1) extensions of comparing two proportions, including the binomial and Fisher's exact tests; (2) the generalized likelihood ratio test (GLRT); and (3) nonparametric techniques comparing empirical distribution functions (EDFs), specifically the Kolmogorov-Smirnov (KS) and Kuiper's tests. According to simulation results, the GLRT in particular is quite capable of detecting compromised items quickly and accurately, even when the probability that an examinee has preknowledge is small.

Chapter 7: Test security is ultimately a matter of test validity. Thus, the body of research in this thesis seeks to protect validity by improving security from a psychometric perspective. Much work still remains in advancing the field to better inform practice.
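The Chapter 5 idea of monitoring each item after every exposure can be illustrated with a moving-window sketch. It assumes that the baseline is every response recorded before a window of the `window` most recent responses. It then applies a one-sided pooled two-proportion z-test for an increase in proportion correct and a large-sample z comparison for a decrease in mean log response time, and flags the item if either statistic exceeds `z_crit`. The class name, window size, minimum baseline, critical value, and the "either statistic" combination rule are all hypothetical choices, not the specific augmented methods of the dissertation. Flagging on either statistic also raises the false-alarm rate relative to each test alone.

```python
import math
from statistics import mean, variance


def prop_increase_z(x_base, n_base, x_recent, n_recent):
    """Pooled two-proportion z statistic; positive when the recent proportion correct is higher."""
    p_pool = (x_base + x_recent) / (n_base + n_recent)
    se = math.sqrt(p_pool * (1.0 - p_pool) * (1.0 / n_base + 1.0 / n_recent))
    if se == 0.0:
        return 0.0
    return (x_recent / n_recent - x_base / n_base) / se


def log_rt_decrease_z(base, recent):
    """Large-sample z statistic; positive when the recent mean log response time is lower."""
    se = math.sqrt(variance(base) / len(base) + variance(recent) / len(recent))
    if se == 0.0:
        return 0.0
    return (mean(base) - mean(recent)) / se


class ItemMonitor:
    """Hypothetical per-item monitor comparing a moving window against all earlier exposures."""

    def __init__(self, window=50, min_baseline=50, z_crit=3.0):
        self.window = window
        self.min_baseline = min_baseline
        self.z_crit = z_crit
        self.correct = []  # 1 = correct, 0 = incorrect
        self.log_rt = []   # natural log of response time

    def update(self, correct, rt):
        """Record one exposure; return True if the item should be flagged as possibly compromised."""
        self.correct.append(int(correct))
        self.log_rt.append(math.log(rt))
        if len(self.correct) < self.min_baseline + self.window:
            return False
        w = self.window
        base_c, recent_c = self.correct[:-w], self.correct[-w:]
        z_p = prop_increase_z(sum(base_c), len(base_c), sum(recent_c), len(recent_c))
        z_t = log_rt_decrease_z(self.log_rt[:-w], self.log_rt[-w:])
        return z_p > self.z_crit or z_t > self.z_crit
```

In a simulation, feeding an item a stream of ordinary responses followed by a run of fast, correct responses (as from examinees with preknowledge) should eventually trigger a flag, while the initial stream alone should not.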
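The GLRT named in Chapter 6 can be illustrated in its standard single-change-point form for a Bernoulli sequence of item scores. The statistic compares the maximized log-likelihood with separate proportions before and after a candidate point k against a single common proportion. It then takes the maximum over k, counting only candidates where the later proportion is higher (an upward shift, as preknowledge would produce). This is a retrospective sketch; the dissertation's sequential implementation and its decision threshold may differ, and `min_seg` and the function names are hypothetical.

```python
import math


def bernoulli_loglik(successes, n):
    """Maximized Bernoulli log-likelihood with p = successes / n (using 0 * log 0 = 0)."""
    ll = 0.0
    if successes > 0:
        ll += successes * math.log(successes / n)
    if successes < n:
        ll += (n - successes) * math.log((n - successes) / n)
    return ll


def glrt_change_point(x, min_seg=5):
    """Return (max log-likelihood-ratio statistic, estimated change point k) for an upward shift.

    k is the number of observations before the change; segments shorter
    than min_seg are skipped. Returns (0.0, None) if no candidate qualifies.
    """
    n = len(x)
    total = sum(x)
    null_ll = bernoulli_loglik(total, n)
    best_stat, best_k = 0.0, None
    prefix = 0
    for k in range(1, n):
        prefix += x[k - 1]
        if k < min_seg or n - k < min_seg:
            continue
        s1, s2 = prefix, total - prefix
        if s2 / (n - k) <= s1 / k:  # one-sided: ignore non-increases in proportion correct
            continue
        stat = bernoulli_loglik(s1, k) + bernoulli_loglik(s2, n - k) - null_ll
        if stat > best_stat:
            best_stat, best_k = stat, k
    return best_stat, best_k
```

In practice the item would be flagged when the statistic exceeds a threshold calibrated, for example, by simulating uncompromised sequences. That calibration step is not shown here.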
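For the EDF-based techniques in Chapter 6, the two-sample KS and Kuiper statistics can be computed directly from, say, baseline and recent response times of an item. With right-continuous EDFs, the largest differences occur at observed values. With $D^+ = \sup (F_1 - F_2)$ and $D^- = \sup (F_2 - F_1)$, the KS statistic is $\max(D^+, D^-)$ and Kuiper's statistic is $D^+ + D^-$, which makes Kuiper's test sensitive to shifts in either direction at once. The function name is hypothetical, and the sketch returns statistics only, without p-values or critical values.

```python
from bisect import bisect_right


def ks_kuiper(sample1, sample2):
    """Two-sample KS and Kuiper statistics from the empirical distribution functions."""
    a, b = sorted(sample1), sorted(sample2)
    n, m = len(a), len(b)
    d_plus = d_minus = 0.0
    for v in a + b:
        f1 = bisect_right(a, v) / n  # EDF of sample1 at v
        f2 = bisect_right(b, v) / m  # EDF of sample2 at v
        d_plus = max(d_plus, f1 - f2)
        d_minus = max(d_minus, f2 - f1)
    return max(d_plus, d_minus), d_plus + d_minus
```

If `sample2` holds the recent response times of a leaked item, its EDF rises earlier than the baseline's, so the shift shows up mainly in $D^-$.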
Issue Date: 2017-05-01
Rights Information: Copyright 2017 Edison M. Choe
Date Available in IDEALS: 2017-09-29
Date Deposited: 2017-08
