|
Many comparison studies have investigated the efficiency of statistical procedures across computerized adaptive testing (CAT) and computerized multistage testing (ca-MST). Non-statistical issues of adaptive tests, such as content balancing, have received less attention even though they are directly related to validity, score interpretation and test fairness. Several studies consistently assert that a major advantage of ca-MST is that it controls for content better than CAT. Yet the literature does not contain a study that specifically compares CAT with ca-MST under varying levels of content constraints to verify this claim. A simulation study was conducted to explore the precision of test outcomes across CAT and ca-MST when the number of content areas was varied across different test lengths. One CAT and two ca-MST designs (1-3 and 1-3-3 panel designs) were compared across several manipulated conditions, including total test length (24 and 48 items) and number of controlled content areas. The five levels of the content area condition were zero (no content control), two, four, six and eight content areas. All manipulated conditions within CAT and ca-MST were fully crossed with one another. This resulted in 2x5=10 CAT conditions (test length x content area) and 2x5x2=20 ca-MST conditions (test length x content area x ca-MST panel design), for 30 conditions in total. A total of 4,000 examinees were generated from N(0,1). All other conditions, such as the IRT model and exposure rate, were fixed across the CAT and ca-MSTs. Results were evaluated with mean bias, root mean square error, the correlation between true and estimated thetas, and the conditional standard error of measurement. Results showed that test length and the type of test administration model affected the outcomes more than the number of content areas. The main finding was that, regardless of study condition, CAT outperformed the two ca-MSTs, and the two ca-MSTs were comparable.
The results were discussed in connection with control over test design, test content, cost effectiveness and item pool usage. Recommendations for practitioners were provided. Limitations and directions for further research were listed. |
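The fully crossed design and three of the evaluation criteria named in the abstract can be illustrated with a minimal Python sketch. It is not the study's simulation code. The function names, the dictionary layout, the sign convention for bias (estimated minus true theta) and the noise level used to fake theta estimates are all assumptions made for illustration; only the factor levels, the condition counts and the 4,000 examinees drawn from N(0,1) come from the text.

```python
import itertools
import math
import random

# Factor levels taken from the abstract.
TEST_LENGTHS = (24, 48)
CONTENT_AREAS = (0, 2, 4, 6, 8)  # 0 means no content control
MST_PANELS = ("1-3", "1-3-3")


def build_conditions():
    """Return the fully crossed CAT and ca-MST conditions (hypothetical layout)."""
    cat = [
        {"model": "CAT", "panel": None, "length": n, "areas": k}
        for n, k in itertools.product(TEST_LENGTHS, CONTENT_AREAS)
    ]
    mst = [
        {"model": "ca-MST", "panel": p, "length": n, "areas": k}
        for n, k, p in itertools.product(TEST_LENGTHS, CONTENT_AREAS, MST_PANELS)
    ]
    return cat, mst


def mean_bias(true, est):
    """Average signed error; assumes bias is defined as estimate minus truth."""
    return sum(e - t for t, e in zip(true, est)) / len(true)


def rmse(true, est):
    """Root mean square error between true and estimated thetas."""
    return math.sqrt(sum((e - t) ** 2 for t, e in zip(true, est)) / len(true))


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    cat, mst = build_conditions()
    print(len(cat), len(mst), len(cat) + len(mst))

    rng = random.Random(0)
    true_theta = [rng.gauss(0.0, 1.0) for _ in range(4000)]
    # Placeholder estimates: truth plus arbitrary noise (SD 0.3 is NOT from the
    # study); a real run would obtain these from a CAT or ca-MST administration.
    est_theta = [t + rng.gauss(0.0, 0.3) for t in true_theta]
    print(mean_bias(true_theta, est_theta))
    print(rmse(true_theta, est_theta))
    print(pearson_r(true_theta, est_theta))
```

In a full simulation, the three metrics would be computed once per condition returned by `build_conditions()`. The conditional standard error of measurement is left out of this sketch because it depends on the IRT model, which the abstract does not specify.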