Assessing the Quality of Multiple-Choice Test Items Based on BILOG-MG, R (ltm), IATA, and GSP-ROC

Keywords

Multiple-choice items
BILOG-MG
R (ltm)
IATA
GSP-ROC

How to Cite

Assessing the Quality of Multiple-Choice Test Items Based on BILOG-MG, R (ltm), IATA, and GSP-ROC. (2024). WVSU Research Journal, 13(1), 1-12. https://doi.org/10.59460/wvsurjvol13iss1pp1-12

Abstract

This study analyzes, assesses, and selects 50 multiple-choice items for the final test of the English 1 course, taken by 876 university students, using four item-analysis tools: BILOG-MG, R (ltm package), IATA, and the combination of the GSP chart and the ROC method (GSP-ROC). The results identify the satisfactory items, which are eligible for use in the test, and the unsatisfactory items, which need to be reviewed for adjustment and improvement. Using several software tools in combination to analyze, assess, and select multiple-choice items is necessary for improving item quality. This contributes not only to better testing and assessment of learners, but also to better teaching and learning at universities in the current period.
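As an illustrative sketch only (not part of the study, which uses BILOG-MG, ltm, IATA, and GSP-ROC), the classical item statistics that underlie this kind of item analysis, difficulty (proportion correct) and discrimination (point-biserial correlation with the total score), can be computed from a scored 0/1 response matrix. The small response matrix below is hypothetical.

```python
import numpy as np

# Hypothetical 0/1 response matrix: 6 examinees (rows) x 3 items (columns).
responses = np.array([
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 1],
    [1, 0, 1],
])

# Each examinee's total score across all items.
total = responses.sum(axis=1)

# Classical difficulty: proportion of examinees answering each item correctly.
difficulty = responses.mean(axis=0)

# Classical discrimination: point-biserial correlation between each
# item score and the total score (higher = better separation of
# strong and weak examinees).
discrimination = np.array([
    np.corrcoef(responses[:, j], total)[0, 1]
    for j in range(responses.shape[1])
])

print("difficulty:", np.round(difficulty, 2))
print("discrimination:", np.round(discrimination, 2))
```

Items with very high or very low difficulty, or with low (or negative) discrimination, are the ones flagged for review; IRT software such as BILOG-MG or ltm then refines this with model-based item parameters.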


References

Baker, F. B. (2001). The basics of item response theory. Education Resources Information Center (ERIC).

Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s ability. Statistical Theories of Mental Test Scores.

Bui, A. K., & Bui, N. P. (2018). Using IATA to analyze, assess and improve the quality of the multiple-choice questions in chapter power functions, exponential functions and logarithmic functions. Can Tho University Journal of Science, 54(9), 81–93.

Bui, N. Q. (2017). Assessment of the quality of multiple choice test bank for the module of Introduction to Anthropology by using the RASCH model and QUEST software. Science of Technology Development, 20(X3), 42–54.

Cartwright, F. (2007). IATA 3.0 Item and Test Analysis: A software tutorial and theoretical introduction.

Doan, H. C., Le, A. V., & Pham, H. U. (2016). Applying 3-parameter logistic model in validating the level of difficulty, discrimination and guessing of items in a multiple choice test. Ho Chi Minh City University of Education Journal of Science, 7(85), 174–184.

Du Toit, M. (2003). IRT from SSI: Bilog-MG, Multilog, Parscale, Testfact. Scientific Software International.

Foster, R. C. (2021). KR20 and KR21 for some nondichotomous data (it’s not just Cronbach’s alpha). Educational and Psychological Measurement, 81(6), 1172–1202. https://doi.org/10.1177/001316442199253

Kumar, R., & Indrayan, A. (2011). Receiver operating characteristic (ROC) curve for medical researchers. Indian Pediatrics, 48, 277–287. https://doi.org/10.1007/s13312-011-0055-4

Lam, Q. T. (2011). Measurement in Education - Theory and Application. Hanoi: Vietnam National University Publishing House.

Nguyen, P. H. (2016). Using GSP chart and ROC method to analyze multiple-choice test items and assess learning outcomes of students. Journal of Education Science, Vietnam Institute of Education Science, 134(11), 32–37.

Nguyen, P. H. (2017). Using GSP chart and ROC method to analyze and select multiple-choice test items. Dong Thap University Journal of Science, 24(2), 11–17. https://doi.org/10.52714/dthu.24.2.2017.426

Nguyen, P. H., & Du, T. N. (2014). Assessing the rating results and predicting students’ learning outcomes based on grey relational analysis and grey model. Can Tho University Journal of Science, 32, 43–50.

Nguyen, P. H., & Du, T. N. (2015). The analysis and selection of multiple-choice test items based on S-P chart, Grey Relational Analysis, and ROC curve. Ho Chi Minh City University of Education Journal of Science, 6(72), 163.

Nguyen, P. H., & Trinh, T. K. B. (2017). Assessment of students’ learning outcomes using a combination of GSP chart and ROC method. AGU International Journal of Sciences, 17(5), 103–112.

Nguyen, P. H., & Trinh, T. K. B. (2022). Assessment of Students’ Learning Outcomes in Higher Education. International Journal of Uncertainty and Innovation Research, 4(1), 39–52. https://doi.org/10.34238/tnu-jst.5554

Nguyen, V. C., & Nguyen, P. H. (2020). Analyzing and selecting multiple-choice test items based on classical test theory and item response theory. Ho Chi Minh City University of Education Journal of Science, 17(10), 1804–1818.

Pham, T. M., & Bui, Đ. N. (2019). The IATA software for analyzing, evaluation of multiple-choice questions at Ha Noi Metropolitan University. Scientific Journal of Ha Noi Metropolitan University, 20, 97–108.

Rasch, G. (1993). Probabilistic models for some intelligence and attainment tests. Education Resources Information Center (ERIC).

Rizopoulos, D. (2006). ltm: An R package for latent variable modelling and item response analysis. Journal of Statistical Software, 17(5), 1–25. https://doi.org/10.18637/jss.v017.i05

Tavakol, M., & Dennick, R. (2012). Standard setting: The application of the receiver operating characteristic method. International Journal of Medical Education, 3, 198–200. https://doi.org/10.5116/ijme.506f.1aaa

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Copyright (c) 2024 WVSU Research Journal