Validity and Inter-rater Reliability of the Scoring Rubrics for the Science Teacher TPACK Test Instrument
Abstract
This study examined the content validity of the scoring rubrics used to measure science teachers’ TPACK and the inter-rater reliability obtained when using them. The research was conducted as part of a research and development project designed to develop instruments for measuring teacher knowledge. The analysis combined a qualitative analysis, based on triangulation of three validators’ validation results, with a quantitative analysis of inter-rater reliability, based on the Intraclass Correlation Coefficient (ICC) obtained for each question. The validation involved three university science education experts, who assessed the suitability of the scoring rubrics within the technological pedagogical content knowledge (TPACK) framework. The inter-rater reliability examination involved 100 participants, who answered the 15 questions on the instrument, and three experienced raters, who scored the participants’ answers. The validation results showed that, qualitatively, the instrument’s content was valid for measuring the knowledge being tested. The inter-rater reliability coefficient was very high for all items, with an average of 0.94.
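As an illustration of the quantitative step, the following is a minimal sketch of how an ICC could be computed for one question from a participants-by-raters score table. The abstract does not state which ICC form was used, so the two-way random-effects, absolute-agreement form (often written ICC(2,1) for a single rater and ICC(2,k) for the mean of k raters) is an assumption here. The function name `icc_two_way_random` and the toy ratings are hypothetical and are not data from this study.

```python
def icc_two_way_random(scores):
    """Return (ICC(2,1), ICC(2,k)) for a table of scores.

    scores: list of rows, one row per participant, one column per rater.
    Assumes a two-way random-effects model with absolute agreement;
    this choice is an assumption, not taken from the abstract.
    """
    n = len(scores)        # number of participants (rows)
    k = len(scores[0])     # number of raters (columns)
    total = sum(sum(row) for row in scores)
    grand_mean = total / (n * k)

    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares from a two-way ANOVA without replication.
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)                # between-participant mean square
    ms_cols = ss_cols / (k - 1)                # between-rater mean square
    ms_error = ss_error / ((n - 1) * (k - 1))  # residual mean square

    icc_single = (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
    icc_average = (ms_rows - ms_error) / (
        ms_rows + (ms_cols - ms_error) / n
    )
    return icc_single, icc_average


# Toy ratings (hypothetical): 6 participants scored by 4 raters.
toy_scores = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]

single, average = icc_two_way_random(toy_scores)
print(f"ICC(2,1) = {single:.2f}, ICC(2,k) = {average:.2f}")
# prints approximately: ICC(2,1) = 0.29, ICC(2,k) = 0.62
```

In a design like the one described, such a function would be applied once per question to a 100-by-3 table (participants by raters), and the 15 resulting coefficients could then be averaged. The low values for the toy table come from large systematic differences between its raters, which the absolute-agreement form penalizes.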
DOI: http://dx.doi.org/10.30870/jppi.v8i1.11164
Copyright (c) 2022 Jurnal Penelitian dan Pembelajaran IPA
This work is licensed under a Creative Commons Attribution 4.0 International License.