Testing the Validity and Reliability of the “Learn, Pick, Flip, Check, Reward” (LPFCR) Card Game in Homophone Comprehension

The “Learn, Pick, Flip, Check, Reward” (LPFCR) Card Game was developed to enhance Primary 4 pupils' comprehension of homophones. This paper reports the pilot test of the instruments used in the implementation of the game, with the aim of examining their validity and reliability. Four data collection instruments, namely the pre- and post-tests, an observation checklist and a questionnaire, were analysed statistically and thematically. Twenty Primary 4 pupils with characteristics similar to those of the actual sample were involved in this pilot study. Findings of the pilot tests revealed that both the pre- and post-tests achieved high internal consistency and reliability, with coefficients above α = 0.8. Meanwhile, the observation checklist was analysed thematically and was found to be valid and reliable, as the themes recorded across observations were consistent. The questionnaire was analysed using SPSS Version 25, and a reliability coefficient above α = 0.8 was obtained, also indicating high internal consistency and reliability. These results show that the instruments are valid and reliable for evaluating the LPFCR Card Game.


Introduction
Testing the validity and reliability of a data collection instrument is fundamental in ensuring the feasibility and consistency of the instrument in measuring the intended outcome of a research study (Hazzi & Maldaon, 2015; Kinchin, Ismail, & Edwards, 2018). As the main study intends to measure pupils' learning comprehension of homophones through the use of the LPFCR Card Game, the instruments must first be developed and tested. The development and testing of the instruments are crucial for the main study, as most available instruments do not specifically target measuring learning comprehension of homophones. Although several past studies conducted by Dautriche, Fibla, Fievet, and Christophe (2018), Kenan and Hakkı (2018) and Treiman, Seidenberg, and Kessler (2015) focused on the learning of homophones, the instruments used in those studies mainly gathered participants' behavioural interaction and motivational level. Hence, the major purpose of this pilot study is to determine the validity and reliability of the instruments prior to their implementation in the main research. Four instruments were developed to evaluate the LPFCR Card Game: the pre- and post-tests, an observation checklist and a questionnaire. This paper first discusses literature on the development of instruments and on testing their validity and reliability. This is followed by a discussion of the methodological concerns in conducting the pilot study. The findings and discussion of the pilot test are then presented to determine the validity and reliability of the instruments.

Literature Review

Development and Selection of Research Instruments
In order to select and develop research instruments, researchers must first identify the contextual and psychometric aspects of the intended study (Mimi, Nor, Lai, & Kahirol, 2015). The contextual aspect of a research study refers to the setting of the study, which includes the objective and purpose of the study, the characteristics of the participants and the limitations of the research (Omair, 2015; Watson & de Wit, 2018). Meanwhile, the psychometric aspect touches on the construction and properties of the instrument in functioning and gauging the objectives of the research, as well as the feasibility of administration (Lin & Tsai, 2017; Zamanzadeh et al., 2015). Both contextual and psychometric aspects of a research study have a prominent impact on the type of instruments to be implemented. Thus, researchers must develop and select proper instruments in line with the study.
Another important aspect to be taken into consideration is the research design of the study. Chu and Ke (2017) and Klenke, Martin and Wallace (2016) agree that qualitative and quantitative research designs employ different data collection instruments in accordance with the aims of the research. Instruments that are thematically analysed, such as interviews, field notes and observations, are mostly applied in qualitative research, while statistically analysed instruments, such as tests and assessments, surveys and questionnaires with Likert scales, are employed in quantitative research (Charman, 2017; Creswell & Guetterman, 2019; Dick, 2015). From another perspective, associational and interventional research, such as correlational and action research, often implements both qualitative and quantitative instruments (Dick, 2015; Klenke et al., 2016). Depending on the research design, researchers should select and develop appropriate instruments to cater to the needs of the study. As the research discussed in this paper involves both statistical and thematic data analysis, the instruments were the pre- and post-tests, an observation checklist and a questionnaire, in accordance with the characteristics of the participants, the limitations of the research and the aim of determining pupils' comprehension of homophones.

Validity and Reliability of Research Instruments
Validity is defined as the accuracy of an instrument in measuring the anticipated construct within a research study (Klenke et al., 2016; Noble & Smith, 2015). In other words, when an instrument measures what it is supposed to measure, the instrument is valid. Validity takes several forms, namely face validity, content validity, construct validity and criterion validity. For the purpose of this pilot study, face validity and content validity were emphasised in relation to the four data collection instruments. Face validity, on the one hand, measures the degree to which an instrument, at a surface level, is appropriate and suitable for the purpose of a study (Heale & Twycross, 2015). This validity determines whether an instrument has a "face value" that meets the needs of the research. Content validity, on the other hand, focuses on the capacity of data collection items to gather, reflect and portray the variables being measured (Appelman & Sundar, 2016; Mimi et al., 2015). This type of validity addresses the capacity of each item in an instrument to gauge the context and constructs of the research. On the whole, a valid data collection instrument is appropriate for collecting and measuring the outcome based on the contextual and psychometric aspects of the research.
Meanwhile, reliability is defined as the stability and consistency of scores from an instrument (Braun, Clarke, Hayfield, & Terry, 2019; Sharifah, Jamal, & Hamidah, 2017). This means that a reliable instrument can be used several times at different points in time and still produce explicit and consistent results. There are several measures for testing reliability: test-retest, equivalent forms, internal consistency and reliability statistics. In this pilot study, internal consistency and reliability statistics were used. In terms of internal consistency, an instrument is reliable when its items are consistent in measuring the same construct and produce similar scores or results (Heale & Twycross, 2015; Taber, 2018). An instrument that achieves internal consistency yields similar scores on the same aspect of the skills or traits being evaluated. Meanwhile, reliability statistics relate to Cronbach's Alpha, a measure of consistency within a scale in an instrument (Noble & Smith, 2015; Vaske, Beaman, & Sponarski, 2017). For most research purposes, a minimum reliability of 0.8 is required (Taber, 2018). This minimum ensures that the set of questions or items in an instrument is consistent and highly related in gauging the targeted aims of a study. Hence, the pilot study discussed in this paper involved testing the validity and reliability of the instruments that will later be used to gauge and measure pupils' comprehension of homophones.

Methodology

Development and Purpose of Research Instruments
The development of a research instrument is a necessary step in determining the content of the instrument and the scope of analysis appropriate to the respondents of the research (Hult & Johnson, 2015). Hence, this section of the pilot study discusses the development of the instruments that will be administered in the main research.
The first and second instruments are the pre- and post-tests. The pre-test consists of three sections, A, B and C, with a total of 20 questions. Section A comprises eight "fill in the blanks" questions, Section B contains six "circle the correct answer" questions and Section C covers six "underline the correct answer" questions. Meanwhile, the post-test contains four sections, A, B, C and D, with a total of 24 questions. Section A consists of nine "underline the correct answer" questions, Section B comprises five "true or false" questions, Section C covers six "multiple choice" questions and Section D contains four "build sentences" questions. Both the pre- and post-tests include ten homophone pairs, selected from the High Frequency Words in the Standard Curriculum Document (DSKP) for Primary 4 (Ministry of Education, 2015).
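The section breakdowns above can be captured as data for a quick sanity check that the per-section counts add up to the stated totals (a trivial sketch; the dictionary labels simply restate the question types named in the text):

```python
# Question counts per section, as stated in the text for each test.
pre_test = {
    "A: fill in the blanks": 8,
    "B: circle the correct answer": 6,
    "C: underline the correct answer": 6,
}
post_test = {
    "A: underline the correct answer": 9,
    "B: true or false": 5,
    "C: multiple choice": 6,
    "D: build sentences": 4,
}

# Totals should match the stated 20 and 24 questions.
assert sum(pre_test.values()) == 20
assert sum(post_test.values()) == 24
```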
Another aspect of the tests involves the thinking taxonomy. One of the modules by Hoffman, Kennedy, LoPilato, Monahan and Lance (2015) indicates that an assessment must cover several levels of a thinking taxonomy, or classification, to challenge examinees on the intended topics of learning. Since the participants were Primary 4 pupils, both the pre- and post-tests were set at the lower levels of remembering, understanding and applying. For example, in the pre-test, Sections A, B and C were set at the remembering and understanding levels, involving "fill in the blanks", "circle the correct answer" and "underline the correct answer" questions (Hambleton & Li, 2014; Hoffman et al., 2015). The levels of thinking taxonomy for the post-test were similar to those of the pre-test, with the addition of the applying level for Section D, where pupils are required to "build sentences" based on the learnt homophones.
After the administration of the pre- and post-tests, the scores were weighted to 100%. The weighted scores were then analysed and placed with reference to the Criterion Referenced Assessment (CRA) based on the Malaysian Primary School Achievement Test (UPSR), as this determines the grade of a learner against a set of qualities or criteria, without reference to other individuals' achievement (Hambleton & Li, 2014; Lok, McNaught, & Young, 2016). CRA allows an examiner to grade, categorise and rank students based on a given set of achievement scores. With the categorical levels indicated in the CRA, the researcher will be able to determine and verify the participants' level of homophone comprehension through the scores obtained from the pre- and post-tests in the main research. Table 1 shows the percentage (%) of scores used to indicate the pupils' level of homophone comprehension.

The third and fourth instruments are the observation checklist and questionnaire, adapted from the Model of Academic Success by York, Gibson III and Rankin (2015). The model outlines that learning is effective when learners are able to achieve comprehension and success in five main elements: (1) attainment of learning outcomes, (2) satisfaction in learning, (3) persistence in learning, (4) acquired learning skills and (5) performance or academic achievement. Hence, the observation checklist consists of 15 observation statements which revolve around the five elements of academic success, with three statements for each element. The questionnaire likewise consists of 15 close-ended items gauging the comprehension aspects of learning homophones. Each item is rated on a Likert scale of 1 to 5, indicating the degree to which the participants agree or disagree with each statement, based on their comprehension level as they learned through the application of the LPFCR Card Game.
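The score weighting and criterion-referenced banding described above can be sketched minimally as follows. The band labels and cut-off percentages here are hypothetical placeholders; the actual bands are those defined in Table 1, which is not reproduced in this text:

```python
# Hypothetical CRA bands as (minimum percentage, comprehension level).
# The real cut-offs and labels are defined in Table 1 of the study.
CRA_BANDS = [
    (80, "Excellent"),
    (60, "Good"),
    (40, "Satisfactory"),
    (0, "Weak"),
]

def weight_to_percentage(raw_score: float, max_score: float) -> float:
    """Weight a raw test score to a percentage out of 100."""
    return raw_score / max_score * 100.0

def cra_level(percentage: float) -> str:
    """Grade against fixed criteria, not against other pupils (CRA)."""
    for cutoff, label in CRA_BANDS:
        if percentage >= cutoff:
            return label
    return CRA_BANDS[-1][1]
```

For example, 18 out of 20 on the pre-test weights to 90%, which would fall in the top band under these placeholder cut-offs. The key property of CRA, as the text notes, is that the grade depends only on the fixed criteria, never on other examinees' scores.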

Process of Piloting the Research Instruments
After the development of the instruments selected for the main research, each instrument was pilot tested. This is an important step in determining the validity and reliability of the instruments (Noble & Smith, 2015). As Mikuska (2017) notes, administering the instruments to at least 12 to 50 people before the actual research allows researchers to identify the strengths and weaknesses of the instruments and make any necessary modifications. Hence, 20 Primary 4 pupils were selected for the pilot test. These pupils have proficiency similar to that of the main participants, ranging from "Average Language Proficiency" to "Low Language Proficiency". These proficiency levels were determined by their English language performance recorded in the School Based Assessment throughout 2018 and 2019.
The pilot testing of the instruments took place throughout the month of May 2019 in the following order.

Findings and Discussion

Validity and Reliability of the Pre- and Post-Tests
For face validity, the pre- and post-tests were analysed by an expert teacher to determine the feasibility and practicality of the questions for the Primary 4 level. The expert teacher is the head of the English Language Examination Resource in Daro, Sarawak. The expert teacher reported that the pre- and post-tests were clear and not confusing, which establishes the face value of the tests and indicates face validity. Another criterion of validity is content validity. The expert teacher verified that the pre- and post-tests cover all 10 homophone pairs and the levels of thinking taxonomy (remembering, understanding and applying) being assessed. Based on the expert teacher's evaluation of the face value and content of the pre- and post-tests, the tests were found to be valid.
Meanwhile, to determine internal consistency reliability, the pre- and post-tests were marked by the researcher and the expert teacher. The overall results of the administered pre- and post-tests were as follows. All of the pupils achieved excellent results, with percentages ranging from 80% to 91% for both papers. To determine the reliability of the pre- and post-tests, the Cronbach's Alpha reliability coefficient was employed based on the given formula.

Figure 1. Cronbach's Alpha Reliability Coefficient Formula
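For reference, the standard form of the Cronbach's Alpha coefficient named in Figure 1, where $k$ is the number of items, $\sigma_i^2$ the variance of item $i$ and $\sigma_t^2$ the variance of examinees' total scores, is:

```latex
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_i^2}{\sigma_t^2}\right)
```

Intuitively, when the items are highly related, the item variances are small relative to the variance of the totals, and α approaches 1.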
The pre-test produced a Cronbach's Alpha of 0.854, while the post-test produced a Cronbach's Alpha of 0.843. These results indicate that both tests achieved high internal consistency and are reliable. Apart from the Cronbach's Alpha results, assessments that produce similar results across examinees are considered to have good internal consistency, an indication of a reliable and valid assessment (Hoffman et al., 2015). This is also supported by Hambleton and Li (2014) and Lok et al. (2016), who note that an assessment with good internal consistency correlates with the intended outcome of a lesson and produces nearly equal scores among the examinees. In this case, the pilot participants were able to answer the pre- and post-tests with a good level of homophone comprehension after the implementation of the LPFCR Card Game. Hence, both pre- and post-tests showed good internal consistency, which contributed to the high validity and reliability of the instruments.
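The coefficient itself is straightforward to reproduce outside SPSS. The following is a minimal sketch, assuming item-level scores are arranged as an examinees-by-items matrix (the study's own computation was done in SPSS Version 25):

```python
import numpy as np

def cronbach_alpha(scores) -> float:
    """Cronbach's Alpha for an (examinees x items) matrix of item scores."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                          # number of items
    item_vars = scores.var(axis=0, ddof=1)       # per-item sample variance
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of total scores
    return k / (k - 1) * (1.0 - item_vars.sum() / total_var)

# Perfectly correlated items with equal variances give alpha == 1.0.
perfect = [[1, 1], [2, 2], [3, 3]]
```

Less consistent item scores pull the sum of item variances closer to the total-score variance, lowering α below the 0.8 threshold the study applies.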

Validity and Reliability of the Observation Checklist and Questionnaire
The observation checklist and questionnaire were developed with the intent of observing and measuring the pupils' comprehension in the learning of homophones. According to York, Gibson and Rankin's Revised Conceptual Model of Academic Success, a learner is successful in the learning process when they are able to attain five important elements: "… academic or learning achievement, satisfaction in learning, persistence in completing a given task, acquisition of skills and attainment of learning objectives" (York et al., 2015). Hence, based on these five elements, the questions and statements of the observation checklist and questionnaire were constructed to gauge the pupils' comprehension throughout the implementation of the LPFCR Card Game. Throughout the pilot test with the selected participants, the researcher, accompanied by the expert teacher, conducted the observations. For this purpose, the observation checklist was employed to collect relevant data on the pilot participants in relation to specific patterns of behaviour. After the data were collected, the checklists completed by the researcher and the expert teacher were triangulated to identify similarities and differences in the statements and noted details. Table 4 shows the data triangulated from several extracted items of the observation checklists.

Element: Attainment of Learning Outcome
Observation statement: Pupils are able to describe the different homophone pairs when asked verbally during the LPFCR Card Game.
Researcher's remark: Able to give clear description on the learnt homophones in sentences.
Expert teacher's remark: Can put homophones in a sentence to describe.

Element: Satisfaction in Learning
Observation statement: Pupils show excitement when receiving reward tokens from the "R – Reward" section after completing the card game.
Researcher's remark: Pupils smiled and laughed when receiving tokens.
Expert teacher's remark: Giggled, smiled, enjoyed getting tokens.
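The idea behind this triangulation, flagging shared content words in the two observers' remarks on the same statement, can be sketched as an illustrative toy (the actual study used thematic analysis, not keyword matching; the stop-word list below is an arbitrary assumption):

```python
# Toy keyword-overlap check between two observers' remarks.
# The real study used thematic analysis; this only illustrates the idea
# of spotting shared content words across triangulated checklists.
STOP_WORDS = {"a", "an", "the", "to", "in", "on", "of", "can", "able", "give"}

def content_words(remark: str) -> set:
    """Lowercase a remark and drop punctuation and trivial stop words."""
    return {w.strip(".,").lower() for w in remark.split()} - STOP_WORDS

def keyword_overlap(remark_a: str, remark_b: str) -> set:
    """Content words shared by both remarks."""
    return content_words(remark_a) & content_words(remark_b)

researcher = "Able to give clear description on the learnt homophones in sentences."
expert = "Can put homophones in a sentence to describe."
shared = keyword_overlap(researcher, expert)  # contains "homophones"
```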
From the triangulated data, several similar key points and remarks were noted. Across all five elements highlighted in the observation checklists, the remarks from the researcher were consistent with the remarks from the expert teacher. For example, for the first element, "Attainment of Learning Outcome", both the researcher and the expert teacher gave similar remarks: "Able to give clear description on the learnt homophones in sentences" and "Can put homophones in a sentence to describe". This consistency in the remarks is an indication of a solid and reliable instrument that can be implemented in line with the purpose of the study (Mimi et al., 2015). Nardi (2018) likewise notes that similarity in observations and comments in a qualitative measure is a good indication of a valid and dependable instrument. Therefore, given the similarity in the observations conducted, the observation checklist was found to be valid and reliable.

Moving on to the reliability of the questionnaire: after the LPFCR Card Game and the post-test had been administered, the pilot participants were given a questionnaire to collect their opinions on the degree to which they agreed or disagreed with the given statements. The statements were rated on a Likert scale of 1 to 5, ranging from "1 – Strongly Disagree" to "5 – Strongly Agree". The data from the questionnaire were collected and tabulated in SPSS Version 25 to obtain the Cronbach's Alpha reliability coefficient. As Taber (2018) asserts, Cronbach's Alpha is a measure of scale reliability that determines the consistency and relatedness of a set of items within a group. Vaske et al. (2017) state that a Cronbach's Alpha of 0.8 or greater (≥ 0.8) is an excellent indication of good internal consistency and a reliable set of items. Table 5 shows the reliability statistics of the data collected from the questionnaire.
As shown in the table, the set of questionnaire items has good internal consistency, with a Cronbach's Alpha of 0.873. This indicates that the questionnaire is a reliable instrument for gathering data on the participants' comprehension in the learning of homophones through the application of the LPFCR Card Game.

Conclusion
The main purpose of this pilot study was to determine the validity and reliability of the instruments that will be implemented in the main research. From this pilot study, the pre-test, post-test, observation checklist and questionnaire were shown to be valid and reliable for data collection in the main study of the LPFCR Card Game in enhancing Primary 4 pupils' comprehension of homophones. With validity and reliability established, these instruments will also be applicable in future studies gathering data and information on learners' comprehension of homophones.