A Content Validity Study for Vocational Teachers’ Assessment Literacy Instrument (VoTAL)

Establishing content validity is a crucial step in the development of a measurement instrument. This study aimed to establish the content validity of the Vocational Teachers’ Assessment Literacy (VoTAL) instrument using Lawshe’s Content Validity Ratio (CVR) analysis. Fourteen professional experts and seven lay experts were selected to assess the VoTAL instrument. A purposive sampling technique was employed to choose the expert panels based on expertise appropriate to the domain constructs of the instrument. The VoTAL instrument consists of three primary constructs represented by 125 items. Overall, the CVR analysis showed that 100 items met the minimum requirements for the overall CVR value (CVR ≥ 0.428) and the mean of judgments (X̄ ≥ 1.5), while the remaining 25 items were excluded. In conclusion, the VoTAL instrument shows good content validity for measuring vocational teachers’ assessment literacy. Further studies should examine additional psychometric properties of this instrument.


Introduction
Vocational education in Malaysia has developed in numerous ways over the last decade. In 2012, all vocational schools nationwide were upgraded to Vocational Colleges to meet the 11th Malaysia Plan agenda, which is expected to create about 1.5 million job opportunities, 60 percent of them in TVET-related sectors (Economic Planning Unit, 2015). These developments changed not only the teaching and learning system but also assessment in vocational classrooms. Vocational Colleges employ two types of assessment. The first is centralized, is administered by the Malaysian Examination Syndicate, and contributes 30 percent of the overall assessment. The second is competency-based, is carried out by teachers in classrooms or vocational workshops, and contributes 70 percent of the overall assessment. Because teachers play such a central role in assessing students' achievement, the demand for accountability and quality of assessment has become an issue in vocational education (Mohamed et al., 2016; Rahman et al., 2014; Retnawati et al., 2016). Classroom assessment has been described as the cornerstone of the current education system (Brookhart, 2011; Kelly et al., 2020; Popham, 2014), and previous research has emphasized its significance in strengthening the education system (Birenbaum et al., 2015; Brookhart, 2011; DeLuca, 2012; Popham, 2014). Moreover, consistently integrating student assessment into the teaching and learning process has been shown to improve students' achievement, motivation, and self-esteem, as well as teachers' instruction (Bennett, 2011; DeLuca & Klinger, 2010; Harlen, 2012; Maclellan, 2004; Willis et al., 2013).
In view of these benefits, attention to supporting teachers' assessment literacy has increased (DeLuca et al., 2016a). The Vocational Teachers' Assessment Literacy (VoTAL) instrument was therefore developed to assess vocational teachers' self-perceived assessment literacy. To strengthen the instrument's validity, the newly developed VoTAL instrument was subjected to a content validity test, in which professional and lay expert panels evaluated its items. The data collected from the expert panels were analyzed using Lawshe's Content Validity Ratio (CVR) method.

Literature Review
Assessment Literacy
Teachers' assessment literacy is teachers' capability to draw on classroom and socio-cultural perspectives to implement, create, perform, incorporate, and apply effective assessment strategies that support students' learning against education standards (Alonzo, 2020; Willis et al., 2013). Amid persistent demands to improve and enhance teachers' assessment literacy (Popham, 2016; Shepard, 2020), previous studies have consistently shown that teachers experience significant challenges in implementing contemporary assessment methods that reflect the requirements of the current assessment landscape (DeLuca et al., 2018). Consequently, teachers hold differing perspectives on educational assessment, which lead to differences in classroom assessment practices. Studies also show that little credible information exists on teachers' assessment literacy in relation to current accountability expectations, owing to a lack of assessment literacy measurement scales grounded in present educational assessment standards (DeLuca et al., 2016a). This finding is not unexpected, since most assessment literacy instruments are predicated on an early-1990s standard, the Standards for Teacher Competence in Educational Assessment of Students (STCEAS) (DeLuca et al., 2016a; Gotch & French, 2014). Brookhart (2011) states that the 1990 STCEAS standard is outdated in two respects: (1) it does not consider the current concept of formative assessment; and (2) it does not consider the social issues and the different situations teachers encounter in the contemporary assessment landscape.
To address these issues, the recently published Classroom Assessment Standards of the Joint Committee on Standards for Educational Evaluation (JCSEE 2015; Klinger et al., 2015) respond to this criticism by setting out guidelines and principles for assessing students' learning that are in line with the current educational assessment landscape. The evolution of classroom assessment requirements, together with the development of the JCSEE 2015 standard, indicates that earlier scales for measuring teachers' assessment literacy may lack strong validity in the present classroom assessment context (DeLuca et al., 2016a; Gotch & French, 2014). Thus, to extend assessment literacy research beyond the 1990 STCEAS standards, this study developed the Vocational Teachers' Assessment Literacy (VoTAL) instrument based on the newly published JCSEE 2015 standard. This standard provides a contemporary alternative to the 1990 STCEAS standards and accurately reflects the assessment expectations placed on teachers within accountability-driven education systems (DeLuca et al., 2016b).

Content Validity
Content validity is an essential aspect to address when developing a new measurement instrument; DeVellis (2003) stressed that it is the first kind of validity to be assessed. Content validity is the extent to which each item in an instrument is relevant to and representative of the defined domain construct (DeVellis, 2003; Furr, 2011; Rubio et al., 2003). Assessing it involves evaluating each item to ensure that it is appropriate for the purpose of the instrument. An instrument with strong content validity includes only relevant and essential items that appropriately address its construct (Nunnally & Bernstein, 1994). Content validity is therefore critical to ensuring that an instrument measures what it is intended to measure. This study highlights this crucial phase in the development of a new instrument by demonstrating how the content validity of the VoTAL instrument was determined using the Content Validity Ratio (CVR) method.
CVR is a widely used method for evaluating the content validity of an instrument empirically. Developed by Charles Lawshe in 1975, the CVR method helps researchers decide, through CVR calculations, whether to keep or remove items from a measurement instrument. In short, it filters items through a quantitative procedure to ensure that each item truly represents the content of the domain construct. A panel of experts examines the degree to which each item reflects the domain construct, rating each item on a three-point scale: (1) essential, (2) useful but not essential, and (3) not essential. The CVR method was selected for this study because it is practical in terms of time and cost, straightforward, and easy to apply. In addition, Lawshe's method provides a table of critical values for determining the cut-off and establishing the statistical significance of agreement at the item level.
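As a minimal sketch of the original three-point procedure, the Python function below computes Lawshe's ratio for a single item, CVR = (n_e − N/2)/(N/2). Here n_e is the number of panellists who rate the item "essential" and N is the panel size. The category labels and the function name `lawshe_cvr` are illustrative assumptions and are not part of any published tool.

```python
from collections import Counter

# Lawshe's (1975) three rating categories (labels chosen for illustration).
ESSENTIAL = "essential"
USEFUL = "useful but not essential"
NOT_ESSENTIAL = "not essential"


def lawshe_cvr(ratings: list[str]) -> float:
    """Content validity ratio for one item: (n_e - N/2) / (N/2).

    n_e counts panellists who rated the item 'essential'; N is the panel size.
    """
    n = len(ratings)
    if n == 0:
        raise ValueError("at least one rating is required")
    n_e = Counter(ratings)[ESSENTIAL]  # Counter returns 0 if no 'essential' ratings
    half = n / 2
    return (n_e - half) / half


# Hypothetical item: 8 of 10 panellists rate it essential.
print(lawshe_cvr([ESSENTIAL] * 8 + [USEFUL] * 2))
```

With eight "essential" ratings out of ten, the ratio is (8 − 5)/5 = 0.6. Unanimous agreement gives 1, and an even split gives 0.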

The VoTAL Instrument
The VoTAL instrument was developed to assess vocational teachers' self-perceived assessment literacy. Based on the JCSEE 2015 standard, it consists of 125 items representing three primary constructs and 14 sub-constructs. The assessment literacy constructs covered are (1) Assessment Foundation, (2) Use of Assessment, and (3) Assessment Quality. Table 1 shows the distribution of items in the VoTAL instrument.

Identification of Validation Panels
Experts are individuals who possess knowledge and expertise in a specific field (Nur Farhana et al., 2018). Selecting the right validation panels may influence the reliability of the validation process in determining whether the measurement instrument is appropriately developed and suited to psychometric assessment (Grant & Davis, 1997). The validation panels in this study consisted of professional and lay experts. The professional experts were selected from practitioners with research or fieldwork experience. To ensure that the study population for the developed instrument was represented, the lay experts were chosen from the potential subjects of the study (Rubio et al., 2003; Zamanzadeh et al., 2015). The professional experts were identified based on their background in the research area, specific expertise, related working experience, and up-to-date knowledge, whereas the lay experts were selected for their work in the relevant field (Nur Farhana et al., 2018; Powell, 2003; Rubio et al., 2003). Recommendations on the number of experts vary in the literature. Zamanzadeh et al. (2015) suggested a minimum of five experts to provide adequate control over chance agreement, Lynn (1986) recommended a minimum of three, and Rubio et al. (2003) suggested three to ten experts in each category. Other scholars, such as Gable and Wolf (1993) and Tilden et al. (1990), have proposed two to twenty experts for each category. The Lawshe method of content validation requires a minimum of only four experts; however, a larger number of experts yields more information about the measure (Allahyari et al., 2009; Gable & Wolf, 1993; Rubio et al., 2003). Ultimately, the number of experts should be decided based on their level of competence, knowledge, and depth of experience (Grant & Davis, 1997).

Expert Panels
In this study, a purposive sampling technique was employed to select 14 professional experts and seven lay experts whose expertise was appropriate to the domain constructs of the instrument. The experts were contacted in person to obtain their agreement to take part in the study. Before giving their consent, they were informed of the study's purpose, the reasons for their selection, and their role in the study. Appointment letters and validation materials were then sent by email, together with detailed instructions for validating the instrument. The experts were also asked to identify areas of deficiency and to recommend improvements to the sentence structure and clarity of the items. All experts were given two weeks to assess and validate the 125 items in the VoTAL instrument. The assessment and validation were conducted online because the selected experts were spread across a large geographical area.

Lawshe's CVR Model Modification
In Lawshe's CVR, a panel of experts rates how well an instrument's items reflect the domain construct on a three-point scale: (1) essential, (2) useful but not essential, and (3) not essential. However, Lawshe's CVR model has been criticized for how it captures the agreement and responses of the panellists (Ahmad et al., 2019; Allahyari et al., 2009; Chalavi et al., 2015). To prevent misunderstandings of Lawshe's codes and to allow greater differentiation among panellists' ratings, Lawshe's three-point scale was therefore expanded to a five-point scale (Ahmad et al., 2019; Allahyari et al., 2009; Chalavi et al., 2015). Following the guideline of Leedy and Ormrod (2016) that panellists be offered positive, neutral, and negative options, the five-point Likert scale consists of two positive points, one neutral point, and two negative points. Compared with Lawshe's scale, the five-point scale offers a wider range of choices and clearer wording (Ahmad et al., 2019; Allahyari et al., 2009). The expert panels in this study were therefore instructed to judge the suitability of each item to the domain construct on the following five-point scale: 1 = totally not suitable; 2 = not suitable; 3 = less suitable; 4 = suitable; 5 = very suitable.

Quantifying Consensus Among Panellists
The consensus among expert panellists on the necessity of including an item in the measure can be quantified by the content validity ratio (CVR). The judgments of panellists who rated an item suitable (4) or very suitable (5) were used to compute the CVR:
$$\mathrm{CVR} = \frac{n_e - N/2}{N/2}$$
where n_e denotes the number of panellists who rated the item suitable (4) or very suitable (5), and N is the total number of panellists. The CVR equals 1 when all panellists rate the item suitable or very suitable. It lies between 0 and 1 when more than half of them do, equals 0 when exactly half do, and is negative (CVR < 0) when fewer than half do. The acceptance criteria for CVR values are based on the revised reference table of Wilson et al. (2012), which was originally developed by Lawshe (1975). The revised table is shown in Table 2.
Table 2. Acceptance CVR values based on Lawshe (1975), as revised by Wilson et al. (2012)
Based on Table 2, with a total of 21 expert panellists, the CVR of each item must be equal to or greater than the critical value of 0.428 at the α = .05 level of significance (two-tailed test). Any item with a CVR below 0.428 will be excluded from the instrument.
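The sketch below applies the same ratio to the modified five-point scale and counts ratings of 4 (suitable) and 5 (very suitable) as agreement. It also suggests where the 0.428 cut-off may come from, under an assumption. The normal approximation to the binomial gives a two-tailed critical CVR of z/√N, and with z = 1.96 and N = 21 this is about 0.4277, which rounds to 0.428. We assume, rather than confirm, that this approximation underlies the revised table. The function names are hypothetical.

```python
import math


def cvr_five_point(ratings: list[int]) -> float:
    """CVR for one item on the 1-5 scale; ratings of 4 or 5 count towards n_e."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r not in (1, 2, 3, 4, 5) for r in ratings):
        raise ValueError("ratings must be integers from 1 to 5")
    n = len(ratings)
    n_e = sum(1 for r in ratings if r >= 4)
    half = n / 2
    return (n_e - half) / half


def critical_cvr_normal(n_panel: int, z: float = 1.96) -> float:
    """Assumed normal-approximation critical CVR, z / sqrt(N); z = 1.96 is two-tailed alpha = .05."""
    return z / math.sqrt(n_panel)


def min_agreements(n_panel: int, cutoff: float) -> int:
    """Smallest number of agreeing panellists whose CVR reaches the cut-off."""
    half = n_panel / 2
    for n_e in range(n_panel + 1):
        if (n_e - half) / half >= cutoff:
            return n_e
    raise ValueError("cut-off cannot be reached with this panel size")


print(round(critical_cvr_normal(21), 3))  # 0.428
print(min_agreements(21, 0.428))          # 15
```

With 21 panellists, at least 15 must rate an item suitable or very suitable. Fifteen agreements give (15 − 10.5)/10.5 ≈ 0.429, whereas 14 give only 0.333.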

Calculation of the Respective Judgments' Means (X̄)
To calculate the mean of judgments (X̄) for each item, the ratings were converted according to the following rules (Allahyari et al., 2009; Chalavi et al., 2015):
• Very Suitable or Suitable was replaced with 2
• Less Suitable was replaced with 1
• Totally Not Suitable or Not Suitable was replaced with 0
The converted values for each item were summed and divided by the total number of panellists. Items that did not meet the minimum values were refined or considered for exclusion from the final instrument.
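A minimal sketch of the conversion and averaging steps. It assumes that the ratings are stored as the integers 1 to 5 of the five-point scale. The mapping table and the function name are illustrative.

```python
# Conversion rules listed above, keyed by the five-point rating.
CONVERSION = {
    5: 2,  # very suitable
    4: 2,  # suitable
    3: 1,  # less suitable
    2: 0,  # not suitable
    1: 0,  # totally not suitable
}


def judgment_mean(ratings: list[int]) -> float:
    """Mean of judgments (X-bar): converted scores summed and divided by panel size."""
    if not ratings:
        raise ValueError("at least one rating is required")
    return sum(CONVERSION[r] for r in ratings) / len(ratings)


# Hypothetical item: 17 ratings of 4, three of 3, and one of 2 (21 panellists).
print(round(judgment_mean([4] * 17 + [3] * 3 + [2]), 2))  # 1.76
```

Because every "suitable" or "very suitable" rating scores 2, the maximum possible mean is 2. An unanimously endorsed item therefore reaches X̄ = 2.00.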

Determination of Items' Acceptance and Rejection Criteria
The following requirements were used for the determination of acceptance and rejection of items in this instrument.
(1) An item is unconditionally accepted if CVR ≥ 0.428. This value is determined by the number of expert panellists (N = 21; Table 2).
(2) An item is accepted if 0 < CVR < 0.428 and X̄ ≥ 1.5. A CVR between zero and 0.428 indicates that more than half of the panellists rated the item "Suitable or Very Suitable." X̄ ≥ 1.5 shows that the mean of judgments is close to the value of "Suitable or Very Suitable." It is also at least 75 percent of the maximum mean value (2), which is higher than the minimum acceptable value of 60 percent (Chalavi et al., 2015).
(3) An item is either refined or excluded from the instrument if CVR ≤ 0 and X̄ < 1.5. These values show that no more than half of the panel rated the item "Suitable or Very Suitable" (essential on Lawshe's scale). They also show that its mean of judgments leans towards the lower ratings, down to "Totally Not Suitable or Not Suitable" (not essential on Lawshe's scale).
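The three criteria can be expressed as a small decision function. This is a sketch of the stated rules only. As written, they do not cover an item with 0 < CVR < 0.428 and X̄ < 1.5, so the sketch returns a hypothetical "unresolved" label for such cases rather than guessing how they were handled. The function name and labels are illustrative.

```python
def classify_item(cvr: float, mean: float,
                  cvr_crit: float = 0.428, mean_crit: float = 1.5) -> str:
    """Apply acceptance criteria (1)-(3) to one item's CVR and mean of judgments."""
    if cvr >= cvr_crit:
        return "accept"             # criterion (1): unconditional acceptance
    if 0 < cvr < cvr_crit and mean >= mean_crit:
        return "accept"             # criterion (2)
    if cvr <= 0 and mean < mean_crit:
        return "refine or exclude"  # criterion (3)
    return "unresolved"             # combination not covered by (1)-(3)


print(classify_item(13 / 21, 37 / 21))  # accept
print(classify_item(0.333, 1.43))       # unresolved
```

The Results section applies a stricter joint requirement, CVR ≥ 0.428 and X̄ ≥ 1.5. To mirror that rule, the body of this function could be replaced by that single condition.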

Results
A total of 21 experts, 14 professional experts and seven lay experts, were involved in validating the VoTAL instrument. The response rate was 100 percent, and all experts completed their assessment within the given period. The 14 professional experts were academics working as lecturers or researchers in the education sector. The seven lay experts, by contrast, were research subjects directly involved in the assessment process in vocational colleges. The expertise and years of experience of the professional and lay experts are listed in Table 3 and Table 4, respectively. As Table 3 shows, the professional experts' experience ranged from five to 38 years. Table 4 shows that the lay experts' experience ranged from 11 to 29 years. Table 5 shows the results of the CVR analysis for each item for the 14 professional experts, the seven lay experts, and all 21 experts combined. Item acceptance and rejection, however, were based solely on the overall CVR of the 21 experts and the mean of judgments for each item; the CVR values for the professional and lay experts are shown for comparison only. According to the acceptance values in Table 2, the CVR requirement for 21 experts is CVR ≥ 0.428. In addition, the mean of judgments (X̄) must be equal to or greater than 1.5. The analysis of all 125 items showed that 25 items did not meet the minimum requirements for the overall CVR value (CVR ≥ 0.428) and mean of judgments (X̄ ≥ 1.5). These are items 10, 19, 20, 21, 24, 27, 32, 44, 48, 67, 68, 76, 81, 88, 95, 96, 97, 98, 103, 104, 115, 121, 123, 124, and 125.
All of these items were excluded from the instrument. The 25 items that did not meet the requirements came from the Assessment Foundation (9 items), Use of Assessment (5 items), and Assessment Quality (11 items) constructs. Table 6 shows the rejected items and their respective constructs.
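Table 5 reports CVR separately for the professional experts, the lay experts, and the full panel, and a sketch of that comparison may help. The ratings below are entirely hypothetical and were invented for illustration. The function name is also an assumption. Only the overall value drives the acceptance decision.

```python
def cvr_from_ratings(ratings: list[int]) -> float:
    """CVR on the five-point scale: ratings of 4 or 5 count as agreement."""
    n = len(ratings)
    n_e = sum(1 for r in ratings if r >= 4)
    return (n_e - n / 2) / (n / 2)


# Hypothetical ratings for a single item.
professional = [5, 4, 4, 5, 3, 4, 5, 4, 4, 2, 5, 4, 3, 4]  # 14 experts
lay = [4, 5, 5, 4, 3, 4, 5]                                # 7 experts

groups = {
    "professional": professional,
    "lay": lay,
    "overall": professional + lay,  # basis for the acceptance decision
}
cvrs = {name: round(cvr_from_ratings(r), 3) for name, r in groups.items()}
print(cvrs)  # {'professional': 0.571, 'lay': 0.714, 'overall': 0.619}
```

Pooling the two subgroups into one list follows the study's choice to analyze all panellists together.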

Discussions
The VoTAL instrument is designed to assess vocational teachers' self-perceived assessment literacy. This study established the instrument's content validity using Lawshe's (1975) CVR model. To improve the instrument's validity, a joint panel consisting of both professional and lay experts was invited. In addition, because previous studies reported confusion over Lawshe's codes, a five-point Likert scale was used to improve the response process. As suggested by Rubio et al. (2003), the ratings of all panellists were analyzed together, without distinguishing between professional and lay experts. Although expert panels provide valuable information for revising an instrument, the method has limitations. Expert feedback is subjective, so the analyses are subject to bias among experts (Zamanzadeh et al., 2015). The final CVR calculations showed that 25 items fell below the minimum criteria of CVR ≥ 0.428 and X̄ ≥ 1.5, leaving 100 items for the final instrument. The accepted items had CVR values between 0.428 and 1.00 and means of judgments from 1.57 to 2.00. Twelve items showed 100 percent consensus among the experts, with a CVR of 1.00 and a mean of judgments of 2.00. The accepted items were then arranged according to the specified format, and some were revised to improve their wording and sentence structure. These refinements were applied to the final 100 items. The experts' comments and recommendations on every item were analyzed critically and taken into consideration to ensure the quality of each item. Measuring assessment literacy requires a valid and reliable instrument. The findings of this study indicate that experts in the field of assessment literacy judged the selected items to have the potential to serve that purpose.
The findings from this study also indicated that the VoTAL instrument is a promising instrument that could be used to assess self-perceived assessment literacy among vocational teachers.

Conclusions
This study illustrates how to perform a content validity analysis, a critical step in instrument development, using Lawshe's CVR method. Expert panels were used to assess and judge the instrument items, and the new VoTAL instrument demonstrated adequate and acceptable content validity. The CVR values calculated for the 125 items from the ratings of 21 experts showed that 25 items fell below the acceptance criteria. A total of 100 of the 125 items therefore remained after the content validation process. Because the CVR quantifies agreement among experts statistically, the decision to include or exclude each item could be made clearly and appropriately. The other psychometric properties of a measurement instrument must also be rigorously tested. To make measurement instruments more applicable, future research should ensure that each instrument is subjected to appropriate validity tests. Accordingly, the revised VoTAL instrument will be evaluated in a pilot study to examine further validity evidence and other psychometric properties.
Overall, this content validity study of the VoTAL instrument contributes to the literature on instrument validation by demonstrating the use of Lawshe's CVR method to assess the content validity of an instrument. It also contributes to assessment literacy research through the development and validation of the VoTAL instrument. The instrument was developed based on the current JCSEE 2015 classroom assessment standard to assess vocational teachers' self-perceived assessment literacy. Several instruments have been developed over the years to assess teachers' assessment literacy, such as the Classroom Assessment Literacy Inventory (Mertler, 2003), the Assessment Literacy Inventory (Campbell et al., 2002), the Teacher Assessment Literacy Questionnaire (Plake, 1993), and the Assessment in Vocational Classroom Questionnaire (Kershaw, 1993). Although progress has been made in developing valid and reliable assessment literacy measures, the VoTAL instrument differs from these in that it is based on the JCSEE 2015 assessment standard rather than the 1990 STCEAS standards. This study extends the work of DeLuca et al. (2016a) by promoting the use of the JCSEE 2015 standard in addressing contemporary assessment demands.