Evaluation of the Face and Content Validity of a Soft Skills Transfer of Training Instrument

This paper presents an evaluation of the face and content validity of a Soft Skills Transfer of Training Instrument. This newly developed instrument aims to assess the factors that influence the application of soft skills back in the workplace after training amongst clerical employees in a Malaysian context. The instrument comprises 85 items across twelve constructs: personal efficacy; intention; motivation; expectations; training contents; delivery; awareness; interventions; superior's support; trainer's quality; workplace; and climate. The constructs were derived from Baldwin and Ford's Transfer of Training Model (1988), Burke and Hutchins' Model of Transfer (2008), and Holton's HRD Evaluation Research and Measurement Model (1996). The evaluation was based on the opinions of five experts with academic and training industry backgrounds. Each item was evaluated to ensure its relevance to the study and rated on a dichotomous scale of agree or disagree. The ratings were used to calculate the extent of agreement between raters using the Kappa Index (Kappa) and the Content Validity Index (CVI). As this study involved multiple raters, Fleiss' Kappa Index (FKI) was applied. The calculation yielded an inter-rater agreement of 0.76 (FKI) and a proportion agreement of 0.96 (CVI). Items with a CVI of at least 0.80 were included in the final instrument. This evaluation confirmed the face and content validity of the items on the instrument, supporting its use to assess the factors influencing soft skills training transfer and its readiness for a pilot study.


Introduction
Organizations must consistently reskill and upskill their employees to improve job performance and ensure long-term survival. In today's knowledge-driven global economy, learning agility is imperative for organizations to remain competitive (Noe & Kodwani, 2018; Putta, 2014). To manage this situation, Truitt (2011) asserted that the most common practice organizations use to enhance employees' performance is training. Generally, training has been described as a strategic approach that employers utilize to improve the job performance of employees (Noe & Kodwani, 2018; Truitt, 2011). It is a setting where employees acquire specific competencies that can be customized to employees' needs in meeting organizational goals. Employers spend a large portion of their yearly budget on training because they require their employees to apply the knowledge, skills, or attitudes (KSAs) learned to their jobs (Blume, Ford, Baldwin, & Huang, 2010; Noe, 2010). Clearly, employers expect training to deliver the necessary improvement in their employees' job quality and productivity. In addition to job performance, training can positively affect employees' commitment and job satisfaction (Feng & Richards, 2018; Ibrahim, Boerhannoeddin & Bakare, 2017).
Ensuring the effectiveness of training is therefore crucial, and it depends on how well the training contents are applied back in the workplace. The concept of applying the training contents is known as the transfer of training. Transfer of training is described as the application of KSAs learned during a training session to jobs and the workplace (Burke, Hutchins & Saks, 2013; Noe, 2010). However, Franke and Felfe (2012) reported that despite billions of dollars spent on training, less than 10% of the KSAs learned are transferred. Furthermore, soft skills such as communication, integrity, and work attitude are difficult to quantify (Robles, 2012) compared to hard skills like typing, accounting, and computing. Only by understanding the factors that influence soft skills transfer can possible ways to further improve the transfer of KSAs be discovered.
Therefore, there is a need to determine the influencing factors of soft skills training transfer to facilitate improvement in job performance at work. As such, the Soft Skills Transfer of Training Instrument (SSTTI) has been developed to identify the factors influencing the application of soft skills in the workplace amongst clerical employees after training. To realize this effort, it is also critical for the research outcome to be regarded as rigorous and trustworthy. In that respect, Morris and Burkett (2011) have emphasized that the validity and reliability of a research instrument are crucial for it to be considered acceptable. On that note, this paper aims to evaluate the face and content validity of the SSTTI as a tool to assess the factors influencing soft skills training transfer amongst clerical employees in a Malaysian context.

Background and Rationale of the Study

Importance of Soft Skills in the Workplace
Most researchers concur that soft skills are essential elements of the human competencies required for a person to perform at work and in day-to-day life (Ibrahim et al., 2017; Robles, 2012). Descriptions of soft skills include character traits, attitudes, and behaviours (Robles, 2012) that enable a person to adapt in or outside the workplace context (Bhanot, 2009). Examples of soft skills are interpersonal communication, work attitude, time management, critical thinking, and problem-solving. It has been suggested that soft skills are important for career success in the workplace because they contribute 85%, whilst technical skills contribute only 15% (Wats & Wats, 2009). In addition, researchers have found that soft skills can have a significant impact on developing high-performance organizations (Moeller, Robinson, Wilkowski, & Hanson, 2012; Robles, 2012), and for that reason employers usually expect their employees to demonstrate these skills in the workplace. A pleasant and conducive working environment might also allow employees to be more involved at work.
Despite the Fourth Industrial Revolution (IR4.0), numerous scholars agree that Artificial Intelligence (AI), robotics, and machines will not be able to demonstrate soft skills such as communication, empathy, integrity, and collaboration (Bonekamp & Sue, 2015; Bowles, 2014). Lazarus (2013) has also stressed that in the current working environment, where many human issues still exist, technical skills alone are not sufficient. This is because only humans can understand and manage human-related issues, whereas AI, robotics, and machines cannot perform such functions. Moreover, according to the World Economic Forum (WEF) reports in 2016 and 2018, soft skills will remain vital for employees to perform their jobs efficiently in spite of technological advancement. WEF (2018) has also published 'The Future of Jobs Report', revealing the top ten skills in 2018 and the skills trending for 2022, in which soft skills still dominate. Hence, employers still need to give priority to soft skills training to prepare their employees for the challenges posed by IR4.0.

Soft Skills Transfer of Training
Burke and Hutchins (2008) affirm that improving the transfer of training has been researched for over two decades, including in organizational settings. Additionally, scholars have asserted that many factors can influence training effectiveness, and these factors can influence the transfer of training either directly or indirectly (Blume et al., 2010; Holton, 2005). Scholars have also identified that influencing factors are linked to the various stages of training (Bhatti & Kaur, 2012; Tonhäuser & Büker, 2016). Therefore, it is pertinent to understand the relationship between these influencing factors and the transfer of training.
Based on an analysis of the relevant literature, relevant variables were carefully determined for the study to identify influencing factors on soft skills transfer of training. It is crucial to investigate these variables to ascertain the extent of influence they have on the transfer of soft skills after training. The findings will reveal the influencing factors that are much needed in the effort to improve soft skills transfer of training. As a result, training participants will be able to apply the training contents learned to their jobs and the workplace more effectively. Moreover, employers insist that training investment should yield favorable outcomes and have expressed their dissatisfaction over the low transfer rate of soft skills compared to hard skills (Ibrahim et al., 2017; Franke & Felfe, 2012). Thus, there is a need to undertake this study to identify the influencing factors of soft skills training transfer.

Validation of Instrument
According to Ryan-Nicholls and Will (2009), validity and reliability are important elements in both quantitative and qualitative research, particularly in assessing the credibility of a new research instrument. This ensures the new instrument can be considered a good instrument and measures what it is intended to measure (Field, 2005; Thatcher, 2010). The strength of an instrument depends on the measurement precision of the identified variables: to be considered valid, an instrument must accurately assess the construct being measured in the study (Azwani, Nor'ain & Noor Shah, 2016). It is very important for a new instrument to be recognized and accepted as free of bias to avoid inaccurate outcomes (Chiwaridzo et al., 2017; Sikorskii & Noble, 2013). An instrument is regarded as reliable when it produces similar results, to a certain degree, each time the procedure is repeated (Wong, Ong & Kuek, 2012). Typically, the main types of validity used are face validity, content validity, construct validity, and criterion validity. In this study, the face and content validity were evaluated to determine that the instrument is valid and reliable to measure as intended.
Oluwatayo (2012) explains that face validity refers to the researcher's subjective evaluation of whether the items are relevant, practical, unambiguous, and clear for the study. Face validity is also required when operationalizing a construct, to assess the extent to which a measure corresponds to the specific construct. Face validity assesses the form of the items in relation to language clarity, feasibility, readability, and consistency of style and formatting (Hamed, 2016). Common techniques to measure face validity include percent agreement, Cohen's Kappa Index for two raters, and Fleiss' Kappa Index (FKI), an extension of Cohen's Kappa for multiple raters (Fleiss, Levin & Paik, 2003). Table 1 (below) exhibits the interpretations of Kappa by well-known scholars, namely Fleiss, Landis and Koch, and Altman. Content validity refers to whether the items on the instrument are in fact representative and adequate to measure a particular construct (Sangoseni, Hellman & Hill, 2013). Each item is tested to ensure it is phrased clearly and properly, as well as to determine whether it is applicable to the intended construct. The items are assessed to confirm correct scoring and that the instrument's scaling is suitable for the content of the construct (Yassir, McIntyre & Bearn, 2016; Zamanzadeh et al., 2015). Flawed content validity means that confirming the reliability of an instrument is impossible. A Content Validity Index (CVI) greater than 0.78 is considered excellent, irrespective of the number of experts (Polit, Beck & Owen, 2007; Sangoseni et al., 2013); otherwise, the objectivity and adequacy of the items are called into question.
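To illustrate the two-rater case mentioned above, the following sketch computes Cohen's Kappa from dichotomous Agree (1) / Disagree (0) ratings. The rating values are hypothetical, invented purely for illustration; they are not data from this study:

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Cohen's Kappa for two raters: (p_o - p_e) / (1 - p_e)."""
    n = len(rater1)
    # Observed agreement: proportion of items both raters scored identically.
    p_o = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Expected chance agreement from each rater's marginal proportions.
    c1, c2 = Counter(rater1), Counter(rater2)
    categories = set(rater1) | set(rater2)
    p_e = sum((c1[k] / n) * (c2[k] / n) for k in categories)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical "Agree" (1) / "Disagree" (0) ratings on ten items.
r1 = [1, 1, 1, 0, 1, 1, 1, 1, 1, 1]
r2 = [1, 1, 0, 0, 1, 1, 1, 1, 1, 1]
print(round(cohens_kappa(r1, r2), 2))  # 0.62
```

Note that although the two raters agree on 9 of 10 items (90% raw agreement), Kappa corrects for the agreement expected by chance, giving a lower value.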
Expert panel members with appropriate and extensive knowledge of the related subjects are essential in the evaluation process (Hamed, 2016). This is because the experts provide a collection of opinions that is critical for a thorough evaluation of the items on the instrument. The evaluation indicates whether the items are relevant and comprehensive enough to establish the instrument's credibility (Sangoseni et al., 2013). Additionally, it has been suggested to report the proportion agreement of the face and content validity to indicate data variability and to increase confidence in the new instrument (Azwani et al., 2016). Subsequently, this will prevent doubts about the validity of the instrument. Thus, this paper intends to address the face and content validity by ensuring that the items fit well with each construct regarding the soft skills transfer of training for use on the selected samples of the study.

Methodology

Description of the Instrument
This new instrument contains 85 items across twelve constructs, namely personal efficacy; intention; motivation; expectations; training contents; delivery; awareness; interventions; superior's support; trainer's quality; workplace; and climate. It is a self-assessment questionnaire with a 6-point Likert-type scale, ranging from 1 "strongly disagree" to 6 "strongly agree". The instrument evaluates clerical employees' perceptions of the soft skills training they attended.

Process of Administering the Face and Content Validity
Recommendations on the number of experts needed to determine face and content validity have been inconsistent. For example, some scholars suggest that seven to ten persons are enough to control for chance agreement in face validity (Hernández-Garbanzo, 2011; Thomason, 2008). Meanwhile, for content validity, several scholars recommend between two and six experts (Strickland, Strickland, Wang, Zimmerly, & Moulton, 2010; Umar & Su-Lyn, 2011). Another recommendation states that at least five experts are needed for sufficient control over chance agreement (Zamanzadeh et al., 2015).
As such, for this study five experts were deemed sufficient to perform the face and content validity procedure. The panel consisted of two academicians and three training practitioners, as shown in Table 2. The experts were selected and included based on the following:
• Strong academic background
• Knowledgeable on the related subjects in their respective fields
• Familiarity with the concepts and constructs in evidence-based practice
• Years of working experience in their respective positions in academia and the training industry
The process started with the researcher delivering the instrument and an introductory cover letter to the two academician experts, together with an enclosed resource containing detailed information and instructions for their easy reference. For the other three experts (training practitioners), the researcher emailed the introductory cover letter, the instrument, and the detailed information with instructions. Among the information provided to the panel were the conceptual and operational definitions of the constructs. The experts were asked to provide comments and suggestions on each item as to whether it was appropriate and adequate for the study. The researcher personally collected the completed instrument from the two academicians and received the others via email from the three training practitioners. Upon receiving the input from the expert panels, several items in the instrument were revised, rephrased, added, or removed. This evaluation process was conducted over three rounds to refine the items until they were congruently accepted by all the expert panels. The total duration of the evaluation process was two months.

Development of the Scale
In this study, a quantitative survey research design was employed. The self-developed instrument was formed based on a literature review of studies on soft skills, transfer of training, and the constructs of the study. The study's underpinning theories included Baldwin and Ford's Transfer of Training Model (1988), Burke and Hutchins' Model of Transfer (2008), and Holton's HRD Evaluation Research and Measurement Model (1996). The instrument consisted of three sections, and the questionnaire was developed in English. All items were evaluated for clarity, readability, consistency, redundancy, and relevance to the construct under study. Several items were revised, rephrased, added, or removed consistent with the feedback gathered from the expert panels.
To assess face validity, a dichotomous scale with the options "Agree" and "Disagree" was used to indicate agreement or disagreement with each item, where an agreed item denoted that the item met the requirement under the relevant construct. To ensure the assessment of face validity was conducted properly, the following criteria suggested by Oluwatayo (2012) were used: (1) suitability of grammar, (2) clearness and unambiguity of items, (3) correct spelling of words, (4) precise sentence structure, (5) appropriateness of font size, and (6) structure of the instrument. The expert panels were encouraged to provide written comments or suggestions in the empty space beside each item for the instrument's improvement.
For content validity, the same dichotomous scale with the options "Agree" and "Disagree" was used. Hamed (2016) describes that an agree rating indicates the item is objectively structured and can be positively classified under the construct being studied, while a disagree rating indicates the item is perceived as inconsistent or as posing potential difficulties regarding clarity and conciseness. In general, the experts were requested to provide their feedback according to the criteria suggested by Lam, Hassan, Sulaiman, and Kamarudin (2018), as follows: (1) appropriateness of the construct definition, (2) items should cover the full range of content within each construct, (3) items should be clearly worded and unambiguous, (4) appropriateness of items for the target group, and (5) any new items to add to the instrument. Similarly, the experts were also allowed to offer written comments or suggestions on each item for improvement. Kilem (2008) has stated that in determining face validity, the expert panels indicate their responses on an "Agree"/"Disagree" scale and the responses are analyzed using Kappa. Kappa is widely used in many fields, including medical, psychological, and educational research. The Kappa analysis seeks the extent of agreement among the expert panels on the constructs under study (Kilem, 2008; Oluwatayo, 2012). Kappa measures the agreement between two or more raters: a Kappa value of +1 implies perfect agreement, -1 implies perfect disagreement (negative values indicate agreement less than chance), and 0 is exactly what would be expected by chance (Anthony & Joanne, 2005; McHugh, 2012). Normally, the Kappa value is interpreted similarly across various studies (McHugh, 2012), and the interpretations of the Kappa value by Fleiss, Landis and Koch, and Altman are used to determine the minimum acceptable agreement (Nurjannah & Sri Marga, 2017).
As five raters were involved in this study, Fleiss' Kappa Index (FKI) was applied to interpret the Kappa value for the face validity (Fleiss et al., 2003).
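A minimal sketch of the Fleiss' Kappa computation used for multiple raters is shown below. The count matrix is hypothetical (not the study's actual ratings) and assumes five raters giving dichotomous Agree/Disagree ratings per item:

```python
def fleiss_kappa(counts):
    """Fleiss' Kappa for a list of per-item category counts.

    counts[i][j] = number of raters assigning item i to category j;
    every row must sum to the same number of raters n.
    """
    N = len(counts)       # number of items
    n = sum(counts[0])    # raters per item
    k = len(counts[0])    # number of categories

    # Per-item agreement P_i, averaged into the observed agreement P_bar.
    P_bar = sum(
        (sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts
    ) / N

    # Expected chance agreement P_e from the overall category proportions.
    P_e = sum((sum(row[j] for row in counts) / (N * n)) ** 2 for j in range(k))

    return (P_bar - P_e) / (1 - P_e)

# Hypothetical [Agree, Disagree] counts for five items rated by five experts.
counts = [[5, 0], [0, 5], [4, 1], [2, 3], [5, 0]]
print(round(fleiss_kappa(counts), 2))  # 0.57
```

In practice a library routine (e.g. statsmodels' `inter_rater.fleiss_kappa`) can be used instead of a hand-rolled implementation; the sketch above only makes the formula explicit.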

Data Analysis for the Face and Content Validity
For the content validity, the expert panels also indicated their responses on an "Agree"/"Disagree" scale for each item, and the responses were then analyzed using the CVI. Items that received an "Agree" response were deemed relevant, and if an agreed item needed minor rewording, rephrasing, or shortening, the experts specified this in the space provided for the item. To calculate the CVI, items that received an "Agree" from all experts were given a score of +1.0, items that received a "Disagree" were given a score of 0.0 because they were viewed as irrelevant, and a proportional score was given when the experts were split (Sangoseni et al., 2013). This accords with the recommendation that the level of agreement among the experts and the proportion of items receiving a rating be assigned a numerical value (Polit et al., 2007). A CVI greater than 0.78, irrespective of the number of experts, is considered excellent and avoids issues of objectivity and appropriateness (Polit et al., 2007; Sangoseni et al., 2013). Otherwise, the items on the instrument must be revised and corrected to ensure reliability and validity.
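The CVI scoring described above can be sketched as follows. The ratings matrix is hypothetical and simply illustrates the item-level CVI (proportion of experts agreeing) and the scale-level average:

```python
def item_cvi(ratings):
    """Proportion of experts rating the item 'Agree' (1) vs 'Disagree' (0)."""
    return sum(ratings) / len(ratings)

def scale_cvi(matrix):
    """Average of the item-level CVIs across all items."""
    return sum(item_cvi(row) for row in matrix) / len(matrix)

# Hypothetical ratings: rows are items, columns are the five experts.
ratings = [
    [1, 1, 1, 1, 1],  # all five experts agree -> item CVI 1.00
    [1, 1, 1, 1, 0],  # four of five agree     -> item CVI 0.80
    [1, 1, 1, 1, 1],
    [1, 0, 1, 1, 1],
]
print([item_cvi(r) for r in ratings])
print(round(scale_cvi(ratings), 2))  # 0.9
```

With five experts, an item agreed by four of them scores 0.80, which clears the 0.78 threshold cited above, while an item agreed by only three scores 0.60 and would need revision.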

Results
The face validity and content validity were evaluated by a panel of five experts: two academicians and three training practitioners. All of them have extensive working experience and are knowledgeable in their respective fields, which is appropriate for this study.

Face Validity with Fleiss' Kappa Index (FKI)
After the expert panels reviewed the items on the instrument, the responses were analyzed using FKI due to its suitability for multiple raters. In the FKI analysis, k > 0.75 indicates excellent agreement and p < .005 is significant. In this study, the inter-rater agreement yielded a Kappa value of 0.76, which indicated excellent agreement under FKI, and p = .000 < .005 was significant. The consolidated responses from the experts also endorsed that the twelve constructs with 85 items could be retained for the content validity evaluation. However, some improvements were made to the items based on the comments and suggestions from the experts. The feedback received from the experts included the following:
• Acceptable format for the instrument
• Items and constructs matched
• Select the appropriate language for the target respondents
• Rephrase the sentence structure
• Choose simple words (vocabulary)
• Keep the sentences short
• Use simple language
• Avoid double-barrelled questions
• Use only one language for clarity (dual language can be confusing)
• Ensure clear instructions for each section

Content Validity with Content Validity Index (CVI)
After the instrument was reviewed, the expert panels agreed that all 85 items were relevant to their respective constructs. The items were also improved further based on the feedback received, such as rephrasing sentence structure, selecting better choices of words, removing double-barrelled questions, and revising the instructions for clarity. Table 3 displays the summary of the Content Validity Index (CVI). The results showed that 68 items were agreed by all five experts, each with an item CVI of 1.00, for a subtotal of 68. Four out of five experts agreed on the remaining 17 items, each with an item CVI of 0.80, for a subtotal of 13.6. Since a CVI of 0.78 denotes excellence as recommended by scholars (Polit et al., 2007; Sangoseni et al., 2013), the 17 items with a CVI of 0.80 were considered relevant to the constructs being studied and thus remained on the instrument. The results also indicated that the instrument had a summed CVI of 81.6, which gives an excellent average proportion agreement of 0.96. Therefore, this study reports the SSTTI as a valid and reliable tool to assess the influencing factors of soft skills training transfer amongst the selected respondents of the study.
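The summary figures for Table 3 follow directly from the item counts; a quick check of the arithmetic:

```python
# 68 items agreed by all five experts (item CVI 1.00 each) and
# 17 items agreed by four of five experts (item CVI 0.80 each).
total_cvi = 68 * 1.00 + 17 * 0.80   # summed CVI across all 85 items
average_cvi = total_cvi / 85        # proportion agreement for the instrument

print(round(total_cvi, 1))    # 81.6
print(round(average_cvi, 2))  # 0.96
```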

Discussion
The SSTTI, a newly developed instrument, was evaluated for face and content validity by a panel of experts. The instrument contains 85 items across twelve constructs. The face validity results established that the instrument was sound, with items appropriate for the purpose of the study. Fleiss' Kappa Index (FKI) yielded a chance-corrected agreement with a Kappa value of 0.76, indicating that the instrument was excellent and significant at p = .000 < .005.
The content validity was also evaluated using the Content Validity Index (CVI), which measured the proportion of agreement among the experts. This allowed easy interpretation: the experts agreed that all 85 original items were acceptable. With the feedback from the experts and an extensive review of the literature, the items were further modified and improved to ensure relevance, adequacy, and representativeness of the constructs. Subsequently, the SSTTI was considered as having excellent content validity at 0.96. Hence, it is a valid instrument for the study.

Conclusion
In conclusion, the validity and reliability of the items on the Soft Skills Transfer of Training Instrument (SSTTI) were reviewed to determine whether the items were valid for measuring the intended constructs. In this evaluation, the face and content validity of the instrument attained high Kappa and CVI values that were considered excellent. Therefore, this process has provided a reliable and valid instrument that is robust enough to evaluate the relevant constructs under study. Consequently, the SSTTI has displayed the appropriate and acceptable measurement performance needed to assess the influencing factors of soft skills training transfer amongst clerical employees in a Malaysian context. Since the SSTTI seems to have excellent face and content validity, a pilot study can now be arranged.