Assessing Gender Disparity in Job Satisfaction: A Bayesian Approach

This study demonstrates the use of the Bayesian independent samples t-test to illustrate gender disparity in job satisfaction. Results from this state-wide study involving over 4000 employees in private companies in Sarawak, Malaysia show that male employees are generally more satisfied with their jobs than their female counterparts. This finding casts doubt on the contented female worker paradox, which refers to the notion that although women receive fewer job-related benefits in the workplace, they are just as satisfied (or more satisfied) with their jobs. Poor reproducibility of findings has added to the confusion about gender disparity in job satisfaction. Frequentist approaches have dominated the field of psychology thus far, and prior research on gender disparity has relied mainly on p-values and null hypothesis significance testing (NHST). Instead of making an inference based on a cut-off value of 0.05, it is more intuitive and convincing to illustrate the weight of the evidence favouring a given hypothesis using the likelihood ratio of one hypothesis to the other.


Introduction
Job satisfaction is one of the most widely investigated phenomena in organizational research (Vukonjanski & Terek, 2014). The constant attention given to job satisfaction is warranted because it is an important indicator of social sustainability in an organization (Baumgardt, Moock, Rossler, & Kawohl, 2015; Perez, Fernandez-Salinero, & Topa, 2018), notably because job satisfaction is inversely correlated with turnover intentions (Tschopp, Tgrote, & Gerber, 2014). In the literature, the terms "job satisfaction" and "job contentment" are often used interchangeably (Rahman et al., 2012). Yousef (2017) noted that job satisfaction is crucial as it affects organizational commitment, job performance, absenteeism, and employees' work attitudes.
Poor reproducibility and replicability of findings in prior research on gender disparity in job satisfaction have become a problem because they add confusion to the matter. The statistical community, notably the American Statistical Association, has been vocal about this problem because poor reproducibility and replicability cast doubt on the validity of scientific research (Wasserstein & Lazar, 2016). Past research on gender disparity in job satisfaction has typically used frequentist approaches, namely null hypothesis significance testing (NHST). Bernardo (2006) noted that over the last 25 years in the field of psychology, frequentist statistics were most often adopted by researchers. Researchers utilizing NHST make inferences in a straightforward fashion: if the computed p-value falls below 0.05 (a commonly used threshold or cut-off value), they reject the null hypothesis (which states the absence of an effect) and feel licensed to infer that the effect is present (the alternative hypothesis) (Ly et al., 2018). The use of p-values is now under scrutiny due to their misuse and limitations. For example, p-values do not provide information about the size of an effect or the precision with which it is estimated. Moreover, researchers often do not address the alternative hypothesis, error types, or power. The American Statistical Association warned against the unwarranted use of p-values because it can distort the scientific process (Wasserstein & Lazar, 2016).
The use (or rather, the misuse) of p-values in the psychological literature can be attributed to the fact that Bayesian statistics are not typically taught in elementary statistics courses (Bernardo, 2006; Page & Satake, 2017). O'Hagan (2004) noted that researchers outside the academic statistics community were not aware of Bayesian statistics, and that the application of the Bayesian approach was limited by the lack of proper computational tools. Due to increased awareness of the fallacies and misinterpretations of frequentist statistical methods, along with the emergence of powerful and user-friendly tools for Bayesian inference, interest in Bayesian statistics has been steadily increasing (O'Hagan, 2004; Page & Satake, 2017). Bayesian statistics are preferable to frequentist statistics because they provide more direct, intuitive, and meaningful statements of the probability that a hypothesis holds; they are therefore better suited for decision making (O'Hagan, 2004; Wasserstein & Lazar, 2016). Due to the limitations of p-values, this study departed from the norm of using the classical independent samples t-test and instead adopted the Bayesian independent samples t-test to illustrate gender differences in job satisfaction. With the Bayesian independent samples t-test, should a gender disparity exist, the difference can be more clearly quantified.

Method
This study recruited 4093 employees of private organizations in Sarawak, Malaysia (male = 3117, female = 976). Enumerators were required to brief the participants and assure them of their anonymity. Identifying questions were not included in the questionnaire. This study used a bilingual (English and Malay) self-administered instrument. At least one researcher was present during data collection to answer questions regarding the items in the instrument. The instrument covers multiple facets of job satisfaction, including job position, salary, allowances facilities, leadership, job performance, co-workers, motivation, working environment, and career path (Ezzat & Ehab, 2019).
The instrument was validated using Andrich's Rating Scale model. Items show a good range of weighted and unweighted mean squares (0.5-1.5), indicating that the items are productive for measurement (Linacre, 2002). Item selection for a norm-referenced test considers both item difficulty and item discrimination. It has been suggested that item discrimination should be in the range of 0.3 to 0.7. For polytomous items, however, no comparable rules of thumb exist, although items should show moderate discrimination values (Meyer, 2014). The instrument was tested for reliability and shows McDonald's ω = 0.881, average inter-item correlation = 0.480, Rasch item reliability = 0.997, and person reliability = 0.820. The instrument records an Item Separation Index of 17.477 and a Person Separation Index of 2.108. Principal Components Analysis of the standardized Rasch residuals of the instrument shows an eigenvalue of 1.72, which gives further evidence of a unidimensional construct. Furthermore, the instrument has a good fit to the model, with the Weighted Mean Squares in the range of 0.83-1.08 and the Unweighted Mean Squares in the range of 0.85-1.15.
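The reliability and separation indices reported above are linked by a standard Rasch identity, R = G^2 / (1 + G^2), where G is the separation index. The short sketch below applies that identity to the reported separation values; the function names are illustrative and are not taken from the software used in this study.

```python
def separation_to_reliability(separation: float) -> float:
    """Rasch reliability implied by a separation index G: R = G^2 / (1 + G^2)."""
    g_squared = separation ** 2
    return g_squared / (1.0 + g_squared)


def reliability_to_separation(reliability: float) -> float:
    """Inverse relation: G = sqrt(R / (1 - R))."""
    return (reliability / (1.0 - reliability)) ** 0.5


if __name__ == "__main__":
    # Separation indices as reported in the text.
    print(f"item reliability   ~ {separation_to_reliability(17.477):.3f}")
    print(f"person reliability ~ {separation_to_reliability(2.108):.3f}")
```

The item value agrees with the reported 0.997 to three decimals. The person value comes out near 0.816, slightly below the reported 0.820; small gaps of this kind can arise from rounding or from how the software estimates the error variance, so the identity is best read as a consistency check rather than an exact reproduction.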
Bayesian independent samples t-tests were conducted using JASP (https://jasp-stats.org), an open-source statistical program that provides a comprehensive set of tools for Bayesian inference (JASP Team, 2019). All plots were created using the R package ggplot2 (Wickham, 2016) and the "Bayesian independent samples t-test" function in JASP version 0.10 (JASP Team, 2019).
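For readers who want to see what such a test computes, the following is a minimal sketch of a default Bayes factor for two independent groups under the Jeffreys-Zellner-Siow (JZS) formulation, in which the effect size receives a Cauchy prior of width r. It is an illustration using only the Python standard library and is assumed, not confirmed, to correspond to what JASP does internally. The function name, the integration grid, and the t value in the usage example are hypothetical, since the t statistic is not reported in this paper.

```python
import math


def log_bf10_two_sample(t, n1, n2, r=0.707, grid=4000):
    """Log JZS Bayes factor (H1 vs H0) for a two-sided independent-samples t-test."""
    nu = n1 + n2 - 2              # degrees of freedom
    n_eff = n1 * n2 / (n1 + n2)   # effective sample size
    # Log-likelihood kernel of the observed t under H0 (effect size = 0).
    log_null = -(nu + 1) / 2 * math.log1p(t * t / nu)

    # Integrate over g on a log scale (g = exp(u)) with the trapezoid rule.
    lo, hi = -30.0, 40.0
    step = (hi - lo) / grid
    log_terms = []
    for i in range(grid + 1):
        u = lo + i * step
        g = math.exp(u)
        scale = 1.0 + n_eff * g
        log_like = (-0.5 * math.log(scale)
                    - (nu + 1) / 2 * math.log1p(t * t / (scale * nu)))
        # Inverse-gamma(1/2, r^2/2) prior on g, plus the Jacobian of g = exp(u).
        log_prior = (math.log(r) - 0.5 * math.log(2 * math.pi)
                     - 1.5 * u - r * r / (2 * g) + u)
        weight = 0.5 if i in (0, grid) else 1.0
        log_terms.append(math.log(weight * step) + log_like + log_prior - log_null)

    # Log-sum-exp keeps the sum stable when the Bayes factor is very large.
    peak = max(log_terms)
    return peak + math.log(sum(math.exp(v - peak) for v in log_terms))


if __name__ == "__main__":
    # Group sizes are those of this study; t = 7.8 is a hypothetical value
    # chosen only for illustration.
    for width in (0.707, 1.0, 1.414):
        log_bf = log_bf10_two_sample(t=7.8, n1=3117, n2=976, r=width)
        print(f"r = {width}: BF10 ~ {math.exp(log_bf):.3g}")
```

Returning the logarithm of the Bayes factor avoids overflow for very large values. Looping over several prior widths, as in the usage example, mirrors the kind of robustness check that JASP offers for this test.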

Results
The composite scores of job satisfaction were computed using a linear transformation, so that possible scores range from 0 (lowest) to 100 (highest). The hypotheses for this analysis are as follows: H0: focal = reference; H1: focal ≠ reference. A Bayesian independent samples t-test (Fig. 2) with a two-sided alternative hypothesis (focal ≠ reference) and the default Cauchy prior width (r = 0.707) was conducted to quantify the evidence for a difference favoring male employees. As expected, the results support this difference.
Fig. 2 shows the prior and posterior distributions of the effect size parameter δ for the Bayesian independent samples t-test. The evidence for and against the alternative hypothesis is illustrated using the probability wheel, and the grey dots in the graph mark the prior and posterior densities, respectively. From Fig. 2, the evidence for H1 is strong: the Bayes factor is very large, BF10 = 4.75e+11, meaning that the observed data are 4.75e+11 times more likely under H1 than under H0. The median of the posterior distribution for δ is 0.284, with a central 95% credible interval from 0.214 to 0.357. This credible interval is conditional on H1. Given the strong evidence for H1, the population effect size δ therefore lies between 0.214 and 0.357 with 95% posterior probability. The Bayes factor estimate has an error of 2.306e-18, indicating excellent numerical stability of the algorithm.
Fig. 3 shows the robustness of the Bayes factor to the Cauchy prior width, r. The default Cauchy prior (r = 0.707) was chosen for this analysis (grey dot). The plot indicates BF10 for the default prior (r = 0.707), the wide prior (r = 1), and the ultrawide prior (r = 1.414). The evidence in favor of H1 is particularly strong and remains stable as the prior width changes, indicating that the analysis is robust. The maximum BF10 is attained when the Cauchy prior width is 0.28.
Fig. 4 illustrates the sequential analysis of the estimates, mapping BF10 against the number of participants (n). The plot reveals increasing evidence for H1 as n increases.
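For reference, the Bayes factor used throughout this section is the ratio of the marginal likelihoods of the data under the two hypotheses, and it converts prior odds into posterior odds. The second expression assumes equal prior probabilities for H0 and H1, which is taken here to be the convention behind the probability wheel; that equal-odds reading is an assumption of this note and is not stated in the JASP output.

```latex
\[
\mathrm{BF}_{10} = \frac{p(\text{data} \mid H_1)}{p(\text{data} \mid H_0)},
\qquad
\frac{P(H_1 \mid \text{data})}{P(H_0 \mid \text{data})}
  = \mathrm{BF}_{10} \times \frac{P(H_1)}{P(H_0)}
\]
\[
P(H_1 \mid \text{data}) = \frac{\mathrm{BF}_{10}}{1 + \mathrm{BF}_{10}}
\quad \text{when } P(H_1) = P(H_0) = 0.5
\]
```

Under that assumption, BF10 = 4.75e+11 leaves a posterior probability for H0 of 1 / (1 + BF10), or roughly 2.1e-12.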

Discussion
This study shows that male workers are generally more satisfied at work than female workers. This finding casts doubt on the "paradox of the contented female worker", the notion that women tend to report higher job satisfaction than their male counterparts despite having objectively worse job conditions and receiving fewer job-related benefits (Vladisavljević & Perugini, 2018; Westover, 2012). Interestingly, Valet (2018) noted that the paradox of contented female workers is prevalent only when there is a "considerate number" of female workers in the work sector. In other words, the paradox does not hold in a predominantly male work sector. This study provides support for this claim: in a predominantly male sample (male-to-female ratio of approximately 3:1), male workers were found to have higher job satisfaction than their female counterparts.
Poor reproducibility of results regarding gender disparity in job satisfaction continues to be a problem in organizational research. Hong, Lim, Tan, Ekhsan, & Othman (2012) collected cross-sectional data involving employees from a public university in Sarawak. Results from their analysis (using the classical independent samples t-test) showed no significant difference between male and female employees in any facet of job satisfaction (p-values ranged from 0.053 to 0.965). The inconsistencies in findings show that there is a need to adopt a more intuitive method to infer gender disparity in job satisfaction. Bayesian inference is more intuitive, especially when the p-value (in the classical t-test) approaches 0.05 (a commonly used α value). Furthermore, future research should also pay attention to various factors that potentially affect gender disparity in job satisfaction, which include:
(1) Culture: Society and culture play a major part in differences in job satisfaction between male and female workers (Anari, 2012). For example, women in Turkey often take the role of breadwinner and are driven to work for economic reasons. Therefore, they give little consideration to achieving personal satisfaction in their jobs (Aydin et al., 2012). A worker's race could also influence gender differences. Hersch and Xiao (2016) observed that white women are more satisfied than Asian and black women. Moreover, they also reported that white men are more satisfied than Asian men in terms of salary, job benefits, and job autonomy.
(2) Work sector: When gender research continues to show discrepancies in results, the notion of gender-linked labor market segmentation arises (Magee, 2013). For example, in the industrial sector, men tend to have a higher level of job satisfaction than women. Meanwhile, in the educational sector, women reported higher job satisfaction than men (Vukonjanski & Terek, 2014). Furthermore, Anari (2012) noted that women tend to be more satisfied working in female-dominated workplaces. These findings indicate that the work sector could significantly influence job satisfaction in male and female workers because of the different nature of the work.
(3) Facets of job satisfaction: Most research on the gender paradox in job satisfaction involves global satisfaction (Mueller & Kim, 2008). The effect of assessing global versus faceted job satisfaction should be examined thoroughly, for example by investigating the reproducibility of findings using a global satisfaction score and a composite job satisfaction score computed from a multi-faceted instrument. This study adopted multi-faceted Likert-type items. Multiple facets of job satisfaction (items) were linearly transformed to form composite scores of the latent construct, job satisfaction. One probable cause of discrepancies in prior research is that researchers adopted different job satisfaction instruments. While some researchers used a single-item instrument (for example, Hauret & Williams, 2017), others utilized multi-facet instruments (for example, Bacha et al., 2015; Hong, Lim, Tan, Ekhsan, & Othman, 2012). Using multi-facet instruments (rather than facet-free instruments) allows for a more in-depth analysis of which aspects of job satisfaction contribute to gender disparity. Men and women have different views of what is important to them in terms of work (Clark, 1997; Ezzat & Ehab, 2019). Bender, Donohue, and Heywood (2005) argued that women value work flexibility more than men. Men may have higher expectations of income. On the other hand, women are more affected by other benefits, such as having social insurance (Clark, 1997; Ezzat & Ehab, 2019). These findings may have some empirical support, but they cannot account entirely for the observed gender disparity in job satisfaction (Zou, 2015).
This study did not attempt to extrapolate findings beyond its intended population. This study uses data from a self-administered instrument; therefore, it is assumed that the participants understood the items as intended and answered truthfully during the survey. Furthermore, it is not feasible to include all unanticipated external factors that could influence job satisfaction. This study demonstrates the use of Bayesian inference in organizational research, specifically on job satisfaction. The philosophy of Bayesian statistics and a tutorial on how to run the analysis are beyond the scope of this paper. A theoretical explanation of Bayes' theorem and Bayesian inference can be found in Box & Tiao (1992). Meanwhile, Marsman & Wagenmakers (2017) illustrated Bayesian inference using the JASP statistical program.

Conclusion
Poor reproducibility in the job satisfaction literature calls for a departure from the frequentist approach, which depends on a cut-off value to make an inference. This study demonstrates the use of Bayesian inference in organizational research in the hope that other researchers in the area recognize the benefits of Bayesian inference and realize that Bayesian statistics are no longer hindered by the lack of user-friendly and powerful tools. Based on this cross-sectional study involving over 4000 employees in private sectors across Sarawak, Malaysia, the Bayesian independent samples t-test shows that the level of job satisfaction is higher in male workers than in female workers. The results of this study contribute to the job satisfaction and human resource literature, particularly from an emerging economy perspective. Findings from this study have practical implications for management and human resource professionals. Employees' satisfaction is a precursor to an organization's sustainability and, ultimately, its competitive edge. Therefore, organizational leaders and human resource professionals should work continuously to achieve gender equity and promote social sustainability in the organization by enforcing inclusive organization-wide human resource management strategies, gender-neutral policies, and supportive organizational cultures.