A Review of Syntactic Complexity Studies in Context of EFL/ESL Writing

Vol. 12, No. 10, 2022, Pg. 441-454

Abstract
This paper reviews more than 60 research papers, articles, and book chapters on syntactic complexity in the context of EFL/ESL writing published in the past two decades. Most of the papers are from journals indexed in the Social Science Citation Index, Scopus, and the Chinese Social Science Citation Index. Five strands of syntactic complexity studies in the context of EFL/ESL writing are identified: syntactic complexity measurement indices and tools, the relationship between syntactic complexity and language proficiency, syntactic complexity developmental studies, comparative studies, and variables influencing syntactic complexity. Gaps in previous studies and directions for future research are also analyzed. New indices from other syntactic perspectives should be considered, and research on their validity and reliability should be done. For comparative studies, more attention should be given to comparing the writing of EFL/ESL learners with different backgrounds. For research on variables influencing syntactic complexity, the interactive effect of multiple variables needs to be investigated; if only one variable is examined, other variables should be controlled. Besides, in future syntactic complexity research, theoretical interpretation and theory building should be given more attention, and the observation period for longitudinal research should be extended. Finally, more qualitative studies are needed for in-depth investigation of specific syntactic perspectives, such as syntactic errors.


Introduction
The language performance and behavior of English as a foreign or second language (EFL/ESL) learners are multi-componential and multidimensional (Housen & Kuiken, 2009), comprising complexity, accuracy, and fluency (CAF; Ellis, 2009). These dimensions have become major research variables in applied linguistics, as well as important parameters for evaluating language performance, describing language competence, and measuring the language development of EFL/ESL learners (Ortega, 2003). Among these variables, syntactic complexity is an important indicator of EFL/ESL learners' syntactic knowledge and their ability to use the language (Liu & Sun, 2022). Syntactic complexity, also called linguistic complexity or syntactic maturity, can be understood as the variety and degree of sophistication of the syntactic structures in written production (Bulté & Housen, 2014; Crossley & McNamara, 2014; Lu, 2011; Ortega, 2003). It has received broad attention from EFL/ESL writing development researchers because it is regarded as a reliable indicator of the language development or writing quality of EFL/ESL learners (Ai & Lu, 2013; Crossley & McNamara, 2014; Larsson & Kaatari, 2020; Lu, 2011; Norris & Ortega, 2009). The basic and primary issue in syntactic complexity studies is how the construct is measured. Researchers have proposed different measures and indices and developed tools for measuring them automatically. Besides, most studies start from the notion that syntactic complexity correlates with EFL/ESL writing quality or language performance. Some studies are longitudinal, investigating the development of syntactic complexity, while others are cross-sectional, comparing syntactic complexity between groups such as native and non-native speakers or low-level and high-level EFL/ESL learners. In addition, other researchers have studied variables that can influence syntactic complexity.
This paper reviews syntactic complexity research papers from Web of Science, Scopus, and the China National Knowledge Infrastructure (CNKI); most of the CNKI papers are from journals indexed in the Chinese Social Science Citation Index (CSSCI). The following strands of syntactic complexity studies are identified: syntactic complexity measurement indices and tools, the relationship between syntactic complexity and language proficiency, syntactic complexity developmental studies, comparative studies, and variables influencing syntactic complexity.

Review of Syntactic Complexity Studies

Syntactic Complexity Measurement Indices and Tools
A primary issue in syntactic complexity research is how to measure this linguistic construct. Many researchers have been searching for valid and reliable developmental measures of syntactic complexity that can be used to impartially gauge EFL/ESL learners' overall proficiency or developmental level in the target language (e.g., Larsen-Freeman, 2009; Lu, 2011; Norris & Ortega, 2009; Ortega, 2003; Wolfe-Quintero et al., 1998). However, no consensus has been reached on syntactic complexity measurement indices so far (Gao, 2021). Some researchers used the mean length of sentences, clauses, or T-units as syntactic complexity parameters and found a significantly positive (Ortega, 2003) or negligible (Knoch et al., 2014) correlation between the mean length of production units and EFL/ESL learners' language proficiency. Other researchers regard the ratio of subordinate or coordinate units in sentences as complexity parameters, such as the ratio of complex T-units (Lu, 2011) and the ratio of clauses (Ishikawa, 1995). Some researchers question the precision of the existing parameters and believe that studies using macro syntactic complexity parameters could confuse and obscure the data (e.g., Biber et al., 2011; Kyle, 2016; Larsen-Freeman, 2006; Norris & Ortega, 2009). They argued that studies based on macro parameters can reveal only certain trends in EFL/ESL learners' language development, not which micro syntactic structures drive the changes in these macro parameters. Therefore, some fine-grained indices of syntactic complexity were proposed. For example, Kyle (2016) addressed these gaps by developing and validating the Tool for the Automatic Analysis of Syntactic Sophistication and Complexity (TAASSC).
In the past decades, different systems and software for automatically measuring syntactic complexity have been developed, such as the Biber Tagger (Biber, 1988, 2006), the L2 Syntactic Complexity Analyzer (L2SCA; Ai & Lu, 2013; Lu, 2010), Coh-Metrix 3.0 (McNamara et al., 2014), and TAASSC.
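As a rough illustration of the length-based and ratio-based parameters mentioned in this section, the following Python sketch computes a few of them from unit counts. It assumes the counts have already been obtained, by hand or from a parser. The function name, the abbreviations used as dictionary keys, and the sample counts are illustrative assumptions and do not reproduce the output of any tool cited here.

```python
def complexity_indices(words, sentences, t_units, clauses,
                       dependent_clauses, complex_t_units):
    """Compute length- and ratio-based indices from raw unit counts."""
    if min(sentences, t_units, clauses) <= 0:
        raise ValueError("sentence, T-unit, and clause counts must be positive")
    return {
        "MLS": words / sentences,           # mean length of sentence
        "MLT": words / t_units,             # mean length of T-unit
        "MLC": words / clauses,             # mean length of clause
        "C/T": clauses / t_units,           # clauses per T-unit
        "DC/C": dependent_clauses / clauses,  # dependent clause ratio
        "CT/T": complex_t_units / t_units,  # complex T-unit ratio
    }


# Hypothetical counts for a single essay (invented for illustration).
sample = complexity_indices(words=300, sentences=15, t_units=20,
                            clauses=30, dependent_clauses=12,
                            complex_t_units=8)
for name, value in sample.items():
    print(f"{name}: {value:.2f}")
```

Keeping the counting step separate from the arithmetic mirrors the point made above: the formulas are simple, and the disagreement in the literature concerns which units to count and how to identify them.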

Biber Tagger
The Biber Tagger was initially designed for multidimensional analysis (MDA; Biber, 1988) of register variation, and it can tag lexical and syntactic features of a text (Biber, 1988; Biber et al., 1999). Later it was used for analyzing grammatical complexity in ESL writing (e.g., Biber et al., 2011; Biber et al., 2016), including syntactic complexity analysis. From the tag counts for different types of syntactic features, the scores of syntactic complexity indices can be calculated with the corresponding formulas. In their investigation of grammatical complexity patterns, Biber et al. (2016) considered 23 types of linguistic features, which can "adequately capture both the variety and sophistication aspects of syntactic complexity" (Lu, 2017, p. 499). It should be noted that the Biber Tagger is a semi-automated tool for syntactic complexity analysis, since manual checking and tagger revision are needed during use.

L2SCA
Bulté and Housen (2012, p. 27) argued that syntactic complexity includes syntactic breadth and depth. Based on their interpretation of syntactic breadth and depth, syntactic complexity can be investigated from three perspectives: sentential, clausal, and phrasal complexity. The L2SCA (Ai & Lu, 2013; Lu, 2010, 2011) was developed based on these three perspectives. It measures syntactic complexity along the dimensions of sentence length, sentence structure, and phrase structure, which can objectively reflect Bulté and Housen's interpretation of syntactic complexity. L2SCA allows researchers to analyze the syntactic complexity of written English samples using 14 indices covering length of production unit, amount of coordination, amount of subordination, degree of phrasal sophistication, and overall sentence complexity. Table 1 shows the measures and relevant formulas of syntactic complexity provided by the L2 Syntactic Complexity Analyzer. A web-based version of L2SCA is also available: texts can be uploaded to the system, and the syntactic complexity results are generated as a CSV file for further analysis.

Coh-Metrix 3.0
Coh-Metrix is a system initially designed for computing cohesion and coherence metrics for written and spoken texts. Though the primary inspiration for developing Coh-Metrix was to better understand the important role of cohesion in comprehension (hence the "Coh" in "Coh-Metrix"), recent developments in discourse processing, computational linguistics, and natural language processing have enabled its researchers and developers to include more functions. Besides estimating cohesion and coherence from measures of connectives, referential cohesion, latent semantic analysis (LSA; Landauer et al., 2013), and the situation model, Coh-Metrix can also provide researchers with a range of traditional textual measures such as lexical diversity, syntactic complexity, and syntactic pattern density. Syntactic complexity indices provided in Coh-Metrix are listed in Table 2.

TAASSC
TAASSC is an efficient and advanced automatic tool for syntactic analysis. It includes not only the traditional syntactic complexity indices provided by L2SCA but also fine-grained indices of clausal complexity, phrasal complexity, and syntactic sophistication. The fine-grained clausal and phrasal indices are more detailed than the indices proposed by Biber et al. (2011). In addition, it is reported that the phrasal indices of TAASSC are better indicators of EFL/ESL writing quality than traditional T-unit-based indices (Kyle & Crossley, 2018).

Relationship Between Syntactic Complexity and Language Proficiency
An important and major strand in this field is the relationship between the syntactic complexity of EFL/ESL writing and language proficiency or writing quality. Some researchers in this strand (e.g., Bi & Jiang, 2020; Ferris, 1994; Larsen-Freeman, 1978; Lu, 2010) adopted cross-sectional designs to determine to what degree various measures of syntactic complexity correlate with or have an impact on language proficiency. For example, Ferris (1994) conducted a correlation analysis of 160 ESL texts produced by a group of low-proficiency students (n=60) and a group of high-proficiency students (n=100) and reported that Number of words, Synonymy/antonymy, Word length factor, and Passives significantly predict proficiency. A factor analysis further showed that the variables Words per sentence, Relative clauses, Coordination, and Prepositional phrases covary with each other. Lu (2010) analyzed college-level second language writing data from the Written English Corpus of Chinese Learners (Wang et al., 2005) and presented findings of a corpus-based assessment of 14 measures of syntactic complexity as objective indices of ESL writers' language development. The results show that the values of most syntactic complexity indices depend on the level of the EFL writers. Using syntactic complexity indices different from those of previous research, Bi and Jiang (2020) conducted a regression analysis with EFL writing scores as the dependent variable and syntactic elaboration and diversity indices as explanatory variables. The results showed that the explanatory variables can explain 45.3% of the variance in EFL writing scores. In their study, the syntactic elaboration indices are traditional syntactic complexity indices provided by L2SCA, while the syntactic diversity "was measured by the corrected type-token ratio of dependency relations" (p. 1). This is a new metric proposed by them, and it is reported that this metric can significantly predict EFL writing quality. Though these studies vary in the size of the writing samples, the measures of syntactic complexity, and the operationalization of proficiency (e.g., standardized test scores, holistic ratings, or program level), most of them reported that syntactic complexity is an important and reliable predictor of language development or proficiency. However, most of the research in this strand was cross-sectional, while Zheng (2011), Bulté and Housen (2014), and Wu and Lei (2018) argued that more longitudinal studies should be done to explore how syntactic complexity develops over a certain period.
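The diversity measure quoted above can be illustrated with a short Python sketch. It assumes the common definition of the corrected type-token ratio, the number of types divided by the square root of twice the number of tokens, and it assumes that dependency relation labels have already been extracted by a parser. The function name and the label sequence are invented for illustration, and the exact operationalization in Bi and Jiang (2020) may differ.

```python
import math


def corrected_ttr(labels):
    """Corrected type-token ratio: types / sqrt(2 * tokens)."""
    tokens = len(labels)
    if tokens == 0:
        raise ValueError("at least one label is required")
    types = len(set(labels))
    return types / math.sqrt(2 * tokens)


# Hypothetical dependency relation labels for one short text
# (assumed to come from a dependency parser; not real learner data).
relations = ["nsubj", "det", "obj", "nsubj", "amod", "obj", "advcl", "det"]

# 5 distinct labels over 8 tokens: 5 / sqrt(16) = 1.25
print(corrected_ttr(relations))
```

The square-root correction is intended to reduce the sensitivity of a plain type-token ratio to text length, which matters when essays of different lengths are compared.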

Developmental Studies on Syntactic Complexity
Some researchers conducted longitudinal research to find out how syntactic complexity develops by examining changes in syntactic complexity over a certain period (e.g., Bulté & Housen, 2014; Martinez, 2018; Ortega, 2000; Stockwell, 2005; Vercellotti, 2017). Bulté and Housen (2014) explored the nature and extent of the development of adult ESL learners' writing proficiency over a one-semester academic English course. The results showed that different syntactic complexity indices develop at different paces. In addition, it is reported that subordination, traditionally a core subcomponent of syntactic complexity, did not develop significantly over such a short period of four months. This is in line with Ortega's (2012) observation that subordination metrics may not be sufficient, in all settings and under all conditions, to assess the syntactic complexity of ESL writing. However, Vercellotti (2017) reached different conclusions. She tracked 66 ESL learners for nine months and reported that they followed a similar developmental trajectory of syntactic complexity in their ESL writing and that "the best-fitting model was a linear growth trajectory" (pp. 10-11).
To explore the multidimensional development and change of language complexity, Zheng and Feng (2017) conducted a one-year follow-up study on high-level EFL learners. The results showed that the indices of syntactic complexity were highly correlated with one another and showed the same interactive influence despite different developmental patterns. This result not only complements the longitudinal study of syntactic complexity but also has implications for second language teaching. Besides, some researchers (e.g., Bao, 2009; Li & Liu, 2016; Martinez, 2018) conducted quasi-longitudinal studies of syntactic complexity by comparing EFL/ESL learners of different levels.
Bao (2009) explored syntactic complexity in the EFL writing of Chinese university students at four grade levels in terms of length of language unit and amount of subordination. The results showed that the length of language units increased rapidly as the grade level increased, while there was no significant increase in the amount of subordination. However, Li and Liu (2016) and Martinez (2018) reported that most of the syntactic complexity indices show a significant and linear increase with the rise in grade and writing proficiency.
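To make the idea of a linear growth trajectory more concrete, the sketch below fits a straight line to one learner's index values over time using ordinary least squares. This is a deliberately simplified stand-in: the longitudinal studies cited above model many learners at once, and nothing here reproduces their procedures. The function name, the time points, and the mean T-unit length values are all invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented data: months since the first essay and mean T-unit length.
months = [0, 3, 6, 9]
mean_t_unit_length = [10.0, 11.0, 12.5, 13.5]

slope, intercept = fit_line(months, mean_t_unit_length)
print(f"estimated change per month: {slope:.2f} words per T-unit")
print(f"estimated starting value: {intercept:.2f} words per T-unit")
```

A positive slope with small residuals would be consistent with steady growth, whereas large residuals around the line would suggest that a straight line describes the learner's development poorly.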
To sum up, EFL/ESL learners' syntactic complexity follows a zigzagging course on the whole, with the individual indices competing with one another and developing differently. Such follow-up and longitudinal studies not only dynamically record the process of EFL/ESL learners' language learning but also evaluate their language competence comprehensively and effectively. However, the number of existing studies is relatively small because long-term studies are time-consuming and require a large corpus. Therefore, more studies are needed to strengthen the conclusions.

Comparative Studies on Syntactic Complexity
Granger (2002) proposed an important method for analyzing EFL/ESL learner language, in other words interlanguage: Contrastive Interlanguage Analysis (CIA). The most common application of this method is comparing EFL/ESL writing with English as a native language (ENL) writing. For instance, Foster and Tavakoli (2009) argued for the applicability and validity of using ENL writing as the baseline to investigate EFL/ESL learners' writing performance. Taking subordination and mean length of utterance as indices of syntactic complexity, they analyzed the effect of task features on syntactic complexity in ENL writing and compared the result with that obtained from their previous study on ESL writing (Tavakoli & Foster, 2008). In addition, Ai and Lu (2013) conducted a corpus-based comparative study investigating syntactic complexity in university students' EFL and ENL writing. Analyzing 400 essays from the Written English Corpus of Chinese Learners Version 2.0 (WECCL 2.0; Liang et al., 2008) and 200 essays from the Louvain Corpus of Native English Essays (LOCNESS; Granger, 1998), Ai and Lu used 10 syntactic complexity indices to investigate whether and to what degree the university students' EFL and ENL writing differ in four dimensions: length of production unit, amount of subordination, amount of coordination, and degree of phrasal sophistication. Besides, Mancilla et al. (2017) compared ten syntactic complexity indices provided by L2SCA between ESL and ENL writing and found that four indices differed significantly between them. Native speakers used more subordinate structures, while ESL learners performed better in coordinate structures and phrasal complexity.
Most comparative studies compare syntactic complexity between EFL/ESL and ENL writing. Research comparing syntactic complexity between groups of ENL writers is rare, with few exceptions. Taking writing samples from LOCNESS, Yang and Geng (2021) investigated the difference in syntactic complexity between British and American university students' writing. The results showed differences in the mean length of production unit and complex nominals, but not in the amount of subordination and coordination.
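A group comparison of this kind can be summarized with a standardized effect size. The Python sketch below computes Cohen's d with a pooled standard deviation for one index measured in two groups of essays. It is only one of several ways to compare groups and is not the procedure used in the studies cited above. The function name and the values are invented for illustration.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d with pooled sample standard deviation (a minus b)."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    if pooled_var == 0:
        raise ValueError("pooled variance is zero; effect size is undefined")
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# Invented mean T-unit lengths for three EFL and three ENL essays.
efl_essays = [12.0, 14.0, 16.0]
enl_essays = [15.0, 17.0, 19.0]

d = cohens_d(efl_essays, enl_essays)
print(f"Cohen's d (EFL minus ENL): {d:.2f}")
```

A negative value here means the first group scores lower on the index than the second. With real corpora, a significance test and a check of the distributional assumptions would normally accompany the effect size.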

Variables Influencing Syntactic Complexity
Some other researchers have attempted to explore variables affecting syntactic complexity in EFL/ESL writing, such as writing topic (e.g., Yang et al., 2015; Yang & Kim, 2020; Yoon, 2017), planning condition (e.g., Ellis & Yuan, 2004; Rahimi & Zhang, 2018), mother tongue (e.g., Lu & Ai, 2015; Staples & Reppen, 2016), writing tasks (e.g., Adams et al., 2015; Ruiz-Funes, 2015; Wang et al., 2020), and writing time allotment (e.g., Adams et al., 2015; Kuiken & Vedder, 2012; Lu, 2011). For example, Yang et al. (2015) compared the syntactic differences in ESL learners' argumentative writing on two different topics. The results showed that topic can significantly affect the syntactic complexity features of ESL writing, and that mean sentence length and mean T-unit length can effectively predict the writing quality for different topics. Ellis and Yuan (2004) claimed that planning conditions (no planning, unpressured online planning, and pre-task planning) have a significant impact on syntactic complexity in EFL narrative writing and that pre-task planning can result in greater syntactic diversity. Lu and Ai (2015) compared ENL writing with seven groups of EFL/ESL writing by writers of different first language (L1) backgrounds; the results showed that ENL writing differs from at least one EFL/ESL group on all 14 indices of syntactic complexity. Finally, many researchers (e.g., Li & Wang, 2017; Ruiz-Funes, 2015; Wang, 2013; Wang et al., 2020) examined the influence of writing task complexity on syntactic complexity, but the results are inconsistent for now. For example, Wang (2013) claimed that task complexity does not affect the language complexity of EFL writing. In addition, there is no unified standard for determining whether a task is complex or not (Liu & Sun, 2022). Though these studies have ascertained various variables that influence syntactic complexity in EFL/ESL writing, most of them examine only one variable per study. The interactive effect of multiple variables needs to be investigated in future research.
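To make the notion of an interactive effect concrete, the sketch below computes an interaction contrast from the cell means of a 2 x 2 design, here writing topic crossed with planning condition. The contrast is the difference between the planning effect under one topic and the planning effect under the other; a value far from zero suggests that the two variables do not act independently. The function name, factor levels, and scores are invented for illustration, and a real study would test the interaction with a factorial ANOVA or a regression model that includes an interaction term.

```python
from statistics import mean

# Invented mean T-unit lengths per essay, keyed by (topic, planning condition).
cells = {
    ("topic_A", "no_planning"): [11.0, 12.0, 13.0],
    ("topic_A", "pre_task_planning"): [13.0, 14.0, 15.0],
    ("topic_B", "no_planning"): [12.0, 13.0, 14.0],
    ("topic_B", "pre_task_planning"): [17.0, 18.0, 19.0],
}


def interaction_contrast(cells, a_levels, b_levels):
    """Difference of differences between cell means in a 2 x 2 design."""
    a1, a2 = a_levels
    b1, b2 = b_levels
    means = {key: mean(values) for key, values in cells.items()}
    effect_of_b_at_a1 = means[(a1, b2)] - means[(a1, b1)]
    effect_of_b_at_a2 = means[(a2, b2)] - means[(a2, b1)]
    return effect_of_b_at_a2 - effect_of_b_at_a1


contrast = interaction_contrast(
    cells,
    a_levels=("topic_A", "topic_B"),
    b_levels=("no_planning", "pre_task_planning"),
)
print(f"interaction contrast: {contrast:.1f}")
```

In this invented example, planning adds 2 words per T-unit under the first topic and 5 under the second, so a study that examined planning under only one topic would misjudge its effect under the other.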

Discussion and Conclusion
To summarize the measures of syntactic complexity, there are two main streams: the traditional T-unit-based measures and the fine-grained indices of syntactic complexity. The former concern the length of the language production unit, the amount of subordination and coordination, and the degree of sophistication; the latter pay more attention to clausal and phrasal complexity and syntactic sophistication. Though the fine-grained indices have enriched the measurement of syntactic complexity to some extent, new indices from other syntactic perspectives, such as syntactic diversity and syntactic errors, should be considered, and research on their validity and reliability should be done in the future. In terms of the relationship between syntactic complexity and language proficiency, it can be concluded from previous literature that syntactic complexity is positively correlated with, and is an indicator of, EFL/ESL writing quality and language proficiency. For syntactic developmental studies, nonlinear development is an important feature of the dynamic development of EFL/ESL writing. Syntactic development thus resembles a sinusoid rather than a straight line, with progressions and setbacks, peaks and valleys.
Most comparative studies of syntactic complexity compare EFL/ESL writing with ENL writing. In general, the results suggest that the EFL/ESL writing of low-level learners (undergraduates or below) is syntactically less complex than their counterparts' ENL writing (e.g., Ai & Lu, 2013), while for high-level EFL/ESL learners (postgraduates or higher), there is no significant difference in syntactic complexity between EFL/ESL and ENL writing.
What is more, advanced EFL/ESL learners tend to use longer and more complex sentences than ENL writers (Lu & Ai, 2015). In future studies, more attention should be given to comparing the writing of EFL/ESL learners with different backgrounds, which will enrich the evidence and theory of syntactic complexity of interlanguages.
Finally, for research on variables influencing syntactic complexity in EFL/ESL writing, most studies have reported that variables such as writing topic, planning condition, mother tongue, writing tasks, and writing time allotment do significantly affect syntactic complexity, with few exceptions. For example, Wang (2013) reported that writing task complexity does not affect syntactic complexity in EFL writing.
A review of the syntactic complexity research of the past two decades reveals some deficiencies. The first is variable control. It has been shown that many variables could affect syntactic complexity in EFL/ESL writing and show different syntactic characteristics on different indices (Beers et al., 2011; Lu, 2011; Yang et al., 2015). Therefore, the influence of these variables on syntactic use should also be considered when exploring syntactic complexity in EFL/ESL writing. However, most previous studies did not control other variables when examining the variable under investigation, which could affect the research results. Second, there is a theoretical inadequacy in previous studies. Most studies focus only on describing the developmental characteristics of each syntactic complexity index or the differences between groups, but rarely explore the internal reasons for the developmental trends or the differences. However, an in-depth study of the syntactic differences between learners at different stages or with different backgrounds has important theoretical significance for EFL/ESL teaching and learning.
The third deficiency is that the observation period of syntactic complexity is short, especially for developmental studies. According to Larsen-Freeman's (2006) dynamic systems theory, there is a competitive relationship among various sources of language acquisition, such as syntactic complexity and fluency, and language development does not follow a linear growth model. Therefore, too short an observation period is not enough to demonstrate the development and change of each dimension of syntactic complexity. Finally, in the above-mentioned strands of syntactic complexity research, most studies are quantitative and based on the calculation of syntactic complexity indices, whereas in-depth qualitative studies are rare. In addition, according to the model of CAF (Barkhuizen & Ellis, 2005; Ellis, 2003, 2008; Skehan, 1998), complexity, accuracy, and fluency are three widely recognized variables reflecting L2 performance and proficiency. However, most indices investigated in previous syntactic studies, such as mean length of production units, coordination, and subordination, belong to the categories of syntactic fluency and complexity. Syntactic accuracy, indicated by the percentage of syntactic errors, has seldom been examined. More in-depth qualitative studies on syntactic errors should be conducted in the future.
This review paper summarizes five strands of syntactic complexity studies in the context of EFL/ESL writing: syntactic complexity measurement indices and tools, the relationship between syntactic complexity and language proficiency, syntactic complexity developmental studies, comparative studies, and variables influencing syntactic complexity. It draws out a clear framework of syntactic complexity studies in the literature. It also identifies some deficiencies in current syntactic complexity studies, which reveals gaps and points out directions for future research. Finally, this review can provide a reference for future research on syntactic complexity in the context of EFL/ESL writing and help teachers to evaluate the quality of EFL/ESL writing from a syntactic perspective. However, this study also has the following limitation: different research results have been obtained in each strand of syntactic studies due to different research subjects, objects, methods, and designs, but these differences are not compared and analyzed in depth here. Meta-analysis can be applied to review research in future syntactic studies.