Language Learning & Technology 2024, Volume 28, Issue 1 ISSN 1094-3501 CC BY-NC-ND pp. 1–21
ARTICLE
Digital game-based learning’s effectiveness on EFL learners’ receptive and productive vocabulary knowledge
Wen Jia, Nanjing Agricultural University; Xi’an Jiaotong-Liverpool University
Liping Zhang, Army Engineering University of PLA
Austin Pack, Brigham Young University-Hawaii
Yi Guan, Global Institute of Software Technology
Bin Zou, Xi’an Jiaotong-Liverpool University

Abstract
Although digital game-based vocabulary learning (DGBVL) has received increasing attention over the past two decades, its impact on the depth of word knowledge is still not well understood, especially with regard to productive vocabulary learning and DGBVL’s long-term efficacy. This study uses a quasi-experimental research design to investigate DGBVL’s long-term effects on receptive vocabulary (RV) and productive vocabulary (PV). Forty-eight Chinese English-as-a-foreign-language (EFL) university students, assigned to an experimental group and a control group, were taught through a DGBVL approach and PowerPoint (PPT) lecturing, respectively, over the course of 18 weeks. Specifically, a mixed 2×2 repeated measures design was adopted, with instruction type (DGBVL vs. PPT lecturing) and testing time (pretest vs. posttest) as the independent variables and RV and PV proficiency as the respective dependent variables. The results suggest that both instruction type and teaching time had significant effects on participants’ RV and PV learning achievements, although the effect size of teaching time exceeded that of instruction type. The findings are highly encouraging for the use of DGBVL in the EFL classroom, where it may serve as an effective and long-lasting pedagogical tool.
Keywords: Digital Game-Based Learning, Receptive Vocabulary, Productive Vocabulary, English as a Foreign Language
Language(s) Learned in This Study: English
APA Citation: Jia, W., Zhang, L., Pack, A., Guan, Y., & Zou, B. (2024). Digital game-based learning’s effectiveness on EFL learners’ receptive and productive vocabulary knowledge. Language Learning & Technology, 28(1), 1–21. https://hdl.handle.net/10125/73554

Introduction
While “vocabulary knowledge is most likely to develop if there is a balance of incidental and deliberate appropriate opportunities for learning” (Nation, 2020, p. 15), many English-as-a-foreign-language (EFL) learners, such as those in China, have few opportunities for exposure to the target language (i.e., English), which makes incidental vocabulary learning challenging in both formal and informal settings. In such cases, intentional vocabulary learning at school serves as a valuable, complementary approach (Webb, 2020) by providing more input and output practice through direct instruction. Teacher-centered lecturing and a focus on rote memorization of vocabulary have long dominated China’s EFL context (Yang & Dai, 2011), resulting in tedious and inefficient lexical learning. Encouragingly, however, emerging technologies, including digital games, offer new avenues for vocabulary learning.
While the idea that games, play, and learning are interrelated extends back to ancient history (Pack & Newbould, 2018), the educational value of digital games has received growing attention since Prensky’s (2001) comprehensive description of digital game-based learning (DGBL). Digital games have been acknowledged as potential pedagogical tools with applications in a variety of fields.
It was not until technological advances made digital games more accessible in the mid-2000s that interest in digital game-based language learning (DGBLL) was kindled in the field of second and foreign language teaching (Reinhardt, 2019). This form of language pedagogy has drawn attention because it affords opportunities for edutainment (i.e., a blend of education and entertainment), may help engage and motivate learners, and can facilitate increased interaction with and exposure to the target language (Reinders, 2012). Compared with studies investigating reading (e.g., Dourda et al., 2014), speaking and listening (e.g., Hwang et al., 2016), writing (e.g., Allen et al., 2014), grammar (e.g., Reitz et al., 2016), pronunciation (e.g., Cerezo et al., 2019), and other language skills, digital game-based vocabulary learning (DGBVL) has drawn the most attention to date in the DGBLL field. Studies of DGBVL approach it from three major angles: its affordances (e.g., Rama et al., 2012); its cognitive and psychological consequences, such as motivation, engagement, and anxiety (e.g., Yang et al., 2020); and, most commonly, its influence on vocabulary learning (e.g., Alfadil, 2020; Rasti-Behbahani & Shahbazi, 2020). Studies of the latter have mostly been carried out in formal settings (i.e., L2 classrooms) and report positive effects of DGBVL. Several studies have also observed the effects of DGBVL in extramural learning environments, with frequent gamers outperforming moderate gamers and non-gamers (e.g., Sylvén & Sundqvist, 2012). Although some research has been carried out on DGBVL, much uncertainty remains about DGBL’s efficacy for learning different types of vocabulary knowledge.
First, most of the extant literature focuses either on vocabulary in very general terms (e.g., Ranalli, 2008; Wu, 2018) or on receptive vocabulary in particular (e.g., Park et al., 2019; Tseng et al., 2020); only a handful of studies explicitly distinguish among types of vocabulary knowledge and focus specifically on productive vocabulary (deHaan et al., 2010; Franciosi, 2017) or on both receptive vocabulary (RV) and productive vocabulary (PV) (Rasti-Behbahani & Shahbazi, 2020; Sundqvist & Wikström, 2015). Second, DGBL treatment time is another important factor. As Zou et al. (2019) note, although a rough consensus has been reached on the positive short-term (i.e., tens of minutes to several days) effectiveness of DGBVL, disagreement remains as to its long-term effects. Third, previous research on long-term incidental vocabulary acquisition in informal settings (e.g., Sundqvist, 2019; Sundqvist & Wikström, 2015) does not elucidate the subtle changes in RV and PV that occur during intentional vocabulary learning over time. Therefore, this 18-week longitudinal study aimed to investigate EFL learners’ intentional RV and PV learning through DGBVL in a formal setting and to explore DGBVL’s long-term effects.

Related Conceptions and Literature Review

Receptive Vocabulary and Productive Vocabulary
In 2001, Nation developed a model describing what is involved in knowing a word; it is now the most widely known model of word knowledge in vocabulary studies (Nation, 2020; Webb, 2013). In this model, Nation divides word knowledge, whether receptive or productive, into three aspects: form, meaning, and use. Receptive vocabulary refers to vocabulary items whose form, meaning, and use can be perceived and understood while listening or reading; productive vocabulary involves the ability to use a word with the right form, meaning, and usage in speech or writing (Nation, 2001).
However, in terms of language use, “the most important aspects of vocabulary knowledge for an EFL learner are knowledge of word form and the form-meaning connection” (Nation, 2020, p. 18). Considering the EFL context and the feasibility of this study, the current study’s working definitions and measurement dimensions of RV and PV follow Nation’s (2001) model but focus solely on the written form of RV and PV at the form-meaning link.

DGBVL for Receptive Vocabulary Learning
Studies have explored RV learning resulting from digital games from a variety of perspectives. The first group of studies compares game and non-game approaches, frequently reporting various games’ influences on vocabulary learning by comparing DGBL with more traditional vocabulary learning approaches. Example games include video games (e.g., Calvo-Ferrer, 2017; Chen & Yang, 2013), serious games (e.g., Chen & Hsu, 2020), gamified vocabulary applications (e.g., Dindar et al., 2021; Wu, 2018), and virtual games (e.g., Cerezo et al., 2019), among others. Some earlier studies explored video games’ impacts on vocabulary learning in both formal (Calvo-Ferrer, 2017) and informal settings (Chen & Yang, 2013). Findings suggest that video games tend to be more helpful in the short run than in the long run. Some of these conclusions have been questioned because they are drawn from students’ subjective self-reported data (Chen & Yang, 2013). More recently, studies have turned to virtual games that afford more advanced technological support. For example, the use of VR (Alfadil, 2020) and 3D games (Tseng et al., 2020) has been compared with traditional teaching approaches.
Both experimental studies found significantly higher immediate posttest than pretest vocabulary scores in the experimental groups that used virtual environments, suggesting that virtual games could facilitate vocabulary learning by providing an interactive, contextualized setting for presenting vocabulary. The second group of studies examines vocabulary learning outcomes resulting from different game designs or gamified functions within the same game. Several have explored the efficacy of games’ reward systems, such as awarding badges and/or displaying leaderboards based on particular achievements. For example, Park et al. (2019) explored the effects of a performance-contingent reward mechanism on participants’ learning, motivation, engagement, and system perception. They found that, compared with completion-contingent rewards in the same kind of game, the performance-contingent mechanism produced statistically significant increases in participants’ receptive vocabulary learning, motivation, and engagement. Features such as competition and cooperation have also drawn researchers’ attention. However, unlike the more positive results from game versus non-game comparisons, some studies found no significant difference in students’ vocabulary learning between cooperative and competitive conditions (Dindar et al., 2021; Peng et al., 2016). The third group of studies uses a cohort design rather than a controlled trial in order to observe the vocabulary gains or retention resulting from a DGBVL intervention and to shed light on the nuances of learners’ vocabulary learning process. For instance, Chen and Hsu (2020) investigated 66 EFL college students’ RV learning through a serious game.
They divided target words into high-, medium-, and low-frequency groups and found that students learned more high-frequency words (occurring more than six times) than medium- and low-frequency words; nevertheless, students can still learn lower-frequency words if these occur in a meaningful context. Laufer and Rozovski-Roitblat (2011) claim that repeated exposure to words helps learners enhance their memory, and Chen and Hsu (2020) provide further evidence that repeated exposure to words has a positive influence on incremental vocabulary learning within a gaming context. Thus, providing learners with more encounters with target vocabulary over an extended period may be an important factor in practical game-based teaching and learning.

DGBVL for Productive Vocabulary Learning
Few studies have explored the relationship between DGBVL and PV, and their findings are mixed. For example, deHaan et al. (2010) conducted a 20-minute music video game intervention and found that game players recalled significantly less PV than game observers, owing to the extraneous cognitive load induced by gameplay. In a similar vein, Franciosi (2017) examined participants’ productive knowledge of target words. In this quasi-experiment, participants in both the experimental and control groups were assigned Quizlet drill exercises as homework. During regular 60-minute class sessions, the experimental group played the computer game Energy City, while the control group did not. After the session, participants completed a writing task in which they were asked to write as much as possible using 33 target words chosen from Energy City. Both curricular and extracurricular settings were found to be moderately effective.
However, in a follow-up longitudinal study, only a small positive correlation (r = 0.29) was reported between the amount of gameplay over 6 weeks and the occurrences of target words in the writing task, and no apparent relationship was found between mastery of vocabulary in Quizlet and the target words used in the writing. In short, the few studies that focus solely on PV have not provided robust evidence of DGBVL’s effectiveness for PV gains.

DGBVL for Both Receptive and Productive Vocabulary Learning
There is a notable paucity of studies investigating the influence of DGBVL on both PV and RV. Sundqvist and her co-researchers have conducted a series of studies in recent years (e.g., Sundqvist, 2019; Sundqvist & Wikström, 2015; Sylvén & Sundqvist, 2012) to evaluate digital games’ impacts on students’ English RV and PV acquisition in extramural environments. Their research suggests that a positive relationship exists between digital gameplay and vocabulary. They found that gameplay time could serve as an important predictor of L2 vocabulary, whereas game type had only a moderate effect among the commercial off-the-shelf (COTS) games they examined. Accordingly, they suggest that learners invest more time in gameplay to acquire more incidental vocabulary, and they recommend further research into how game type interacts with vocabulary learning. However, some extraneous variables (e.g., participants’ unequal gameplay time) might undermine the tenability of these results. More recently, some scholars have begun to consider the multidimensionality of word knowledge and to explore DGBVL efficacy from a lexical perspective. Rasti-Behbahani and Shahbazi (2020) randomly assigned 124 Iranian EFL learners to an experimental group (60 minutes of pair-work gameplay followed by a vocabulary test with access to an online dictionary) and a control group (pair work on the vocabulary test with access to an online dictionary).
Both groups performed equally well on the receptive and productive tests, and the DGBVL task effectively enhanced the experimental group’s acquisition of productive-recognition knowledge of form-meaning. Although the researchers asserted that DGBVL could enhance the acquisition of receptive, productive, recognition, and recall knowledge of orthography, meaning, and association, several issues with their research design weaken the validity of this conclusion. First and most glaringly, the reliability and validity of the self-designed vocabulary tests were not reported; second, the homogeneity of the two groups’ initial English proficiency was not analyzed before the treatment. In addition, the treatment lasted only 60 minutes; thus, the long-term effects of DGBVL on both RV and PV remain unclear. In brief, although previous research has investigated both PV and RV, it has not provided robust evidence for the efficacy of DGBVL on PV and RV, particularly for long-term learning in the language classroom.
To summarize, while recent studies have examined DGBVL from different angles, several potential issues are worth noting. First, many studies do not clearly define the type of vocabulary or word knowledge they address, instead using the umbrella terms “vocabulary learning” or “vocabulary acquisition” (e.g., Ranalli, 2008; Wu, 2018). This makes it difficult to determine exactly which kind of word knowledge (e.g., RV or PV) these studies investigated. Second, most existing studies focus on short-term incidental or intentional vocabulary learning through DGBL. However, as some researchers have argued (Cobb & Horst, 2011), a longer period of gameplay under direct instruction (i.e., explicit teaching of specific skills) may be necessary to consolidate vocabulary learning.
Third, the generalizability of some published studies on DGBVL’s overall efficacy is problematic, especially concerning retention, because effect sizes are frequently not reported and treatment durations and the intervals between posttests and delayed posttests vary across studies. Therefore, it remains to be seen what long-term effects DGBVL may have on RV and PV learning, especially with direct instruction in the EFL classroom. Accordingly, this study aimed to address the following research questions:
1. Does DGBVL have a significant influence on EFL learners’ long-term RV learning?
2. Does DGBVL have a significant influence on EFL learners’ long-term PV learning?

Methods
These research questions were explored through an 18-week longitudinal, mixed repeated measures quasi-experimental design.

Participants
Participants were recruited through convenience sampling from two intact first-year EFL classes at a key university in Eastern China; 23 (4 males and 19 females) were in the experimental group (EG) and 25 (1 male and 24 females) were in the control group (CG). Such low male-to-female ratios are common in English major classes at Chinese universities (see the large-scale stratified survey in You & Dörnyei, 2016). All participants had passed the CET-4 (College English Test Band 4, a national English proficiency test for undergraduates and graduates in China) but had not reached the CET-6 level (the higher-level test in this national testing system), indicating intermediate English proficiency. Their ages ranged from 18 to 19. To guard against heterogeneity that convenience sampling might introduce, outliers in pretest scores were examined first, and an independent samples t-test was then used to compare the means of the two classes.
Participants in the two groups did not differ significantly in their pretest RV scores (EG: M = 12.26; CG: M = 10.92; t(46) = 1.654, p = .105) or PV scores (EG: M = 9.96; CG: M = 9.92; t(46) = 0.04, p = .970). All students participated voluntarily and could withdraw at any time. They understood that their pretest and posttest scores would have no impact on their regular or final exam scores for the semester.

Materials
Digital Games: Match, Gravity, and Live on the Quizlet App
The Match, Gravity, and Live games in the Quizlet App were used with the EG. The learning activities in these vocabulary games were generated automatically by Quizlet from study sets created from the vocabulary of each unit in the Comprehensive English Coursebook (see the following section for details).
Figure 1 Solo Games Match (Top) and Gravity (Bottom)
Match (see Figure 1) is a single-player game designed to strengthen and test players’ RV knowledge by asking them to match words with definitions. In each round, once all words are correctly paired, players receive feedback, including their completion time, rank, and comments on their performance (e.g., “That was fast!”, “Congratulations!”). Players can also invite friends via social media (e.g., WeChat) to try to beat their best records. Like Match, Gravity (Figure 1) is a single-player game, but it targets both receptive and productive vocabulary. Target words are embedded in asteroids that fall from the sky; as they fall, players must correctly spell out the words or type out their correct definitions. If players miss a word twice, a red asteroid destroys their home planet.
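As a check on the pretest-equivalence comparison reported above, the pooled-variance (Student's) independent samples t statistic can be recomputed from the summary statistics. The sketch below is a minimal Python illustration, not the authors' SPSS procedure: it assumes equal population variances (which the pretest homogeneity results support), the helper name `pooled_t` is hypothetical, and because it works from rounded means and SDs, its values may differ slightly from those computed on the raw scores.

```python
import math


def pooled_t(m1: float, sd1: float, n1: int, m2: float, sd2: float, n2: int) -> tuple[float, int]:
    """Student's independent-samples t (equal variances assumed) from summary statistics.

    Returns (t, df), where df = n1 + n2 - 2.
    """
    df = n1 + n2 - 2
    # Pooled variance weights each group's sample variance by its degrees of freedom
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df


# Pretest summary statistics reported for the EG (n = 23) and CG (n = 25)
t_rv, df_rv = pooled_t(12.26, 2.96, 23, 10.92, 2.66, 25)
t_pv, df_pv = pooled_t(9.96, 3.28, 23, 9.92, 3.32, 25)
print(f"RV pretest: t({df_rv}) = {t_rv:.2f}")
print(f"PV pretest: t({df_pv}) = {t_pv:.2f}")
```

With these inputs the sketch prints t(46) = 1.65 for RV and t(46) = 0.04 for PV, consistent with the pretest rows of Table 4. If SciPy is available, `scipy.stats.ttest_ind_from_stats(..., equal_var=True)` returns the same statistic together with a two-sided p-value.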
Players can choose the answer mode (English, Chinese, or random), the difficulty level (easy, medium, or hard, with increasing falling speed), and the word bank (all words in the study set or self-selected words). At the end of the game, players receive feedback such as their scores, leaderboard rank, and badges for good performance, and players with top scores can proceed to the next level. Live (Figure 2) is a collaborative multiplayer game that addresses RV only. Players work in small teams of three to four to learn a Quizlet study set by finding the word that matches a given definition while racing against the other teams in the class. Because no individual team member has all of the answers, team members must cooperate and communicate in order to win, which ensures engagement and contribution from every team member. Live rewards accuracy over speed: once a member chooses an incorrect answer, the whole team must start over from the beginning. Because of these features, both students’ vocabulary proficiency and their communication are likely to be reinforced.
Figure 2 Live: A Cooperative Game

Lexical Profiles in Comprehensive English Coursebook
Comprehensive English Coursebook is a theme-based college-level English textbook containing 12 units. Each unit contains about 30 target words (386 words in total), each occurring once or twice in an associated article. Target words are listed in the page margins with their phonetic transcriptions, Chinese equivalents, English definitions, and corresponding proficiency level (CET-4 or CET-6). The course instructor entered and edited each unit’s target words as a Quizlet study set, which the Quizlet application then converted automatically into Match, Gravity, and Live games.

Vocabulary Tests
A vocabulary pretest and posttest were used to examine participants’ vocabulary learning achievements.
The two tests contained identical items, but the item order was shuffled to reduce practice effects. Both the pretest and the posttest comprised two parts (see Appendix). Because the tests needed to go beyond participants’ CET-4 proficiency level, two to three CET-6-level target words were randomly selected from each unit. This generated a total of 40 target words, which were then divided equally between the two parts: Part I tested participants’ RV knowledge and Part II their PV knowledge. Part I used a meaning recall format: each target word was displayed in bold, underlined font within a contextual sentence taken from the Collins English-Chinese Dictionary, and participants were asked to write down its Chinese equivalent. Part II used a cued recall format: the first letter of an L2 word was given in a contextual sentence, and its L1 equivalent was provided so that participants would spell out the exact target word rather than an alternative (Lindstromberg, 2020). One example item from each part is provided below:
Example from Part I: He spent his adolescent years playing guitar in the church band.
Example from Part II: Can you c_____________ music on the computer? (创作, “compose”)
The content validity of the vocabulary tests was established by two experienced experts specializing in language testing, who verified that the test items were congruent with the target words and that all items were clear enough to be answered from the sentences presented. Marking criteria were developed to optimize reliability and objectivity. For RV in Part I, variable Chinese answers for each item were discussed, and several acceptable synonyms were agreed upon. For PV in Part II, since “constructed-response items are inherently less controlled, and may be more difficult to score, than selected-response items” (Read, 2020, p. 550), standardized marking guidelines were created to ensure inter-rater reliability and to account for differences in the spelling and grammatical form of target words (e.g., singular or plural noun forms). For spelling, any missing, added, or wrong letter, or letters in the wrong order, made the answer incorrect. For grammar, errors in the suffixes -ed, -s, and -ing were ignored. These criteria were intended to focus on accurate vocabulary knowledge while grading objectively and accepting reasonable variation in meaning and form (Hirsh, 2015). Under these criteria, each item with a correct Chinese interpretation (Part I) or English spelling (Part II) earned 1 point, for a maximum of 40 points on each test. After training in these marking criteria, two experienced instructors graded each test paper independently. Inter-rater reliability was excellent: the intra-class correlation coefficients (ICC) were 0.998 (95% confidence interval [CI] [0.997, 0.999]) for the RV pretest and 0.992 (95% CI [0.991, 0.996]) for the RV posttest, and 0.998 (95% CI [0.996, 0.999]) and 0.998 (95% CI [0.997, 0.999]) for the PV pretest and posttest, respectively. Any disagreements between the two raters were then discussed until a consensus was reached on pretest and posttest scores. Afterward, because all items were dichotomously scored, the Kuder-Richardson formula 20 (KR-20) was used to estimate test reliability (Woodrow, 2014). The KR-20 reliability coefficients of the pretest and posttest were 0.793 and 0.916, respectively, indicating that the vocabulary tests were reliable and suitable for further analysis.

Research Procedures
The two groups had their Comprehensive English classes on the same day each week throughout the semester.
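The KR-20 coefficient mentioned above can be computed directly from a matrix of dichotomously scored responses: KR-20 = k/(k − 1) × (1 − Σ p_i q_i / σ²_X), where k is the number of items, p_i is the proportion of test takers answering item i correctly, q_i = 1 − p_i, and σ²_X is the variance of total scores. The Python sketch below is illustrative only: the `kr20` function and the toy response matrix are hypothetical (not the study's data), and it assumes the population variance (dividing by N) for total scores, a convention some texts replace with N − 1.

```python
from statistics import pvariance


def kr20(responses: list[list[int]]) -> float:
    """Kuder-Richardson formula 20 for dichotomous (0/1) item scores.

    `responses` has one row per test taker and one column per item.
    """
    n_people = len(responses)
    k = len(responses[0])
    if k < 2:
        raise ValueError("KR-20 needs at least two items")
    totals = [sum(row) for row in responses]
    total_var = pvariance(totals)  # population variance of total scores
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    # Sum of item variances p * q, where p is the proportion correct on each item
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n_people
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / total_var)


# Hypothetical toy data: 4 test takers x 3 items (1 = correct, 0 = incorrect)
toy = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(round(kr20(toy), 3))  # 0.75
```

For the toy matrix, the item variances sum to 0.625 and the total-score variance is 1.25, so KR-20 = 1.5 × (1 − 0.5) = 0.75. Applied to a 40-column matrix of marked test scores, the same function would yield a coefficient comparable in kind to the 0.793 and 0.916 reported above.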
Both groups had the same Chinese instructor, who had 15 years of experience teaching EFL to university English majors. The EG followed a game-based learning approach, learning target words through the games embedded in the Quizlet App (as described in the Materials section) on their laptops or smartphones during class. The CG learned target words through teacher-centered PowerPoint lecturing: the instructor explained the meanings, usages, and example sentences of target words with the help of PowerPoint slides, and participants completed the vocabulary matching and fill-in-the-blank exercises in their textbooks in class. Apart from the teaching approach, other factors, such as the learning content and the pace of each class, were kept the same to minimize potential influences from extraneous variables. The experiment lasted 18 weeks and consisted of preparation, treatment, and post-experiment phases (see Figure 3).
Figure 3 Experiment Flowchart (experimental class, n = 23; control class, n = 25). Week 1: experiment introduction (20 min), consent form collection (25 min), vocabulary pretest (45 min). Week 2: Quizlet games training in class (45 min). Weeks 3–18: game-based vocabulary learning (GBVL) in the experimental class and PPT lecturing vocabulary learning in the control class, 30 min/week, 12 units. Week 18: vocabulary posttest (45 min).
In the preparatory phase, the purposes and procedures of the experiment, and the fact that it would not affect students’ formal academic grades, were explained to both groups. Consent forms were then collected from all participants, and students completed the vocabulary pretest in the first week under the instructor’s supervision. In Week 2, the EG received 45 minutes of training in using the Quizlet App, including instructions for installing the app, game tutorials, and guided gameplay with non-target words.
The treatment phase lasted from Week 3 to Week 18. Each week, both groups spent 30 minutes learning target words from the Comprehensive English Coursebook: students in the EG played the Match, Gravity, and Live games, while those in the CG learned through teacher-led PowerPoint lecturing and textbook exercises. The immediate vocabulary posttest was administered during a 45-minute class in Week 18 under the instructor’s supervision, without reference materials.

Data Analysis
Data Analysis Methods
A repeated measures analysis of variance (RM-ANOVA) was conducted in SPSS 26 under the General Linear Model, with group (EG vs. CG) as a between-subjects variable, time of testing (T1 pretest vs. T2 posttest) as a within-subjects variable, and RV and PV test scores as dependent variables. RM-ANOVA was selected because it allows the interaction between within-subjects and between-subjects factors to be examined, which helps distinguish the effects of these independent variables on vocabulary learning, and because it is highly sensitive to even slight differences across the levels of a within-subjects factor (Verma, 2015). It is also statistically powerful, because it removes variation due to individual differences between participants from the error term, and it can be used with small samples (20 or fewer) (see Larson-Hall, 2015).

RM-ANOVA Assumptions Tests
Missing data, outliers, and possible violations of statistical assumptions were examined with the SPSS Frequencies, Explore, and One-Way ANOVA procedures before further data analysis. No missing values were found in the pretest or posttest scores. Several univariate outliers were detected, but they were deemed to be a small part of the population under study (i.e., three top and bottom students) and not extreme or unusual enough to require deletion (Gamst et al., 2008).
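To make the logic of the 2×2 mixed design concrete, the sketch below partitions the total sum of squares into between-subjects components (group and subject error) and within-subjects components (time, time × group, and residual), then forms the F ratios and partial eta squared values. It is a minimal Python illustration under stated assumptions, not the SPSS General Linear Model procedure the authors used. The function `mixed_anova_2x2` and the toy scores are hypothetical, each participant is assumed to contribute exactly one pretest and one posttest score, and no p-values are computed. With only two time levels, sphericity holds trivially, so no Greenhouse–Geisser adjustment is applied.

```python
from statistics import fmean


def _ratio(ss_eff: float, df_eff: int, ss_err: float, df_err: int) -> tuple[float, float]:
    """Return (F, partial eta squared) for one effect and its error term."""
    f = (ss_eff / df_eff) / (ss_err / df_err)
    return f, ss_eff / (ss_eff + ss_err)


def mixed_anova_2x2(groups: list[list[tuple[float, float]]]) -> dict[str, tuple[float, float]]:
    """Mixed ANOVA with one between-subjects factor (group) and a two-level
    within-subjects factor (time). Each group is a list of (pre, post) pairs."""
    a, b = len(groups), 2
    n_subj = sum(len(g) for g in groups)
    scores = [y for g in groups for pair in g for y in pair]
    grand = fmean(scores)

    ss_total = sum((y - grand) ** 2 for y in scores)
    # Between subjects: spread of each participant's mean around the grand mean
    ss_subjects = b * sum((fmean(pair) - grand) ** 2 for g in groups for pair in g)
    ss_group = b * sum(len(g) * (fmean([y for pair in g for y in pair]) - grand) ** 2
                       for g in groups)
    ss_err_between = ss_subjects - ss_group

    # Within subjects: time main effect and the group x time cell means
    time_means = [fmean([pair[k] for g in groups for pair in g]) for k in range(b)]
    ss_time = n_subj * sum((m - grand) ** 2 for m in time_means)
    ss_cells = sum(len(g) * (fmean([pair[k] for pair in g]) - grand) ** 2
                   for g in groups for k in range(b))
    ss_inter = ss_cells - ss_group - ss_time
    ss_err_within = ss_total - ss_subjects - ss_time - ss_inter

    df_err_b = n_subj - a
    df_err_w = (n_subj - a) * (b - 1)
    return {
        "group": _ratio(ss_group, a - 1, ss_err_between, df_err_b),
        "time": _ratio(ss_time, b - 1, ss_err_within, df_err_w),
        "time_x_group": _ratio(ss_inter, (a - 1) * (b - 1), ss_err_within, df_err_w),
    }


# Hypothetical toy scores (pre, post); not the study's data
eg = [(10, 18), (12, 19), (14, 20)]
cg = [(11, 13), (9, 12), (12, 16)]
for effect, (f, eta) in mixed_anova_2x2([eg, cg]).items():
    print(f"{effect}: F = {f:.2f}, partial eta squared = {eta:.3f}")
```

For this toy data the interaction F is 24.0, which equals the square of a pooled independent samples t-test on the gain scores (mean gains of 7 and 3, t = 4/√(2/3) ≈ 4.90). In a 2×2 mixed design, then, the time × group interaction is equivalent to asking whether the two groups' pre-to-post gains differ.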
Skewness and kurtosis values for each set of pre- and posttest vocabulary scores¹ (Table 1) fell within ±3.0 and ±7.0, respectively, indicating that the distributions were within an acceptable range of normality, although the Shapiro-Wilk test results for some groups’ scores were statistically significant (p < .05) (Gamst et al., 2008). The Bartlett and Brown–Forsythe tests of the equality of error variances were statistically significant for the posttest scores (RV and PV) (p < .05) but not for the pretest scores (RV and PV), indicating homogeneity of variance across groups in the pretest scores but heterogeneity in the posttest scores (Table 1). This heterogeneity might be attributed to the small sample size (Gamst et al., 2008). Lastly, because the within-subjects variable (time) had only two levels, the sphericity assumption does not apply; it is relevant only when a variable or interaction term involves more than two levels to compare (Larson-Hall, 2015). Greenhouse–Geisser-corrected values are reported in the results below; with only two levels, these are identical to the uncorrected values. Given the literature arguing that ANOVA is robust to violations of the normality assumption (Schmider et al., 2010), and that results are not necessarily invalid even when the homoscedasticity assumption is violated (Brezina, 2018; Larson-Hall, 2015), we proceeded with the RM-ANOVA. Independent samples t-tests for the between-subjects factor (group) and paired samples t-tests for the within-subjects factor (time) were also conducted to investigate potential differences between treatment groups and to make further comparisons.

Results
The descriptive statistics for the two groups’ RV and PV scores at T1 and T2, together with the results of the Shapiro-Wilk test and the test of homogeneity of variances, are reported in Table 1.
Mean values indicate that participants in both groups achieved solid gains in RV and PV after one semester of treatment, as shown by the much higher T2-RV and T2-PV scores in each group. In addition, the mean T2-RV and T2-PV scores in the EG exceed those in the CG. The statistical significance of these differences is examined in the following sections.

Findings Related to RQ1: DGBVL’s Influence on EFL Learners’ RV

The first research question asked whether DGBVL has a significant influence on EFL learners’ long-term RV learning. Several results answer it affirmatively. First, the mixed 2×2 RM-ANOVA on the RV scores (Table 2) revealed a significant interaction effect between group and time (F(1, 46) = 44.442, p < .001, ηp² = .491). This indicates that growth in RV from T1 to T2 differed significantly between the two groups; in other words, the two instructional approaches were not equally effective (as illustrated by the steeper slope of the EG in Figure 4). The EG improved significantly more (T1-to-T2 gain in RV = 7.39) than the CG (gain = 2.84), as shown in Table 3. The effect size of this difference is expressed as partial eta squared (ηp²). It indicates that the two-way interaction accounts for 49.1% of the effect-plus-error variance in RV scores, which represents a large effect (Cohen, 1988). Second, the RM-ANOVA on RV scores (Table 2) indicated a significant main effect of treatment group (F(1, 46) = 32.058, p < .001, ηp² = .411). This signifies that the DGBVL and PPT lecturing approaches had significantly different effects. However, the influence of the treatment group variable (ηp² = .411) is smaller than the main effect of time (F(1, 46) = 216.716, p < .001, ηp² = .825).

Table 2
Mixed RM-ANOVA Results on RV

Source              Type II Sum of Squares   df       Mean Square   F         Sig.    Partial Eta Squared (ηp²)
Between-subjects
  Group             313.357                  1        313.357       32.058    0.000   0.411
  Error             449.633                  46       9.775
Within-subjects
  Time              605.010                  1.000    605.010       216.716   0.000   0.825
  Time × Group      124.070                  1.000    124.070       44.442    0.000   0.491
  Error             128.419                  46.000   2.792

Table 3
Paired Samples t-Test Analysis on Within-Subjects (Time)

Group       Test   Time   Mean    SD     t       df   Sig. (2-tailed)
EG (N=23)   RV     1      12.26   2.96   12.55   22   0.000
                   2      19.65   0.65
            PV     1      9.96    3.28   15.43   22   0.000
                   2      19.35   1.27
CG (N=25)   RV     1      10.92   2.66   7.71    24   0.000
                   2      13.76   2.93
            PV     1      9.92    3.32   7.80    24   0.000
                   2      13.32   3.08
Note. Time 1 (T1) = Pretest, Time 2 (T2) = Posttest, RV = Receptive Vocabulary, PV = Productive Vocabulary, EG = Experimental Group, CG = Control Group

Figure 4
RV and PV Scores Across Testing Time in Different Groups

Table 4
Independent Samples t-Test Analysis on Between-Subjects (Group)

Test   Time   Group   N    Mean    SD     t      df      Sig. (2-tailed)
RV     1      EG      23   12.26   2.96   1.65   46      0.11
              CG      25   10.92   2.66
       2      EG      23   19.65   0.65   9.79   26.53   0.00
              CG      25   13.76   2.93
PV     1      EG      23   9.96    3.28   0.04   46      0.97
              CG      25   9.92    3.32
       2      EG      23   19.35   1.27   9.00   32.43   0.00
              CG      25   13.32   3.08
Note. Time 1 (T1) = Pretest, Time 2 (T2) = Posttest, RV = Receptive Vocabulary, PV = Productive Vocabulary, EG = Experimental Group, CG = Control Group

Independent samples t-tests were conducted on the between-subjects factor (group) to investigate whether group differences existed before or after the experiment. The results in Table 4 reveal that before the treatment, the EG’s RV proficiency (M = 12.26, SD = 2.96) was slightly higher than that of the CG (M = 10.92, SD = 2.66). However, the difference was not significant (t(46) = 1.65, p = .11). In contrast, on the RV posttest the EG (M = 19.65, SD = 0.65) significantly outperformed the CG (M = 13.76, SD = 2.93), t(26.53) = 9.79, p < .001 (equal variances not assumed).
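Table 4 reports only means, SDs, and group sizes, so its t statistics can be re-derived from those summaries as a consistency check. The Python sketch below assumes Student's pooled-variance t for the pretest rows (df = 46). For the posttest rows it assumes Welch's t with Welch–Satterthwaite degrees of freedom, which the fractional df values (26.53 and 32.43) suggest. The helper names are illustrative.

```python
import math


def pooled_t(m1, s1, n1, m2, s2, n2):
    """Student's independent-samples t (pooled variance); returns (t, df)."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df


def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch's t with Welch-Satterthwaite df (equal variances not assumed); returns (t, df)."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Table 4 summaries: (mean, SD, n) for the EG followed by the CG
rv_pre = pooled_t(12.26, 2.96, 23, 10.92, 2.66, 25)
rv_post = welch_t(19.65, 0.65, 23, 13.76, 2.93, 25)
pv_pre = pooled_t(9.96, 3.28, 23, 9.92, 3.32, 25)
pv_post = welch_t(19.35, 1.27, 23, 13.32, 3.08, 25)

for label, (t, df) in [("RV T1", rv_pre), ("RV T2", rv_post),
                       ("PV T1", pv_pre), ("PV T2", pv_post)]:
    print(f"{label}: t({df:.2f}) = {t:.2f}")
```

The inputs are rounded to two decimals, so the re-derived values agree with Table 4 only to about the second decimal. For example, the RV posttest Welch df comes out near 26.55 rather than 26.53.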
These results suggest that DGBVL instruction was more effective in improving participants’ RV knowledge than the traditional PPT lecturing approach. Furthermore, the standard deviation (SD) in the EG decreased from 2.96 to 0.65 after the treatment, whereas the SD values for the CG remained nearly the same (T1: 2.66; T2: 2.93). This may indicate that DGBVL helped narrow the disparity in vocabulary knowledge among participants. In short, regarding RQ1, the data suggest that both time and teaching approach had a significant effect on participants’ RV achievement, although the effect of time outweighs that of the teaching approach.

Findings Related to RQ2: DGBVL’s Influence on EFL Learners’ PV

The mixed RM-ANOVA on PV scores by treatment group and time (Table 5) confirmed that DGBVL had a significant influence on the EFL learners’ PV learning. First, a significant interaction effect was found between time and treatment group (F(1, 46) = 65.610, p < .001, ηp² = .588). This indicates that the PV improvements over the 16 weeks of instruction differed significantly between the two groups (also illustrated in Figure 4). Furthermore, the interaction effect is large (ηp² = .588), accounting for 58.8% of the effect-plus-error variance in PV scores. The mean differences between post- and pretest PV scores (EG: 9.39; CG: 3.40; see Table 3) further support the finding that participants in the EG achieved greater growth in PV knowledge than those in the CG. Time also had a significant and large main effect on PV learning (Table 5, F(1, 46) = 288.001, p < .001, ηp² = .862).
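With only two testing times, the Time × Group interaction in a 2 × 2 mixed ANOVA is equivalent to a pooled-variance independent-samples t-test on each participant's gain score (T2 − T1), with F = t². The Python sketch below uses this identity as a rough cross-check. It backs out each group's gain-score SD from the paired t values in Table 3 (SD = mean gain × √n / t), an approximation that inherits the table's rounding, and then recomputes the interaction F. The helper names are hypothetical, and the check is offered as an illustration rather than an analysis reported in the study.

```python
import math


def gain_sd_from_paired_t(mean_gain, n, t):
    """Back out the SD of difference scores from a paired t = mean_gain / (sd / sqrt(n))."""
    return mean_gain * math.sqrt(n) / t


def interaction_from_gains(g1, sd1, n1, g2, sd2, n2):
    """Pooled t-test on gain scores; returns (t**2, within-subjects error MS).

    In a 2 (group) x 2 (time) mixed design, t**2 equals the Time x Group F,
    and the within-subjects error mean square equals half the pooled gain variance.
    """
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    t = (g1 - g2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t ** 2, sp2 / 2


def interaction_from_table3(eg_t1, eg_t2, eg_t, cg_t1, cg_t2, cg_t, n_eg=23, n_cg=25):
    """Hypothetical helper: Table 3 means and paired t values -> (interaction F, error MS)."""
    eg_gain, cg_gain = eg_t2 - eg_t1, cg_t2 - cg_t1
    return interaction_from_gains(
        eg_gain, gain_sd_from_paired_t(eg_gain, n_eg, eg_t), n_eg,
        cg_gain, gain_sd_from_paired_t(cg_gain, n_cg, cg_t), n_cg)


rv_f, rv_mse = interaction_from_table3(12.26, 19.65, 12.55, 10.92, 13.76, 7.71)
pv_f, pv_mse = interaction_from_table3(9.96, 19.35, 15.43, 9.92, 13.32, 7.80)
print(f"RV: F = {rv_f:.2f}, MS_error = {rv_mse:.3f}")
print(f"PV: F = {pv_f:.2f}, MS_error = {pv_mse:.3f}")
```

With the rounded inputs, this gives F ≈ 44.4 (RV) and F ≈ 65.6 (PV), with error mean squares of about 2.79 and 3.28. These values are in line with the Time × Group and within-subjects error rows of Tables 2 and 5.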
The paired samples t-tests (Table 3) indicated that both groups improved their PV knowledge over the 16-week treatment. Specifically, the EG achieved significantly higher PV scores on the posttest (M = 19.35, SD = 1.27) than on the pretest (M = 9.96, SD = 3.28), t(22) = 15.43, p < .001. The CG’s gains in PV knowledge were also statistically significant, although smaller than the EG’s (T1: M = 9.92, SD = 3.32; T2: M = 13.32, SD = 3.08), t(24) = 7.80, p < .001.

Table 5
Mixed RM-ANOVA Results on PV

Source              Type II Sum of Squares   df       Mean Square   F         Sig.    Partial Eta Squared (ηp²)
Between-subjects
  Group             220.275                  1        220.275       16.537    0.000   0.264
  Error             612.715                  46       13.320
Within-subjects
  Time              943.760                  1.000    943.760       288.001   0.000   0.862
  Time × Group      215.000                  1.000    215.000       65.610    0.000   0.588
  Error             150.739                  46.000   3.277

The RM-ANOVA results in Table 5 also indicate a significant main effect of treatment group. That is, the efficacy of DGBVL instruction differed significantly from that of PPT lecturing (F(1, 46) = 16.537, p < .001, ηp² = .264). The independent samples t-test results in Table 4 indicate that before treatment the EG (M = 9.96, SD = 3.28) had roughly the same PV proficiency as the CG (M = 9.92, SD = 3.32), t(46) = 0.04, p = .97. After treatment, however, the two groups’ PV scores differed significantly. The DGBVL group’s posttest scores (M = 19.35, SD = 1.27) were much higher than those of the PPT group (M = 13.32, SD = 3.08), t(32.43) = 9.00, p < .001. Corrected values are reported here because equal variances could not be assumed for the PV posttest scores. As with RV, the SD of PV scores decreased in the EG (from 3.28 to 1.27) but not in the CG (from 3.32 to 3.08). Thus, DGBVL appears to be more effective than PPT lecturing for participants’ mastery of both RV and PV.
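The F and partial eta squared values in Tables 2 and 5 follow directly from the sums of squares: F = MS_effect / MS_error and ηp² = SS_effect / (SS_effect + SS_error). Group is tested against the between-subjects error term, while Time and Time × Group are tested against the within-subjects error term. The short Python sketch below re-derives the reported values; the function name and data layout are illustrative.

```python
def f_and_partial_eta_sq(ss_effect, df_effect, ss_error, df_error):
    """Return (F, partial eta squared) for one effect and its error term."""
    ms_effect = ss_effect / df_effect
    ms_error = ss_error / df_error
    return ms_effect / ms_error, ss_effect / (ss_effect + ss_error)


# (SS_effect, df_effect, SS_error, df_error) from Table 2 (RV) and Table 5 (PV).
# Group uses the between-subjects error; Time and Time x Group use the
# within-subjects error.
effects = {
    ("RV", "Group"): (313.357, 1, 449.633, 46),
    ("RV", "Time"): (605.010, 1, 128.419, 46),
    ("RV", "Time x Group"): (124.070, 1, 128.419, 46),
    ("PV", "Group"): (220.275, 1, 612.715, 46),
    ("PV", "Time"): (943.760, 1, 150.739, 46),
    ("PV", "Time x Group"): (215.000, 1, 150.739, 46),
}

results = {key: f_and_partial_eta_sq(*row) for key, row in effects.items()}
for (test, effect), (f, eta_p2) in results.items():
    print(f"{test} {effect}: F(1, 46) = {f:.3f}, partial eta squared = {eta_p2:.3f}")
```

Set side by side, the recomputed ηp² values show the pattern discussed in the text: Time (.825 for RV, .862 for PV) explains more variance than Group (.411 and .264).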
Discussion

This study sought to determine DGBVL’s long-term effects on EFL learners’ RV and PV learning. The findings revealed that DGBVL was efficacious for both RV and PV learning. Time, or prolonged use of DGBVL in EFL classroom instruction, was also found to play a vital role in students’ RV and PV learning. The significant interaction effects between treatment time and instruction type echo previous studies of DGBVL’s effects on RV and add new evidence of its effects on PV. First, in relation to studies that investigated only RV, this study further supports the claim that gamified vocabulary applications can lead to successful vocabulary gains (Dindar et al., 2021). Such gains can even exceed those from traditional approaches (e.g., teaching via PowerPoint) (Wu, 2018). Second, compared with the literature focusing solely on PV, the effectiveness of DGBVL found in this study is more encouraging. For instance, Franciosi (2017) found that the difference in PV scores between the EG and CG had a medium effect size (ω² = 0.1). Franciosi also found a small positive correlation between gameplay time and PV occurrences in a follow-up writing task. In the current study, DGBVL was found to be more effective for PV learning, with a large main effect of group (ηp² = .264), although ω² and ηp² are different indices and are not directly comparable. This larger effect size might result from the different research designs. In Franciosi’s (2017) study, a 60-minute DGBL intervention was added for the EG as an extra enhancement after participants’ extracurricular Quizlet vocabulary drill exercises. In the current study, however, Quizlet games were integrated into the EG’s 16-week curricular learning process as the primary instructional approach. In addition, Franciosi tested participants’ PV performance by counting the target words in a writing task, whereas this study measured productive form-meaning recall of target words.
Third, the present study’s results also corroborate the findings of Rasti-Behbahani and Shahbazi (2020), who found DGBVL to be a more effective treatment for both RV and PV learning. Another finding worth discussing is that the main effect of time outweighed that of instructional approach, at least in its explanatory power for RV and PV gains. This suggests that time is a factor that should not be ignored. Previous research has demonstrated a significant positive correlation between gameplay time and incidental RV and PV learning in long-term extracurricular contexts (e.g., Sundqvist, 2019; Sundqvist & Wikström, 2015). This 16-week study adds further evidence that long-term exposure to DGBVL can also have a significant positive effect on RV and PV gains in a formal context with direct instruction and intentional learning. Such exposure may also help narrow the disparity in vocabulary knowledge among learners. A possible explanation is that a longer treatment time is more likely to guarantee multiple, repeated exposures to target words in the gaming context, which in turn supports the incremental learning of vocabulary knowledge (Chen & Hsu, 2020). Instead of repetitive and monotonous vocabulary drills, this study used 36 (12 × 3) customized Quizlet games, combining the vocabulary study sets of 12 units with 3 game types (Match, Gravity, Live). Such games might generate interest and motivation among a broad range of learners, keeping them engaged in their studies over the 16-week period. Such engagement is a commonly reported characteristic of DGBVL in previous studies (e.g., Calvo-Ferrer, 2017; Yang et al., 2020). Furthermore, it is worth noting Spada’s (2005) observation that research conducted in real classrooms with real learners is more likely to inform classroom practice.
Importing a language coursebook’s vocabulary items directly into Quizlet and incorporating the resulting games into routine classroom practice is within the means of most language teachers around the world. Preliminary positive effects of DGBVL have been observed in both formal and informal settings. Accordingly, practitioners and researchers would likely benefit from more empirical studies that identify the ideal duration of DGBVL instruction for maximizing RV and PV gains and that explore the optimal combination of in-school and out-of-school DGBVL. Short time frames may plausibly not offer enough exposure for long-term retention. It is equally plausible, however, that the efficacy of DGBVL instruction may eventually fade over long periods, as repeated exposure may yield diminishing returns in motivation and engagement.

The relationship between gender and DGBVL is another issue worth investigating more fully. In some earlier studies, males were found to be more engaged in games and tended to dedicate more time to gameplay (Chou & Tsai, 2007). Sundqvist’s studies (Sundqvist & Wikström, 2015; Sylvén & Sundqvist, 2012) found a positive correlation between gameplay time and L2 proficiency and reported that boys outperformed girls in vocabulary learning. However, the authors argued that gender per se did not explain these differences. They hypothesized that girls would benefit equally in L2 proficiency if their gameplay time were similar to that of boys (Sylvén & Sundqvist, 2012). Notably, these studies were conducted in extracurricular settings in which participants’ gameplay time was not equal. Other studies conducted in language classes had different findings, such as Hitosugi et al.
(2014), who found no significant effect of gender on three (pre-, post-, and delayed) vocabulary tests after five 50-minute L2 class sessions using an off-the-shelf videogame. Similarly, no gender differences in DGBVL were found in Calvo-Ferrer and Belda-Medina’s (2021) study using a 90-minute videogame or in Hsu’s (2019) study on the use of 80-minute AR games in L2 classrooms. The current study provides further evidence that females can also benefit significantly from DGBVL in long-term formal settings with equal gameplay time.

Conclusion

This study aimed to better understand the potential value of DGBVL in helping EFL students learn vocabulary. More specifically, it used a quasi-experimental research design to investigate DGBVL’s long-term effects on RV and PV within the language classroom. The most important finding is that participants in both the experimental and control groups improved in RV and PV by the end of the 16-week treatment, but the vocabulary gains of the former group were significantly greater than those of the latter. The current study does, however, have some limitations. First, to assure face validity, only a limited selection of target words was used for form-meaning link testing. However, word knowledge is multifaceted, and other types of word knowledge were not examined. Examining them would have given a more comprehensive picture of the breadth and depth of learners’ vocabulary mastery under longitudinal formal instruction. Second, the current study did not use a delayed posttest, so the retention of acquired RV and PV knowledge remains unknown. Third, the current study did not statistically compare individual differences among learners.
There is some evidence, for example, that factors such as students’ preferred game types, gender (Sundqvist & Wikström, 2015), and gaming frequency (Sundqvist, 2019) might produce different DGBVL effects. The current study did not include such independent variables or compare the corresponding differences. Therefore, this study provides evidence that language learners may benefit from long-term DGBVL within the classroom, but further research is needed to inform language practitioners about how to use DGBVL most effectively. First, a fuller understanding is needed of how DGBVL-based instruction may assist with other types of word knowledge, such as word parts, associations, collocations, and the automaticity or speed of word processing. Second, the ideal amount of time for DGBVL in classroom instruction deserves further investigation. This includes identifying the ideal duration of the intervention, the ideal frequency of exposure to target words, and the most suitable interval between the immediate and delayed posttests. Lastly, it may be prudent for researchers to shift their attention from the effects of DGBVL on vocabulary learning to other questions. In particular, they could examine how learning context (e.g., formal or informal; incidental or intentional learning) and individual differences (e.g., gender, cognitive styles) can support or undermine the efficacy of digital game-based language learning.

Acknowledgements

This research was supported by the Jiangsu Education Science 13th Five-Year Plan Fund (Project Ref. No. C-b/2020/01/10), the Fundamental Research Funds for the Central Universities (Project Ref. No. SKYC2020027 and SKYZ2020023), the Humanities & Social Sciences Funds for the College of Foreign Studies in Nanjing Agricultural University (WY202203), and the Jiangsu Research Association of Educational Technology for Higher Institutions (Project Ref.
No. 2019JSETKT065). We would like to thank the reviewers for their meticulous review of the manuscript and their many constructive and insightful suggestions.

Note

1. Table 1 is available at the following link: https://osf.io/wqg97/files/osfstorage/65fdc68fae6a24004e1c6e1e

References

Alfadil, M. (2020). Effectiveness of virtual reality game in foreign language vocabulary acquisition. Computers & Education, 153, Article 103893. https://doi.org/10.1016/j.compedu.2020.103893
Allen, L. K., Crossley, S. A., Snow, E. L., & McNamara, D. S. (2014). L2 writing practice: Game enjoyment as a key to engagement. Language Learning & Technology, 18(2), 124–150. https://hdl.handle.net/10125/44373
Brezina, V. (2018). Statistics in corpus linguistics: A practical guide. Cambridge University Press. https://doi.org/10.1017/9781316410899
Calvo-Ferrer, J. R. (2017). Educational games as stand-alone learning tools and their motivational effect on L2 vocabulary acquisition and perceived learning gains. British Journal of Educational Technology, 48(2), 264–278. https://doi.org/10.1111/bjet.12387
Calvo-Ferrer, J. R., & Belda-Medina, J. (2021). The effect of multiplayer video games on incidental and intentional L2 vocabulary learning: The case of Among Us. Multimodal Technologies and Interaction, 5(12), 80. https://doi.org/10.3390/mti5120080
Cerezo, R., Calderón, V., & Romero, C. (2019). A holographic mobile-based application for practicing pronunciation of basic English vocabulary for Spanish speaking children. International Journal of Human-Computer Studies, 124, 13–25. https://doi.org/10.1016/j.ijhcs.2018.11.009
Chou, C., & Tsai, M.-J. (2007). Gender differences in Taiwan high school students’ computer game playing. Computers in Human Behavior, 23(1), 812–824. https://doi.org/10.1016/j.chb.2004.11.011
Chen, H. J.-H., & Hsu, H.-L. (2020). The impact of a serious game on vocabulary and content learning. Computer Assisted Language Learning, 33(7), 811–832.
https://doi.org/10.1080/09588221.2019.1593197
Chen, H. J.-H., & Yang, C. T.-Y. (2013). The impact of adventure video games on foreign language learning and the perceptions of learners. Interactive Learning Environments, 21(2), 129–141. https://doi.org/10.1080/10494820.2012.705851
Cobb, T., & Horst, M. (2011). Does “word coach” coach words? CALICO Journal, 28(3), 639–661. https://doi.org/10.11139/cj.28.3.639-661
Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Routledge. https://doi.org/10.4324/9780203771587
deHaan, J., Reed, W. M., & Kuwada, K. (2010). The effect of interactivity with a music video game on second language vocabulary recall. Language Learning & Technology, 14(2), 74–94. https://hdl.handle.net/10125/44215
Dindar, M., Ren, L., & Järvenoja, H. (2021). An experimental study on the effects of gamified cooperation and competition on English vocabulary learning. British Journal of Educational Technology, 52(1), 142–159. https://doi.org/10.1111/bjet.12977
Dourda, K., Bratitsis, T., Griva, E., & Papadopoulou, P. (2014). Content and language integrated learning through an online game in primary school: A case study. Electronic Journal of e-Learning, 12(3), 243–258.
Franciosi, S. J. (2017). The effect of computer game-based learning on FL vocabulary transferability. Educational Technology & Society, 20(1), 123–133.
Gamst, G., Meyers, L. S., & Guarino, A. J. (2008). Analysis of variance designs: A conceptual and computational approach with SPSS and SAS. Cambridge University Press. https://doi.org/10.1017/CBO9780511801648
Hirsh, D. (2015). Researching vocabulary. In B. Paltridge & A. Phakiti (Eds.), Research methods in applied linguistics: A practical resource (pp. 369–386). Bloomsbury Publishing.
Hitosugi, C. I., Schmidt, M., & Hayashi, K. (2014). Digital game-based learning (DGBL) in the L2 classroom: The impact of the UN's off-the-shelf videogame, Food Force, on learner affect and vocabulary retention. CALICO Journal, 31(1), 19–39. https://doi.org/10.11139/cj.31.1.19-39
Hsu, T.-C. (2019). Effects of gender and different augmented reality learning systems on English vocabulary learning of elementary school students. Universal Access in the Information Society, 18(2), 315–325. https://doi.org/10.1007/s10209-017-0593-1
Hwang, W.-Y., Shih, T. K., Ma, Z.-H., Shadiev, R., & Chen, S.-Y. (2016). Evaluating listening and speaking skills in a mobile game-based learning environment with situational contexts. Computer Assisted Language Learning, 29(4), 639–657. https://doi.org/10.1080/09588221.2015.1016438
Larson-Hall, J. (2015). A guide to doing statistics in second language research using SPSS and R (2nd ed.). Routledge. https://doi.org/10.4324/9781315775661
Laufer, B., & Rozovski-Roitblat, B. (2011). Incidental vocabulary acquisition: The effects of task type, word occurrence and their combination. Language Teaching Research, 15(4), 391–411. https://doi.org/10.1177/1362168811412019
Lindstromberg, S. (2020). Intentional L2 vocabulary learning. In S. Webb (Ed.), The Routledge handbook of vocabulary studies (pp. 240–254). Routledge. https://doi.org/10.4324/9780429291586
Nation, P. (2001). Learning vocabulary in another language. Cambridge University Press. https://doi.org/10.1017/CBO9781139524759
Nation, P. (2020). The different aspects of vocabulary knowledge. In S. Webb (Ed.), The Routledge handbook of vocabulary studies (pp. 15–29). Routledge. https://doi.org/10.4324/9780429291586
Pack, A., & Newbould, S. (2018). Using game playability as a framework for professional development in language teaching. TESL Reporter, 51(1), 1–22.
Park, J., Kim, S., Kim, A., & Yi, M. Y. (2019). Learning to be better at the game: Performance vs. completion contingent reward for game-based learning. Computers & Education, 139, 1–15. https://doi.org/10.1016/j.compedu.2019.04.016
Peng, W., Song, H., Kim, J., & Day, T. (2016). The influence of task demand and social categorization diversity on performance and enjoyment in a language learning game. Computers & Education, 95, 285–295. https://doi.org/10.1016/j.compedu.2016.01.004
Prensky, M. (2001). Fun, play and games: What makes games engaging. Digital Game-Based Learning, 5(1), 5–31.
Rama, P. S., Black, R. W., van Es, E., & Warschauer, M. (2012). Affordances for second language learning in World of Warcraft. ReCALL, 24(3), 322–338. https://doi.org/10.1017/s0958344012000171
Ranalli, J. (2008). Learning English with The Sims: Exploiting authentic computer simulation games for L2 learning. Computer Assisted Language Learning, 21(5), 441–455. https://doi.org/10.1080/09588220802447859
Rasti-Behbahani, A., & Shahbazi, M. (2020). Investigating the effectiveness of a digital game-based task on the acquisition of word knowledge. Computer Assisted Language Learning, 35(8), 1920–1945. https://doi.org/10.1080/09588221.2020.1846567
Read, J. (2020). Key issues in measuring vocabulary knowledge. In S. Webb (Ed.), The Routledge handbook of vocabulary studies (pp. 545–560). Routledge. https://doi.org/10.4324/9780429291586
Reinders, H. (Ed.). (2012). Digital games in language learning and teaching. Palgrave Macmillan. https://doi.org/10.1057/9781137005267
Reinhardt, J. (2019). Gameful second and foreign language teaching and learning: Theory, research, and practice. Palgrave Macmillan. https://doi.org/10.1007/978-3-030-04729-0
Reitz, L., Sohny, A., & Lochmann, G. (2016). VR-based gamification of communication training and oral examination in a second language. International Journal of Game-Based Learning, 6(2), 46–61.
https://doi.org/10.4018/IJGBL.2016040104
Schmider, E., Ziegler, M., Danay, E., Beyer, L., & Bühner, M. (2010). Is it really robust? Reinvestigating the robustness of ANOVA against violations of the normal distribution assumption. Methodology: European Journal of Research Methods for the Behavioral and Social Sciences, 6(4), 147–151. https://doi.org/10.1027/1614-2241/a000016
Spada, N. (2005). Conditions and challenges in developing school-based SLA research programs. Modern Language Journal, 89(3), 328–338. https://doi.org/10.1111/j.1540-4781.2005.00308.x
Sundqvist, P. (2019). Commercial-off-the-shelf games in the digital wild and L2 learner vocabulary. Language Learning & Technology, 23(1), 87–113. https://hdl.handle.net/10125/44674
Sundqvist, P., & Wikström, P. (2015). Out-of-school digital gameplay and in-school L2 English vocabulary outcomes. System, 51, 65–76. https://doi.org/10.1016/j.system.2015.04.001
Sylvén, L. K., & Sundqvist, P. (2012). Gaming as extramural English L2 learning and L2 proficiency among young learners. ReCALL, 24(3), 302–321. https://doi.org/10.1017/s095834401200016x
Tseng, W.-T., Liou, H.-J., & Chu, H.-C. (2020). Vocabulary learning in virtual environments: Learner autonomy and collaboration. System, 88, Article 102190. https://doi.org/10.1016/j.system.2019.102190
Verma, J. (2015). Repeated measures design for empirical researchers. John Wiley & Sons.
Webb, S. (2013). Depth of vocabulary knowledge. In C. Chapelle (Ed.), Encyclopedia of applied linguistics (pp. 1656–1663). Wiley-Blackwell. https://doi.org/10.1002/9781405198431.wbeal1325
Webb, S. (2020). Incidental vocabulary learning. In S. Webb (Ed.), The Routledge handbook of vocabulary studies (pp. 225–239). Routledge. https://doi.org/10.4324/9780429291586
Woodrow, L. (2014). Writing about quantitative research in applied linguistics. Palgrave Macmillan. https://doi.org/10.1057/9780230369955
Wu, T.-T. (2018).
Improving the effectiveness of English vocabulary review by integrating ARCS with mobile game-based learning. Journal of Computer Assisted Learning, 34(3), 315–323. https://doi.org/10.1111/jcal.12244
Yang, Q.-F., Chang, S.-C., Hwang, G.-J., & Zou, D. (2020). Balancing cognitive complexity and gaming level: Effects of a cognitive complexity-based competition game on EFL students’ English vocabulary learning performance, anxiety and behaviors. Computers & Education, 148, Article 103808. https://doi.org/10.1016/j.compedu.2020.103808
Yang, W., & Dai, W. (2011). Rote memorization of vocabulary and vocabulary development. English Language Teaching, 4(4), 61–64. https://doi.org/10.5539/elt.v4n4p61
You, J. C., & Dörnyei, Z. (2016). Language learning motivation in China: Results of a large-scale stratified survey. Applied Linguistics, 37(4), 495–519. https://doi.org/10.1093/applin/amu046
Zou, D., Huang, Y., & Xie, H. (2019). Digital game-based vocabulary learning: Where are we and where are we going? Computer Assisted Language Learning, 34(5–6), 751–777. https://doi.org/10.1080/09588221.2019.1640745

Appendix A. Vocabulary Test

Part I
Write down the Chinese meaning of each underlined word. (请写出划线单词的汉语意思。)

1. It's tedious work, but Mike seems to truly love it.
2. Showing the soles of shoes to someone is a sign of contempt in Arab culture.
3. Nobody will be cheerful on such a cheerless rainy day.
4. At Shanxi province, residents have said they almost choke on coal dust in the evenings.
5. The swimming pool is available only in summer.
6. The government monitored correspondence and telephone conversations.
7. The spectacular sunrise made us exclaim in surprise.
8. They set off in the car and bumped over a sandy track to Sam’s farm.
9. Can you give me the highlight of your resume?
10. At the time, I was somewhat of a workaholic, clocking in sixty-hour weeks.
11. A study in 2002 showed that athletics help student academic performance in high school more than any other extracurricular activity.
12. Tom Chaney raised his rifle and shot him in the forehead, killing him instantly.
13. You can't do all the jobs yourself; you can delegate a task to your subordinate.
14. By the week's end, an extraordinary intimacy had grown up between Tom and his new friend.
15. Schedule a convenient time, at least once a week, to spend the whole day with your child.
16. I was disappointed by his indifference more than somewhat.
17. At the time, Iraq was seen as the biggest foe, followed by China and Iran.
18. He spent his adolescent years playing guitar in the church band.
19. The teacher drilled grammar and the multiplication tables every day.
20. The solar panel calculator is now as cheap as a pack of cigarettes.

Part II
Spell out each word using the Chinese definition and the first-letter hint. (请根据汉语释义和首字母提示拼写出该单词。)

1. She's too p________ to enjoy rude jokes! (古板的: prim, old-fashioned)
2. Can you c________ music on the computer? (创作: compose, create)
3. He has just come off a difficult a________. (任务, 作业: task, assignment)
4. They’ve decided to p________ having a family for a while. (推迟: postpone, put off)
5. I e________ it will take three months to build the bridge. (估计: estimate)
6. This provides the extra motivation you need for those t________ days. (困难的, 艰难的: difficult, hard)
7. It's been dry for so long that the forest could burst into f________ at any moment. (火焰: flames)
8. He greets me at the door with his signature boyish g________ and a hug. (咧嘴笑: grin, broad smile)
9. The e________ of bombs roused me out of a deep sleep. (爆炸: explosion)
10. To support himself economically, he did a lot of o________ jobs, including shining shoes, washing dishes, and all that. (临时的, 不固定的: temporary, irregular)
11. They were my friends, and I just didn’t want to e________ them publicly. (使尴尬, 使困窘: embarrass)
12. My secretary leaves us next week, so we are advertising for a r________. (接替者: replacement, successor)
13. A s________ of 100 winter-swimmers in different age groups indicates that 80 percent originally suffered from diseases of some kind. (调查: survey)
14. He is a r________ person to trust. (可信赖的, 可靠的: reliable, trustworthy)
15. He is a p________ writer, creating many great works in his life. (多产的: prolific)
16. The father told the son a f________ about a puppet. (寓言: fable)
17. They expected me to lie down like a c________. (懦夫: coward)
18. Let us p________ to next step. (继续进行: proceed, continue)
19. These factories c________ that they are not responsible for the pollution of the river. (声称: claim)
20. The home is situated within easy a________ of shops and other facilities. (接近, 进入: access)

About the Authors

Wen Jia is an associate professor at the College of Foreign Studies, Nanjing Agricultural University, China. She is also a Ph.D. candidate in the Department of Applied Linguistics at Xi’an Jiaotong-Liverpool University. Her research interests include second language acquisition, game-based learning, and technology-enhanced language learning.
E-mail: jiawen@njau.edu.cn

Liping Zhang is a professor in the Department of Basic Courses at the Army Engineering University of PLA. Her research interests include second language acquisition, English for Specific Purposes, and corpus linguistics.
E-mail: zlpworkgroup@163.com

Austin Pack is an assistant professor in the Faculty of Education & Social Work at Brigham Young University-Hawaii. His research interests include the psychology of language learning, complex dynamic systems, network analysis, computer assisted language learning, and virtual reality technologies.
E-mail: austin.pack@byuh.edu

Yi Guan is currently an instructor at the Global Institute of Software Technology in Suzhou, China. Her interests include mobile-assisted language learning and English vocabulary learning.
E-mail: guanyi95@sina.cn

Bin Zou is an associate professor in the Department of Applied Linguistics, Xi’an Jiaotong-Liverpool University, China. He received his Ph.D. in TESOL and computer technology from the University of Bristol, UK. He is the Founding Editor and Co-Editor-in-Chief of the International Journal of Computer-Assisted Language Learning and Teaching. Bin Zou is the corresponding author of this article.
E-mail: bin.zou@xjtlu.edu.cn