Discovery-based Vocabulary Learning: Comparing English Learners from China and Finland

Simon Smith PhD, FHEA

Senior Lecturer, Academic English



International students of Business or Finance, according to many subject tutors, generally have a good command of general English vocabulary, but are often stumped by the specialist terminology of their field. The traditional approach to teaching this terminology has been through the use of wordlists, such as the glossaries typically found at the back of specialist textbooks. Learning from a list of words, however, can be dull and demotivating for students.

This study considers a novel approach to the learning of subject-specific vocabulary by international students. Instead of trying to memorize wordlists or glossaries, the students themselves create the wordlists, from authentic course resources.

I describe a comparative study of the approach when applied to two cohorts of academic English students: Finnish students in Finland, and Chinese students at Coventry. Some interesting differences between the perceptions of members of the two cohorts are presented.


Sincere thanks to Dr Nicole Keng of University of Vaasa for collaborating with me on the study, and to FAH for funding my attendance at the American Association of Applied Linguistics, Portland, OR in March, where a version of the study will be presented at a colloquium organized by Nina Vyatkina, alongside five other data-driven learning contributions.

Short paper

The idea of supporting language learning with the use of corpora has been around since 1991, when Tim Johns coined the term data-driven learning (DDL). The corpus approach invites learners to tease out patterns from authentic text, and test their own linguistic hypotheses in the manner of a mini research project; it has an intuitive appeal to teachers who favour student-centred or inductive learning.

Corpus construction by learners: prior work

It has been claimed (Tyne 2009; Charles 2014) that the process of creating a corpus inculcates a sense of ownership in the learner and therefore has a motivational impetus. This is especially true, it is claimed here, when the topic of the corpus is of personal interest to the learner, or coincides with their major field of study. Once the corpus is constructed, some students may be sufficiently motivated to consult it and add to it when needed. Moreover, the process of compiling the corpus may lead to the acquisition of not only language, but also useful transferable skills, including IT and problem-solving competencies.

This study

Two cohorts of academic English students were given the opportunity to build their subject-specific vocabulary by constructing DIY corpora based on documents from their own study materials. The Finnish cohort consisted of 74 UG students majoring in Business, from all year levels, at CEFR B2-C1. The UK cohort comprised 94 international top-up (Year 3) students, of whom 88 were Chinese, studying Accounting & Finance, at around B2 proficiency. Students in both cohorts explored the corpora using a variety of Sketch Engine tools, generated wordlists of technical terms appearing in the corpora, and created vocabulary portfolios based on the wordlists in the form of Excel spreadsheets. Figure 1 shows the process by which students derived the resources.

Figure 1 How students created the resources

Corpus construction and consultation

The corpora were seeded from lecture PowerPoints, seminar discussion notes and other materials provided by subject tutors. Figure 2 shows a typical lecture PowerPoint.

Conveniently, Sketch Engine can be used to upload text from a variety of document types to create and add to a corpus. As is standard in the HE sector, students are given online access to teaching and learning materials via a Virtual Learning Environment (VLE), and the institutions in Coventry and Vaasa use Moodle for this purpose. As each new week’s lecture slides and seminar notes were made available, the students would add the new content and so grow their corpus.

Figure 2 Management Accounting lecture slides, including abbreviations and specialist terms, which make useful keywords

The procedure for constructing a corpus (and consulting it) is shown in Figure 3. First, the user uploads the text content of teaching materials to form a mini-corpus, using the Sketch Engine Corpus Architect. Because of the nature of lecture slides, the resulting corpus does not contain many full sentences, but it will include the key vocabulary for the particular topic. Students could opt to create a more specialized corpus, consisting of perhaps just one PowerPoint, for example on “Capital Investment Appraisal”, to which two lectures were devoted. Alternatively they might decide to create a whole-module corpus, such as “Management Accounting”.

The Sketch Engine software is then used to generate a list of the most salient words in the corpus, that is, words which are found much more frequently in the corpus than in a general reference corpus. Thus, the word “the” is not salient, because it is found with similar frequency in both specialist and reference corpora.
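The notion of salience can be illustrated with a minimal sketch. It assumes that salience is measured as a smoothed ratio of normalized frequencies (occurrences per million tokens in the specialist corpus divided by occurrences per million in the reference corpus), which is one common keyness measure; the settings actually used in the study are not stated here. The function names, the smoothing value and the two toy texts are invented for illustration.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def per_million(count, total):
    """Normalize a raw count to a frequency per million tokens."""
    return count * 1_000_000 / total


def keyness_scores(focus_tokens, reference_tokens, smoothing=1.0):
    """Score each focus-corpus word by a smoothed relative-frequency ratio.

    Assumption: (focus_fpm + smoothing) / (reference_fpm + smoothing).
    The smoothing term avoids division by zero for words that are
    absent from the reference corpus.
    """
    focus_counts = Counter(focus_tokens)
    reference_counts = Counter(reference_tokens)
    scores = {}
    for word, count in focus_counts.items():
        focus_fpm = per_million(count, len(focus_tokens))
        reference_fpm = per_million(reference_counts[word], len(reference_tokens))
        scores[word] = (focus_fpm + smoothing) / (reference_fpm + smoothing)
    return scores


# Toy texts, invented for illustration only (not data from the study).
focus = tokenize(
    "The payback period is the time the project takes to recover the investment. "
    "The payback method ignores the time value of money."
)
reference = tokenize(
    "The cat sat on the mat and the dog watched the cat. "
    "The weather is fine and the time is late."
)

scores = keyness_scores(focus, reference)

# Print the five highest-scoring words in the toy specialist corpus.
for word in sorted(scores, key=scores.get, reverse=True)[:5]:
    print(f"{word}\t{scores[word]:.2f}")
```

In this toy example, “the” scores close to 1 because its relative frequency is much the same in both texts, whereas a term such as “payback”, which is absent from the reference text, scores far higher.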

The corpus is then available to be used by students in the following ways:

  1. To produce lists of subject area words and terms for study.
  2. To view word sketches, which give a one-page view of the collocations and grammatical structures in which a word or term participates.
  3. To view the words and terms in context, using concordancing.
  4. To link back to the original texts on the web.
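For readers unfamiliar with concordancing, the following is a minimal sketch of a keyword-in-context (KWIC) display. It is not Sketch Engine's implementation; the function name, the context width and the sample sentences are assumptions made for illustration.

```python
import re


def concordance(text, term, width=30):
    """Return keyword-in-context lines for each whole-word match of term."""
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        # Take up to `width` characters of context on either side of the match.
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so that the keyword lines up in a column.
        lines.append(f"{left:>{width}} [{match.group()}] {right}")
    return lines


# Sample sentences, invented for illustration only.
sample = "You can make a profit from services. He sold the car at a profit."

kwic_lines = concordance(sample, "profit")
for line in kwic_lines:
    print(line)
```

Aligning the keyword in a single column is what lets a learner scan the words immediately to its left and right, and so notice patterns such as “make a profit” and “at a profit”.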

Vocabulary portfolios

Students were also asked to create and work with personal vocabulary portfolios. For this purpose, lecture topics (such as “Capital Investment Appraisal”) were selected by the instructor/researcher, and corpora were prepared in advance. A list of the most salient words in the domain was generated by the instructor, and students would transfer their choice of domain terms into their portfolio, which took the form of an Excel spreadsheet. Columns in the spreadsheet could be used to insert a financial dictionary definition, a general language definition, examples of term use from the BAWE corpus, and an L1 translation if desired. Links to resources were provided within the template Excel file, for ease of use, and students were encouraged to keep the portfolios up to date on a weekly basis. Figure 3 shows part of one student’s vocabulary portfolio.

Figure 3 Student’s vocabulary portfolio excerpt
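The layout of such a portfolio can be sketched as a blank template with one row per term. The students worked in Excel; this sketch instead writes CSV, which Excel can open, and the column names and example terms are assumptions based on the description above rather than the actual template used in the study.

```python
import csv
import io

# Column names are hypothetical, modelled on the portfolio description.
COLUMNS = [
    "term",
    "financial_definition",
    "general_definition",
    "bawe_example",
    "l1_translation",
]


def portfolio_template(terms):
    """Return CSV text with one row per term and empty cells to fill in."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, restval="")
    writer.writeheader()
    for term in terms:
        # Only the term is filled in; the student completes the other columns.
        writer.writerow({"term": term})
    return buffer.getvalue()


# Example terms, invented for illustration only.
template = portfolio_template(["payback period", "net present value"])
print(template)
```

The returned text can be saved to a file with a .csv extension and opened in a spreadsheet application, where each student fills in the remaining columns for the terms they have chosen.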


 Students provided feedback in the form of reflective reports. They appeared to enjoy the approach, commenting that “Using corpora is effortless and fun way to learn vocabulary” (Finnish cohort student) and “The process of create my own corpora was very enjoyable and makes me sense of accomplishment” (UK cohort student). The Chinese students, in particular, appreciated the relevance of the approach to learning subject-specific vocabulary, commenting “Creating a specialized corpus could be useful when it comes to researching a particular subject or learning a subject in English” and “I thought that the Sketch Engine was useful software not only for my English study but also for Accounting & Finance study”.

Differences between the two cohorts may have reflected preferences among Chinese and Finnish students for distinct learning styles. Finnish students seemed to be drawn to the exploratory approach, and to demonstrate awareness of collocation, with one noting “I learnt that the word profit as object can be used in a sentence like you can make a profit from services. It can be used with preposition for example in the sentence like he sold a car at a profit”. Chinese students favoured memorization, with one commenting “I like to remember words in a sentence. I think making word portfolio is good […] If it is used as a resource and read it again and again, it can be very useful”.

Summary and future plans

In the near future, we will conduct a quantitative comparative study, measuring the effectiveness of our approach and comparing improvement in specialist vocabulary knowledge, as well as investigating continuing use by students, noting Charles’s (2014: 39) observation that “Ongoing use also indicates substantial commitment to the personal corpus”. We believe that our discovery-based approach to vocabulary learning promotes learner-centredness and task ownership, and raises learner awareness of the relationship between learner progress and exploration in language learning. It successfully integrates language and transferable skills, and in the words of one student: “Using corpora in the classroom is FUN!”


Charles, M. (2014) ‘Getting the corpus habit: EAP students’ long-term use of personal corpora’. English for Specific Purposes 35, 30–40

Johns, T. F. (1991) ‘Should you be persuaded: Two examples of data-driven learning’. in Classroom concordancing. ed. by Johns, T. F. and King, P. Birmingham: ELR, 1–13

Tyne, H. (2009) ‘Corpus oraux par et pour l’apprenant [Spoken corpora by and for the learner]’. in Des documents authentiques oraux aux corpus: Questions d’apprentissage en didactique des langues. ed. by Boulton, A. Nancy: Mélanges CRAPEL, 91–111

