
KOJIMA Masumi's Website

S (Kojima & Yamashita, 2014)

Introduction

S is a lexical richness measure that uses the word frequency lists developed by Kojima (2011). It is specifically designed to assess the lexical richness of texts used in second language teaching and testing. In this respect, it resembles the Lexical Frequency Profile (Laufer and Nation, 1995), P_Lex (Meara and Bell, 2001), and Advanced TTR and Advanced Guiraud (Daller et al., 2003). However, Kojima and Yamashita (under review) examined the text-length dependency and reliability of these measures, and their results suggest that S is the most robust of them for short L2 texts.

Program for calculating S values

How S works

S represents the word-frequency level where text coverage is expected to reach 100%, as estimated by the text coverage ratios across different frequency ranks. The practical procedure for estimating the value of S is described below (Kojima and Yamashita, under review).

  1. Sample 50 successive words, beginning with the first word in a text.
  2. Estimate the cumulative text coverage rates at six levels: the most frequent 500, 1000, 1500, 2000, 2500, and 3000 words.
  3. Sample 50 successive words again, beginning with the second word in the text, and repeat step 2.
  4. In this way, the first and last of the 50 successive words shift by one each time until the last word of the text is chosen as the first word or ‘headword’ of the sample. When there are not enough words following the headword (that is, toward the end of the text), the words needed to make up the shortfall are taken from the beginning of the text.
  5. An empirical curve of the cumulative text coverage ratio is produced from the data, and the computer program adjusts the value of S to find the best fit between the empirical curve and the theoretical curves calculated by the model.
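
The sampling in steps 1 to 4 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' program: `rank_of` is a hypothetical dictionary mapping each token to the frequency rank of its word family, tokens missing from it are treated as falling outside the 3000-word range, and averaging the per-window coverage rates into a single empirical curve is an assumption about how the data are pooled. Step 5 (fitting S against the model's theoretical curves) is not shown, because the model itself is not specified on this page.

```python
from typing import Dict, List, Sequence

# Frequency levels and window size taken from steps 1 and 2 of the procedure.
LEVELS = (500, 1000, 1500, 2000, 2500, 3000)
WINDOW = 50


def window_with_wraparound(tokens: Sequence[str], start: int,
                           size: int = WINDOW) -> List[str]:
    """Return `size` successive tokens beginning at `start`.

    When the end of the text is reached, the shortfall is taken from the
    beginning of the text (step 4). If the text is shorter than `size`,
    tokens are reused.
    """
    n = len(tokens)
    return [tokens[(start + i) % n] for i in range(size)]


def coverage_at_levels(sample: Sequence[str], rank_of: Dict[str, int],
                       levels: Sequence[int] = LEVELS) -> List[float]:
    """Cumulative proportion of sample tokens whose rank is within each level.

    ASSUMPTION: `rank_of` is a hypothetical token-to-rank lookup; tokens
    not found in it count as not covered at any level.
    """
    ranks = [rank_of.get(token) for token in sample]
    return [
        sum(1 for r in ranks if r is not None and r <= level) / len(sample)
        for level in levels
    ]


def empirical_coverage_curve(tokens: Sequence[str], rank_of: Dict[str, int],
                             size: int = WINDOW,
                             levels: Sequence[int] = LEVELS) -> List[float]:
    """Mean coverage at each level over all windows (one window per token).

    ASSUMPTION: the per-window rates are pooled by simple averaging.
    """
    if not tokens:
        raise ValueError("text must contain at least one token")
    totals = [0.0] * len(levels)
    for start in range(len(tokens)):
        sample = window_with_wraparound(tokens, start, size)
        rates = coverage_at_levels(sample, rank_of, levels)
        totals = [t + r for t, r in zip(totals, rates)]
    return [t / len(tokens) for t in totals]
```

Calling `empirical_coverage_curve(tokens, rank_of)` on a tokenized learner text returns six coverage values, one per frequency level, which would serve as the empirical curve referred to in step 5.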

Word lists

The word lists used for S are based on the spoken section of the British National Corpus (BNC). They were developed by Nation and are available on his website (http://www.victoria.ac.nz/lals/staff/paul-nation.aspx/). The original twelve lists consist of high-frequency words, each list containing 1000 word families. To estimate S, only the first three lists are used, because the text coverage rate of the first 3000 words from the BNC is around 90% across various texts (Nation, 2004, 2006) and the occurrence of lower-frequency words is closely related to the topic or subject of the text (Nation and Waring, 1997). In order to calculate S, Kojima (2011) investigated the frequency rank of each word family from the lists in the spoken section of the BNC. Here are some examples of word families from one of the lists.

Advantages of S

S has several methodological advantages compared to other word-list–based measures of lexical richness (Kojima and Yamashita, under review). First, S sums up the complex distribution of words used by a given learner across different frequency ranks, yielding a single value. Second, its scores are intuitively easy to interpret: for instance, a score of 3429 implies that the text coverage rate is expected to reach 100% at the frequency rank of 3429. In other words, S represents the overall frequency level of all the words in a text.

References

  • Daller, H., van Hout, R., & Treffers-Daller, J. (2003). Lexical richness in spontaneous speech of bilinguals. Applied Linguistics, 24 (2), 197-222.
  • Kojima, M. (2011). An argument-based approach to validate S: A newly developed measure of lexical richness. Poster presented at Corpus Linguistics 2011, Birmingham.
  • Kojima, M., & Yamashita, J. (2014). Reliability of lexical richness measures based on word lists in short second language productions. System: An International Journal of Educational Technology and Applied Linguistics, 42, 23-33.
  • Laufer, B., & Nation, P. (1995). Vocabulary size and use: Lexical richness in L2 written production. Applied Linguistics, 16 (3), 307-322.
  • Meara, P., & Bell, H. (2001). P_Lex: A simple and effective way of describing the lexical characteristics of short L2 texts. Prospect, 16 (3), 5-19.
  • Nation, I. S. P. (2004). A study of the most frequent word families in the British National Corpus. In P. Bogaards, & B. Laufer (Eds.), Vocabulary in a second language: Selection, acquisition and testing (pp. 3-13). Amsterdam and Philadelphia: John Benjamins.
  • Nation, I. S. P. (2006). How large a vocabulary is needed for reading and listening? Canadian Modern Language Review, 63 (1), 59-82.
  • Nation, I. S. P., & Waring, R. (1997). Vocabulary size, text coverage and word lists. In N. Schmitt, & M. J. McCarthy (Eds.), Vocabulary: Description, acquisition and pedagogy (pp. 6-19). Cambridge University Press.