Introduction
S is a lexical richness measure that uses the word frequency lists developed by Kojima (2011). It is specifically designed to assess the lexical richness of texts used in second language teaching and testing. In this respect, it resembles the Lexical Frequency Profile (Laufer and Nation, 1995), P_Lex (Meara and Bell, 2001), and Advanced TTR and Advanced Guiraud (Daller et al., 2003). However, Kojima and Yamashita (2014) examined the text-length dependency and reliability of these measures, and their results suggest that S is the most robust of them for short L2 texts.
Program for calculating S values
How S works
S represents the word-frequency level where text coverage is expected to
reach 100%, as estimated by the text coverage ratios across different frequency
ranks. The practical procedure for estimating the value of S is described
below (Kojima and Yamashita, 2014).
1. Sample 50 successive words, beginning with the first word in a text.
2. Estimate the cumulative text coverage rates at six levels: the most frequent
500, 1000, 1500, 2000, 2500, and 3000 words.
3. Sample 50 successive words again, beginning with the second word in the
text, and repeat step 2.
4. In this way, the first and last of the 50 successive words shift by one
each time until the last word of the text is chosen as the first word or
‘headword’ of the sample. When there are not enough words following the
headword (that is, toward the end of the text), the number of words required
to make up the shortfall is taken from the beginning of the text.
5. An empirical curve of the cumulative text coverage ratio is produced from
the data, and the computer program adjusts the value of S to find the best
fit between the empirical curve and the theoretical curves calculated by
the model.
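The sampling steps above can be sketched in Python as follows. This is only an illustrative sketch, not the program referred to on this page. The names `rank_of`, `window_coverage` and `empirical_coverage` are hypothetical. `rank_of` is assumed to be a dictionary mapping a lowercased word form to the frequency rank of its word family, with words missing from the lists counted as uncovered. Averaging the per-window coverage rates into one empirical curve is also an assumption, since the text does not say how the windows are combined. The final curve-fitting step is left out because the theoretical model is not specified here.

```python
from typing import Dict, List, Optional, Sequence

LEVELS = (500, 1000, 1500, 2000, 2500, 3000)  # the six frequency levels
WINDOW = 50                                   # words per sample


def window_coverage(ranks: Sequence[Optional[int]],
                    levels: Sequence[int] = LEVELS) -> List[float]:
    """Cumulative coverage of one sample at each frequency level."""
    n = len(ranks)
    return [
        sum(1 for r in ranks if r is not None and r <= level) / n
        for level in levels
    ]


def empirical_coverage(tokens: Sequence[str],
                       rank_of: Dict[str, int],
                       window: int = WINDOW,
                       levels: Sequence[int] = LEVELS) -> List[float]:
    """Mean coverage per level over all samples (assumed aggregation)."""
    if not tokens:
        raise ValueError("text is empty")
    # Words absent from the (hypothetical) rank table get None.
    ranks = [rank_of.get(t.lower()) for t in tokens]
    n = len(ranks)
    totals = [0.0] * len(levels)
    for start in range(n):  # every word serves once as the headword
        # Wrap around to the beginning when the text runs out.
        sample = [ranks[(start + i) % n] for i in range(window)]
        for k, cov in enumerate(window_coverage(sample, levels)):
            totals[k] += cov
    return [t / n for t in totals]


if __name__ == "__main__":
    # Toy data only; the ranks are invented for illustration.
    toy_ranks = {"the": 1, "on": 5, "cat": 900, "sat": 1200}
    toy_text = "The cat sat on the zyx".split()
    print(empirical_coverage(toy_text, toy_ranks, window=4))
```

The returned list holds one coverage value per level, in the order of `LEVELS`, and is the kind of empirical curve that a fitting routine would then compare against the model's theoretical curves.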
Word lists
The word lists used for S are based on the spoken section of the British
National Corpus (BNC). They were developed by Nation and are available
on his website (http://www.victoria.ac.nz/lals/staff/paul-nation.aspx/). The
original twelve lists consist of high-frequency words, each list containing
1000 word families. To estimate S, only the first three lists are used,
because the text coverage rate of the first 3000 words from the BNC is
around 90% across various texts (Nation, 2004, 2006) and the occurrence
of lower-frequency words is closely related to the topic or the subject
of the text (Nation and Waring, 1997). In order to calculate S, Kojima
(2011) investigated the frequency rank of each word family from the lists
in the spoken section of the BNC. Here are some examples of word families
from one of the lists.
Advantages of S
S has several methodological advantages over other word-list–based
measures of lexical richness (Kojima and Yamashita, 2014). First,
S sums up the complex distribution of words used by a given learner across
different frequency ranks, yielding a single value. Second, its scores
are intuitively easy to interpret: for instance,
a score of 3429 implies that the text coverage rate is expected to reach
100% at the frequency rank of 3429. In other words, S represents the overall
frequency level of all the words in a text.
References
- Daller, H., van Hout, R., & Treffers-Daller, J. (2003). Lexical richness
in spontaneous speech of bilinguals. Applied Linguistics, 24 (2), 197-222.
- Kojima, M. (2011). An argument-based approach to validate S: A newly developed
measure of lexical richness. Poster presented at Corpus Linguistics
2011, Birmingham.
- Kojima, M., & Yamashita, J. (2014). Reliability of lexical richness
measures based on word lists in short second language productions. System: An International Journal of Educational Technology and Applied Linguistics, 42, 23-33.
- Laufer, B., & Nation, P. (1995). Vocabulary size and use: Lexical richness
in L2 written production. Applied Linguistics, 16 (3), 307-322.
- Meara, P., & Bell, H. (2001). P_Lex: A simple and effective way of
describing the lexical characteristics of short L2 texts. Prospect, 16 (3), 5-19.
- Nation, I. S. P. (2004). A study of the most frequent word families in
the British National Corpus. In P. Bogaards, & B. Laufer (Eds.), Vocabulary in a second language: Selection, acquisition and testing (pp. 3-13). Amsterdam and Philadelphia: John Benjamins.
- Nation, I. S. P. (2006). How large a vocabulary is needed for reading and
listening? Canadian Modern Language Review, 63 (1), 59-82.
- Nation, I. S. P., & Waring, R. (1997). Vocabulary size, text coverage
and word lists. In N. Schmitt, & M. J. McCarthy (Eds.), Vocabulary: Description, acquisition and pedagogy (pp. 6-19). Cambridge University Press.