KOJIMA Masumi's Website

Measure of Lexical Richness (MLR; Vermeer, 2004)

Introduction

Vermeer (2004) attempts to estimate a learner's productive vocabulary size from the learner's language production, and proposes the Measure of Lexical Richness (MLR) for this purpose. The MLR takes as its ‘model’ the relative distribution of token coverage in the Schrooten and Vermeer (1994) corpus. This corpus contains nearly two million Dutch words (tokens) collected from the oral as well as written language productions of children in primary schools, and yields a total of 26,000 lemmas. Vermeer divides the word list created from the corpus into nine frequency classes (voclists). If the words in an analysed text are distributed over the nine lists in the same proportions as in the model, the MLR score is taken to correspond to a vocabulary of about 26,000 words.

How MLR works

The MLR score is the sum of nine quotients, one per voclist. Each quotient is the coverage rate of the voclist in the text divided by its coverage rate in the model corpus; it is then multiplied by the number of lemmas in the voclist and divided by 1000. For voclist 2 and higher, the denominator also includes a weighting factor. Vermeer uses these weights because most of the texts that she investigated contained only about 1000 tokens rather than two million. She explains that ‘a huge corpus has relatively more hapaxes, and relatively higher coverage percentages in the lower frequency ranges’ (Vermeer, 2004: 181). Table 1 exemplifies the calculation of the MLR.
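The verbal description above can be written compactly as follows. This is only a reading of the prose, not a formula quoted from Vermeer (2004), and the symbols are introduced here for illustration: c_i is the text coverage rate of voclist i, m_i its model coverage rate, n_i its number of lemmas, and w_i the weighting factor, with w_1 = 1 because the weights apply only from voclist 2 onwards. The values of the weights are not given in this summary.

```latex
\mathrm{MLR} \;=\; \sum_{i=1}^{9} \frac{c_i}{w_i \, m_i} \cdot \frac{n_i}{1000},
\qquad w_1 = 1
```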

In the case presented in Table 1, the speech data of a child contained 971 tokens, of which 41 were not in the lists (e.g. particular names of children), leaving 930 tokens for the analysis. Of these 930 tokens, 832 were found in the first voclist, so the text coverage rate of that list was 89.5%. This coverage rate was divided by 85.3 (the model coverage rate), multiplied by 1000 (the number of lemmas in the list), and divided by 1000. The score for voclist 1 was 1.00. The MLR score was calculated by adding up the scores for the nine voclists, resulting in 4.65. An MLR score of 4.65 indicates that this child is estimated to have a productive vocabulary size of 4650 words.
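The arithmetic for voclist 1 can be retraced with a minimal Python sketch. The function names and the `weight` parameter are hypothetical and are not taken from Vermeer (2004); the weight defaults to 1.0 because the weights for voclist 2 and higher are not given here. Computed without rounding, the quotient for voclist 1 comes to about 1.05, whereas the score reported above is 1.00. The sketch does not try to reproduce whatever rounding or capping may lie behind the reported value.

```python
def text_coverage_pct(tokens_in_list: int, countable_tokens: int) -> float:
    """Percentage of the countable tokens that fall in one voclist."""
    return 100.0 * tokens_in_list / countable_tokens


def voclist_score(text_cov_pct: float, model_cov_pct: float,
                  n_lemmas: int, weight: float = 1.0) -> float:
    """Unrounded score for one voclist, as read from the prose description.

    `weight` is a placeholder for the weighting factor in the denominator;
    its real values for voclist 2 and higher are not known here.
    """
    return text_cov_pct / (model_cov_pct * weight) * n_lemmas / 1000.0


def estimated_vocabulary_size(mlr: float) -> float:
    """An MLR score of 4.65 is read as a vocabulary of about 4650 words."""
    return mlr * 1000.0


# Figures from the worked example: 971 tokens, 41 of them not in any list.
countable_tokens = 971 - 41
coverage_1 = text_coverage_pct(832, countable_tokens)
score_1 = voclist_score(coverage_1, model_cov_pct=85.3, n_lemmas=1000)

print(f"countable tokens:      {countable_tokens}")
print(f"voclist 1 coverage:    {coverage_1:.1f}%")
print(f"voclist 1 raw score:   {score_1:.3f}")
print(f"vocabulary size (MLR 4.65): {estimated_vocabulary_size(4.65):.0f}")
```

Summing `voclist_score` over all nine voclists would give the MLR, provided the model coverage rates, lemma counts, and weights from Table 1 are available.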

Application of the MLR

To validate the MLR, Vermeer (2004) gathered spontaneous speech data of 16 native Dutch children and 16 ethnic minority children with Dutch as a second language, and analysed them with the MLR. The children’s MLR scores were compared with their scores on a receptive vocabulary task and a definition task, and with various type/token-based measures. The results show that the MLR differentiated between the two groups with obvious differences in vocabulary, correlated significantly with the vocabulary tasks administered to the same children, and was independent of syntactic abilities and text length.

Vermeer (2004) does not explain how she determined the weight applied to each model coverage rate in the MLR formula (see Table 1). Van Hout and Vermeer (2007: 108) simply state that ‘this formula is explainable, but on the other hand far from elegant. For the time being, we are only interested in the power of the frequency approach in making calculations of lexical richness more reliable and useful.’ The weights were calculated for one specific setting: a Dutch corpus of two million words serving as the model for texts of about 1000 tokens. We do not know how to adapt the MLR to English written data in which each text contains only a few hundred tokens. Vermeer’s (2004) idea of estimating productive vocabulary size from language production is unique; however, the measure is difficult to adapt to different settings.

References

  • Schrooten, W., & Vermeer, A. (1994). Woorden in het Basisonderwijs. 15.000 woorden aangeboden aan leerlingen. Studies in Meertaligheid. Tilburg, The Netherlands: Tilburg University Press.
  • Van Hout, R., & Vermeer, A. (2007). Comparing measures of lexical richness. In H. Daller, J. Milton, & J. Treffers-Daller (Eds.), Modelling and assessing vocabulary knowledge (pp. 93-115). Cambridge University Press.
  • Vermeer, A. (2004). The relation between lexical richness and vocabulary size in Dutch L1 and L2 children. In P. Bogaards, & B. Laufer (Eds.), Vocabulary in a second language: Selection, acquisition and testing (pp. 173-189). Amsterdam and Philadelphia, PA: John Benjamins.