The principles of the project and its outcomes are summarised in the following paper:
Kilgarriff A, Charalabopoulou F, Gavrilidou M, Johannessen JB, Khalil S, Johansson Kokkinakis S, Lew R, Sharoff S, Vadlapudi R, Volodina E. Corpus-based vocabulary lists for language learners for nine languages. Language Resources and Evaluation. 2014; 48:121-63.
The University of Leeds worked on the lists for three languages: Arabic, Chinese and Russian, while other partners worked on English, Greek, Italian, Norwegian, Polish and Swedish.
The corpora for our languages were collected as a large snapshot of texts available for these languages on the Web, using technologies discussed in:
Serge Sharoff. Creating general-purpose corpora using automated search engine queries. In Marco Baroni and Silvia Bernardini, editors, WaCky! Working papers on the Web as Corpus, Gedit, Bologna, 2006.
In addition to the frequency list, the dictionary includes illustrative examples with their English translations, stress patterns, and a list of the most common multiword expressions in Russian. See the introduction to the dictionary for more information.
There is also a database interface, which can be used to explore the links between the words selected for each of these languages. It gives you an idea of how many basic meanings a word has in each language, and how the meanings vary between the languages. For example,
The word tie is ambiguous in English, but the words γραβάτα, krawat and галстук in Greek, Polish and Russian respectively have only one basic meaning (translations shown in red are symmetrical to the source word shown in bold).
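
The idea of a symmetrical translation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the table below is toy data written for this example, not an extract from the project database, and the names TRANSLATIONS, is_symmetrical and translations_by_language are hypothetical. A translation is treated as symmetrical when the target word translates back to the source word.

```python
# Toy translation table: (language, word) -> set of (language, word) translations.
# Illustrative entries only; they are NOT taken from the project database.
TRANSLATIONS = {
    ("en", "tie"): {
        ("el", "γραβάτα"),
        ("pl", "krawat"),
        ("ru", "галстук"),
        ("ru", "ничья"),  # assumed second sense of "tie" (a drawn game)
    },
    ("el", "γραβάτα"): {("en", "tie")},
    ("pl", "krawat"): {("en", "tie")},
    ("ru", "галстук"): {("en", "tie")},
    ("ru", "ничья"): {("en", "draw")},  # assumed: translated back as "draw" only
}


def is_symmetrical(source, target, table=TRANSLATIONS):
    """Return True if target translates source and source translates target."""
    return target in table.get(source, set()) and source in table.get(target, set())


def translations_by_language(source, table=TRANSLATIONS):
    """Group the translations of a source word by target language."""
    grouped = {}
    for lang, word in sorted(table.get(source, set())):
        grouped.setdefault(lang, []).append(word)
    return grouped


if __name__ == "__main__":
    source = ("en", "tie")
    for lang, words in translations_by_language(source).items():
        for word in words:
            mark = "symmetrical" if is_symmetrical(source, (lang, word)) else "one-way"
            print(f"{source[1]} -> {lang}:{word} ({mark})")
```

In this toy table the English word has two Russian translations, which hints at its ambiguity, while each of γραβάτα, krawat and галстук points back to a single English word.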