The Academy has been a pioneer in Basque corpus linguistics, having begun this work in the 1980s. It first developed the diachronic corpus of the General Basque Dictionary and the Statistical Corpus of 20th-Century Basque. Later, with the arrival of the 21st century, it established the Lexical Observatory, a monitor corpus. Without these corpora, the Academy would not have been able, for instance, to provide the entries, meanings, usage marks, and contextual examples included in its normative dictionary, all based on real Basque publications.
In recent years, however, the notion of referentiality has been somewhat forgotten, and many researchers have focused instead on creating the largest possible corpora, concentrating almost exclusively on textual volume. Yet both types of corpora, the large general corpora and the reference corpus, are necessary even though they serve different purposes.
Why, then, a reference corpus? And why now?
To keep track of contemporary Basque usage and to meet the new challenges brought by language technologies, the Academy recognised the need for a more balanced corpus. It was equally concerned with quality, given today’s emphasis on corpus size over content. The results a corpus produces depend on the quality of the materials it contains, so linguistic quality is a fundamental criterion in building the reference corpus.
The first version includes 123,124 Basque-language documents produced between 2000 and the present, totalling 154.21 million words and 129,817 distinct lemmas. It is a dynamic corpus, updated annually and always covering the most recent 25 years, because the lexicon, like society itself, is constantly changing.
The Basque Reference Corpus is a collective project: all its content has been obtained thanks to the generosity of publishers, institutions, and media outlets. The Academy has transformed these materials into a corpus and returned them to society, so the corpus belongs to everyone. It is also a respectful corpus, since agreements have been signed with each contributor, always acknowledging the authors’ ownership. Thanks to everyone’s collaboration, the corpus is available at https://eec.euskaltzaindia.eus/.

