A Welsh corpus should include as much of the contemporary language as possible: extracts from contemporary novels, newspaper articles, academic journals, political speeches and sermons; in fact, almost any example of the current state of the language. A corpus of this kind allows dictionaries, grammars and teaching materials to be based on real, recent linguistic data rather than on possibly subjective, outdated or inaccurate notions of correctness. Since most linguistic analysis is done on phrases or sentences, I would like to suggest the creation of a database of Welsh phrases: a Phrase Bank or Phrase Library. Here's a very small example of phrases gleaned from various sites on the Web and sorted in alphabetical order:
One useful thing to do with the Phrase Bank would be to create a 'concordance': effectively a dictionary with examples but no definitions, which could later be merged with a set of definitions.
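As a rough illustration of the concordance idea, here is a minimal Python sketch that maps each word-form to the phrases in which it occurs. The sample phrases, the `build_concordance` name and the tokenising pattern are all assumptions made for this example, not part of any existing Phrase Bank. Note also that the keys are sorted by plain character order, which is not the same as Welsh alphabetical order (where ch, dd, ff, ng, ll, ph, rh and th count as single letters).

```python
import re
from collections import defaultdict

# Hypothetical sample phrases; a real Phrase Bank would be far larger.
PHRASES = [
    "Bore da",
    "Nos da",
    "Diolch yn fawr",
    "Croeso i Gymru",
]

# Runs of letters (including accented ones such as â, ŵ, ŷ), optionally
# joined by apostrophes so that forms like "mae'n" stay in one piece.
WORD_RE = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)*")


def build_concordance(phrases):
    """Map each lower-cased word-form to the phrases that contain it."""
    concordance = defaultdict(list)
    for phrase in phrases:
        seen = set()  # list a phrase only once per word-form
        for word in WORD_RE.findall(phrase.lower()):
            if word not in seen:
                seen.add(word)
                concordance[word].append(phrase)
    # Plain character-order sort, not true Welsh collation.
    return dict(sorted(concordance.items()))


if __name__ == "__main__":
    for word, examples in build_concordance(PHRASES).items():
        print(word)
        for example in examples:
            print("    " + example)
```

Merging with a set of definitions would then amount to looking up each key of the resulting dictionary in a separate word-to-definition table.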
Concordance
Word Frequency List - a list of the 10,000 or so most common word-forms in the Phrase Bank.
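A word frequency list of this kind could be produced along the following lines. This is only a sketch under the same assumptions as before: the `word_frequencies` name, the tokenising pattern and the sample phrases are hypothetical, and no attempt is made to group mutated or inflected forms of the same word together, since the list is meant to count word-forms as they appear.

```python
import re
from collections import Counter

# Same assumed tokeniser: letters, optionally joined by apostrophes.
WORD_RE = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)*")


def word_frequencies(phrases, limit=10_000):
    """Return up to `limit` (word-form, count) pairs, most common first."""
    counts = Counter()
    for phrase in phrases:
        counts.update(WORD_RE.findall(phrase.lower()))
    return counts.most_common(limit)


if __name__ == "__main__":
    # Hypothetical sample phrases standing in for the Phrase Bank.
    sample = ["Bore da", "Nos da", "Da iawn"]
    for word, count in word_frequencies(sample):
        print(f"{count:6d}  {word}")
```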