ARABIC CORPUS COMPILATION

The first task in Arabic lexicography is corpus compilation.

The following is a list of publicly available Arabic corpora, sorted chronologically. We have added some notes based on our experience in processing these corpora.

Corpora can also be obtained by mining websites directly. The following is a partial listing of Arabic newspapers that have maintained a steady, sizable output over the years:

Once a corpus is compiled, the next task is to assess its size in terms of types and tokens (see WORD FREQUENCY COUNTS).
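
As a rough illustration of what counting types and tokens involves, here is a minimal sketch in Python. It is not the procedure described under WORD FREQUENCY COUNTS; the function name, the tokenization rule (runs of Arabic letters in the range U+0621 to U+064A) and the decision to strip diacritics and tatweel before counting are all assumptions made for this example.

```python
import re
from collections import Counter

# Assumption: a "word" is a run of basic Arabic letters (U+0621..U+064A).
ARABIC_WORD = re.compile(r"[\u0621-\u064A]+")
# Assumption: short-vowel marks (U+064B..U+0652) and tatweel (U+0640)
# are removed so that vocalized and unvocalized spellings count as one type.
DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")


def count_types_and_tokens(text):
    """Return (token_count, type_count, frequency_table) for a text."""
    stripped = DIACRITICS.sub("", text)
    tokens = ARABIC_WORD.findall(stripped)
    counts = Counter(tokens)
    return len(tokens), len(counts), counts


if __name__ == "__main__":
    sample = "كتب الولد كتابا و قرأ الولد الكتاب"
    n_tokens, n_types, freq = count_types_and_tokens(sample)
    print(n_tokens, n_types)
    print(freq.most_common(3))
```

Tokens are all running words; types are the distinct word forms among them. Whether to normalize further (for example, unifying alef variants) depends on the purpose of the count and is left out of this sketch.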



Copyright © 2002-2022 QAMUS LLC