Slovenian National Corpus

The Slovenian National Corpus FidaPLUS is a 621-million-word (token) corpus of the Slovenian language, compiled from selected Slovenian texts of different genres and styles, mainly books and newspapers.^[1]

The FidaPLUS database is an upgrade of the older FIDA corpus, which was developed between 1997 and 2000; FidaPLUS adds texts published up to 2006. It was the result of an applied research project carried out by the Faculty of Arts and the Faculty of Social Sciences, both at the University of Ljubljana, and the Department of Knowledge Technologies at the Jožef Stefan Institute.^[2]

The corpus is available through the corpus manager Sketch Engine.^[3] This version of FidaPLUS includes word sketches: automatic, corpus-derived summaries of a word's grammatical and collocational behaviour.

Words in FidaPLUS by year of publication:

Year of publication	Number of words	Share of corpus
1979–1990	262,708	0.04%
1991	1,487,895	0.24%
1992	2,256,692	0.36%
1993	3,208,687	0.52%
1994	7,534,689	1.21%
1995	7,433,897	1.20%
1996	16,913,916	2.72%
1997	31,589,250	5.09%
1998	43,512,041	7.01%
1999	54,711,630	8.81%
2000	57,677,534	9.29%
2001	74,720,532	12.03%
2002	72,802,484	11.72%
2003	82,897,097	13.35%
2004	67,041,167	10.79%
2005	39,086,695	6.29%
2006	44,526,825	7.17%
Unknown	13,486,261	2.17%
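Each figure in the percentage column is that period's word count divided by the total number of words, times 100. As a quick consistency check, the short Python sketch below recomputes those shares from the word counts in the table. The counts are copied from the table; the helper name `shares` is hypothetical and used only for illustration. Worked through this way, the counts add up to 621,150,000 words, and the 1996 share comes out at 2.72%.

```python
# Word counts per year of publication, copied from the FidaPLUS table.
# Keys are the table's period labels; "N/A" covers texts with no known date.
WORD_COUNTS = {
    "1979-1990": 262_708,
    "1991": 1_487_895,
    "1992": 2_256_692,
    "1993": 3_208_687,
    "1994": 7_534_689,
    "1995": 7_433_897,
    "1996": 16_913_916,
    "1997": 31_589_250,
    "1998": 43_512_041,
    "1999": 54_711_630,
    "2000": 57_677_534,
    "2001": 74_720_532,
    "2002": 72_802_484,
    "2003": 82_897_097,
    "2004": 67_041_167,
    "2005": 39_086_695,
    "2006": 44_526_825,
    "N/A": 13_486_261,
}


def shares(counts: dict[str, int]) -> tuple[int, dict[str, float]]:
    """Return the total word count and each period's share of it in percent,
    rounded to two decimal places (hypothetical helper, for illustration)."""
    total = sum(counts.values())
    return total, {period: round(100 * n / total, 2) for period, n in counts.items()}


if __name__ == "__main__":
    total, pct = shares(WORD_COUNTS)
    print(f"Total: {total:,} words")
    for period, p in pct.items():
        print(f"{period}\t{p:.2f}%")
```

Run as a script, it prints the total and one tab-separated line per period. Because each share is rounded independently, the rounded percentages sum to 100.01% rather than exactly 100%.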

References

1. The FidaPLUS number of words by date of publication
2. The FidaPLUS team list and institutional affiliations
3. FidaPLUS corpus in Sketch Engine

External links

Slovenian National Corpus website FidaPLUS

This article is sourced from Wikipedia (version of 11/22/2016). The text is available under the Creative Commons Attribution-ShareAlike license, but additional terms may apply to the media files.