[Ilugc] Tamil Corpus
- From: jaganadhg@xxxxxxxxx (JAGANADH G)
- Date: Fri, 6 Jan 2012 20:08:19 +0530
Pardon my ignorance. What do you mean by language model ?
A language model is a statistical model that is estimated from a data set;
it assigns probabilities to sequences of words.
Here I think the OP is talking about creating a language model for speech
processing. An n-gram model is one kind of language model:
http://en.wikipedia.org/wiki/N-gram
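To make the idea concrete, here is a minimal sketch of a bigram (2-gram)
model in Python. It is only an illustration: the function name
train_bigram_model, the <s> and </s> boundary markers, and the two toy Tamil
sentences are my own assumptions, and a real speech system would add
smoothing and train on a much larger corpus.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Estimate P(next word | previous word) from whitespace-tokenised text."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # <s> and </s> are hypothetical sentence-boundary markers.
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1

    # Turn raw counts into relative frequencies (no smoothing).
    model = {}
    for prev, followers in counts.items():
        total = sum(followers.values())
        model[prev] = {word: n / total for word, n in followers.items()}
    return model


# Toy data for illustration only, not a real corpus.
toy_corpus = [
    "நான் பள்ளிக்கு போகிறேன்",
    "நான் வீட்டுக்கு போகிறேன்",
]
model = train_bigram_model(toy_corpus)
print(model["நான்"])
```

Here model[w] maps each word seen after w to its estimated probability, so
the two words that follow "நான்" in the toy data each get 0.5.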
And by
Tamil-corpus do you mean a large collection of tamil text ?
In the context of Natural Language Processing, a corpus is a large
collection of text.
There are different types of corpora, such as text corpora, speech corpora,
image corpora, etc.
Here the OP requires a text corpus. I think he can use the Tamil Wikipedia
dump as a corpus for his research. Or he can build a corpus from
newspaper RSS feeds and Tamil blog feeds too.
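As a rough sketch of the RSS route, the Python below pulls the title and
description text out of an RSS 2.0 document using only the standard library.
The function name extract_rss_text and the inline sample feed are
assumptions for illustration; in practice you would download each feed
first, strip any HTML from the descriptions, and remove duplicates.

```python
import xml.etree.ElementTree as ET


def extract_rss_text(rss_xml):
    """Return one text line per <item>, built from its title and description."""
    root = ET.fromstring(rss_xml)
    texts = []
    for item in root.iter("item"):
        title = item.findtext("title", default="").strip()
        description = item.findtext("description", default="").strip()
        # Skip empty parts so items without a description still yield a line.
        text = " ".join(part for part in (title, description) if part)
        if text:
            texts.append(text)
    return texts


# Made-up sample feed, standing in for a downloaded newspaper or blog feed.
sample_feed = """<rss version="2.0"><channel>
<item><title>முதல் செய்தி</title><description>இது ஒரு எடுத்துக்காட்டு.</description></item>
<item><title>இரண்டாம் செய்தி</title><description></description></item>
</channel></rss>"""

corpus_lines = extract_rss_text(sample_feed)
for line in corpus_lines:
    print(line)
```

Appending the lines collected from many feeds to one UTF-8 text file gives a
simple text corpus that can then be used to train a language model.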
--
**********************************
JAGANADH G
http://jaganadhg.in
*ILUGCBE*
http://ilugcbe.org.in