
How can I start building a WordNet for the Turkish language to use in sentiment analysis?

Although I have an EE background, I never got the chance to attend natural language processing classes.

I would like to build a sentiment analysis tool for the Turkish language. I think it is better to create a Turkish WordNet database than to translate the text into English and analyze the error-prone translation with the existing English tools. (Is it?)

So what do you recommend I do? Should I start by taking NLP classes from an open courseware website? I really don't know where to start. Could you help me, and maybe provide a step-by-step guide? I know this is an academic project, but I am interested in building skills in this area as a hobby.

Thanks in advance.

met.in asked Dec 27 '11 05:12



1 Answer

Here is the process I have used before (making Japanese, Chinese, German and Arabic semantic networks):

  1. Gather at least two English/Turkish dictionaries. They must be independent, not derived from each other. You can use Wikipedia to auto-generate one of your dictionaries. If you need to publish your network, then you may need open source dictionaries, or license fees, or a lawyer.
  2. Use those dictionaries to translate the English WordNet, producing a confidence rating for each synset.
  3. Keep the translations with strong confidence, and manually approve or fix those with medium or low confidence.
  4. Finish it off manually.
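
To make steps 2 and 3 concrete, here is a minimal Python sketch. It is only an illustration, not the scoring used in the paper: it assumes the confidence of a candidate Turkish word is simply the fraction of dictionaries that propose it for any lemma in the synset. The synset IDs, the two toy dictionaries, and the function names `translate_synset` and `triage` are all hypothetical.

```python
from collections import Counter

# Hypothetical toy data: each English synset ID maps to its lemmas.
SYNSETS = {
    "dog.n.01": ["dog", "domestic dog"],
    "bank.n.01": ["bank", "depository"],
    "cat.n.01": ["cat"],
}

# Two independent (hypothetical) English -> Turkish dictionaries.
DICT_A = {"dog": ["köpek"], "bank": ["banka", "kıyı"], "cat": ["kedi"]}
DICT_B = {"dog": ["köpek", "it"], "bank": ["banka"], "depository": ["depo"]}


def translate_synset(lemmas, dictionaries):
    """Return {turkish_word: confidence} for one synset.

    Assumed scoring: confidence is the fraction of dictionaries that
    offer the word as a translation of at least one lemma.
    """
    votes = Counter()
    for dictionary in dictionaries:
        candidates = set()
        for lemma in lemmas:
            candidates.update(dictionary.get(lemma, []))
        # Each dictionary votes at most once per candidate word.
        votes.update(candidates)
    return {word: n / len(dictionaries) for word, n in votes.items()}


def triage(synsets, dictionaries, strong=1.0):
    """Split candidates into auto-accepted and needs-manual-review."""
    accepted, review = {}, {}
    for synset_id, lemmas in synsets.items():
        scored = translate_synset(lemmas, dictionaries)
        accepted[synset_id] = sorted(w for w, c in scored.items() if c >= strong)
        review[synset_id] = sorted(w for w, c in scored.items() if c < strong)
    return accepted, review


if __name__ == "__main__":
    accepted, review = triage(SYNSETS, [DICT_A, DICT_B])
    for synset_id in SYNSETS:
        print(synset_id, "keep:", accepted[synset_id], "check:", review[synset_id])
```

With only two dictionaries the score can take just two values, agreement or no agreement, so everything the dictionaries disagree on lands in the manual review pile. More dictionaries, or weighting by part of speech, would give a finer-grained rating.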

I expanded on this in the "Automatic Translation Of WordNet" section of my 2008 paper: http://dcook.org/mlsn/about/papers/nlp2008.MLSN_A_Multilingual_Semantic_Network.pdf

(For your stated goal of a Turkish sentiment dictionary, there are other approaches that do not involve a semantic network. E.g. "Sentiment Analysis and Opinion Mining", by Bing Liu, is a good round-up of the research. But a semantic network approach will, IMHO, always give better results in the long run, and it has so many other uses.)

Darren Cook answered Nov 16 '22 16:11
