 

Techniques for calculating adjective frequency [closed]

I need to calculate word frequencies for a given set of adjectives in a large set of customer support reviews. However, I don't want to count adjectives that are negated.

For example, suppose my list of adjectives is: [helpful, knowledgeable, friendly]. I want to make sure "friendly" isn't counted in a sentence such as "The representative was not very friendly."

Do I need to do a full NLP parse of the text, or is there an easier approach? I don't need very high accuracy.

I'm not at all familiar with NLP. I'm hoping for something that doesn't have such a steep learning curve and isn't so processor intensive.

Thanks

awinbra asked Oct 09 '22


2 Answers

If all you want is adjective frequencies, then the problem is relatively simple, and you don't need some brutal, not-so-good machine learning solution.

Wat do?

Do POS tagging on your text. This annotates your text with part-of-speech tags, and you'll get 95% accuracy or more on that. You can tag some text with the Stanford Parser online to get a feel for it. The parser actually gives you the grammatical structure as well, but you only care about the tagging.

You also want to make sure the sentences are broken up properly. For this you need a sentence breaker. That's included with software like the Stanford parser.

Then just break up the sentences, tag them, and count everything with an adjective tag. If the tags don't make sense, look up the Penn Treebank tagset (treebanks are used to train NLP tools, and the Penn Treebank tags are the common ones); in that tagset, plain adjectives are JJ, with JJR and JJS for comparatives and superlatives.
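The pipeline above can be sketched in Python. The NLTK calls are shown in comments (they need `pip install nltk` plus the punkt and tagger models); the hand-tagged sample review below is made up for illustration, in NLTK's `(word, tag)` output format:

```python
from collections import Counter

# In practice, sentence splitting and tagging would come from NLTK:
#   import nltk
#   sentences = nltk.sent_tokenize(review_text)
#   tagged = [nltk.pos_tag(nltk.word_tokenize(s)) for s in sentences]

# Hand-tagged sample in NLTK's (word, tag) format, for illustration:
tagged = [
    [("The", "DT"), ("rep", "NN"), ("was", "VBD"), ("very", "RB"),
     ("helpful", "JJ"), ("and", "CC"), ("friendly", "JJ"), (".", ".")],
    [("Support", "NN"), ("was", "VBD"), ("knowledgeable", "JJ"), (".", ".")],
]

ADJ_TAGS = {"JJ", "JJR", "JJS"}  # Penn Treebank adjective tags
targets = {"helpful", "knowledgeable", "friendly"}

# Count only words that are tagged as adjectives AND on the target list.
counts = Counter(
    word.lower()
    for sentence in tagged
    for word, tag in sentence
    if tag in ADJ_TAGS and word.lower() in targets
)
print(counts)  # Counter({'helpful': 1, 'friendly': 1, 'knowledgeable': 1})
```

Checking the tag as well as the word keeps out cases where a target string shows up as a different part of speech.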

How?

Java and Python are the languages of NLP tools. For Python, use NLTK. It's easy, well documented, and well understood.

For Java, you have GATE, LingPipe, and the Stanford Parser, among others. The Stanford Parser is a complete pain in the ass to use; fortunately, I've suffered so you don't have to, if you choose to go that route. See my google page for some code examples with the Stanford Parser (at the bottom of the page).

Das all?

Nah. You might also want to stem the adjectives, i.e. reduce each word to its root form:

cars -> car

I can't actually think of many situations where this is necessary with adjectives (comparatives like "friendlier" are one), but it might happen. When you look at your output it'll be apparent whether you need it. A POS tagger/parser/etc. will usually give you the root forms (called lemmas) as part of its output.
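Normalizing to lemmas before counting can be sketched like this. The real NLTK call is shown in a comment (it needs the wordnet data); the tiny lookup table below is hand-made for illustration and would be replaced by a proper lemmatizer in practice:

```python
# In practice you'd use NLTK's lemmatizer (requires the wordnet data):
#   from nltk.stem import WordNetLemmatizer
#   lemma = WordNetLemmatizer().lemmatize("friendlier", pos="a")

# Minimal stand-in: a hand-made table mapping inflected adjective
# forms back to the base form you actually want to count.
LEMMA_TABLE = {
    "friendlier": "friendly",
    "friendliest": "friendly",
}

def normalize(word):
    """Map an inflected adjective to its base form if known, else pass it through."""
    return LEMMA_TABLE.get(word, word)

print(normalize("friendliest"))  # friendly
print(normalize("helpful"))      # helpful
```

Run each tagged adjective through `normalize` before counting, so "friendlier" and "friendly" land in the same bucket.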

More NLP explanations

See this question.


nflacco answered Oct 21 '22


It depends on the source of your data. If the sentences come from some kind of generator, you can probably split them automatically. Otherwise you will need NLP, yes.

Properly parsing natural language is pretty much an open problem. It works reasonably well for English, in particular since English sentences tend to stick to SVO order. German, for example, is quite nasty here, as different word orders convey different emphasis (and thus can convey different meanings, in particular when irony is used). Additionally, German makes much heavier use of subordinate clauses.

NLP clearly is the way to go, and at least a basic parser will be needed. It really depends on your task, too: do you need every instance to be correct, or is a probabilistic approach good enough? Can "difficult" cases be discarded or sent to a human for review? And so on.

Has QUIT--Anony-Mousse answered Oct 21 '22