
Named Entity Recognition in aspect-opinion extraction using dependency rule matching

Using spaCy, I extract aspect-opinion pairs from a text based on grammar rules that I defined. The rules are based on POS tags and dependency tags, which are obtained from token.pos_ and token.dep_. Below is an example of one of the grammar rules. If I pass the sentence Japan is cool, it returns [('Japan', 'cool', 0.3182)], where the value represents the polarity of cool.

However, I don't know how to make it recognise named entities. For example, if I pass Air France is cool, I want to get [('Air France', 'cool', 0.3182)], but what I currently get is [('France', 'cool', 0.3182)].

I checked the spaCy online documentation and I know how to extract named entities (doc.ents), but I want to know what workaround would let my extractor use them. Please note that I don't want a forced measure such as concatenating strings (AirFrance, Air_France, etc.).

Thank you!

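import spacy
from nltk.sentiment.vader import SentimentIntensityAnalyzer  # needs nltk.download('vader_lexicon') once

sid = SentimentIntensityAnalyzer()  # scores the polarity of the opinion word below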

nlp = spacy.load("en_core_web_lg-2.2.5")
review_body = "Air France is cool."
doc=nlp(review_body)

rule3_pairs = []

for token in doc:

    children = token.children
    A = "999999"
    M = "999999"
    add_neg_pfx = False

    for child in children :
        if(child.dep_ == "nsubj" and not child.is_stop): # nsubj is nominal subject
            A = child.text

        if(child.dep_ == "acomp" and not child.is_stop): # acomp is adjectival complement
            M = child.text

        # example - 'this could have been better' -> (this, not better)
        if(child.dep_ == "aux" and child.tag_ == "MD"): # MD is modal auxiliary
            neg_prefix = "not"
            add_neg_pfx = True

        if(child.dep_ == "neg"): # neg is negation
            neg_prefix = child.text
            add_neg_pfx = True

    if (add_neg_pfx and M != "999999"):
        M = neg_prefix + " " + M

    if(A != "999999" and M != "999999"):
        rule3_pairs.append((A, M, sid.polarity_scores(M)['compound']))

Result

rule3_pairs
>>> [('France', 'cool', 0.3182)]

Desired output

rule3_pairs
>>> [('Air France', 'cool', 0.3182)]
Makoto Miyazaki asked Apr 01 '20 08:04



1 Answer

Integrating entities into your extractor is straightforward. For each nsubj child (the "A" candidate), check whether it is the root token of a named entity; if it is, use the whole entity text as the aspect instead of the single token.

Here is the complete code:

# Notebook cell syntax; from a regular shell, run this command without the leading "!"
!python -m spacy download en_core_web_lg
import nltk
nltk.download('vader_lexicon')

import spacy
nlp = spacy.load("en_core_web_lg")

from nltk.sentiment.vader import SentimentIntensityAnalyzer
sid = SentimentIntensityAnalyzer()


def find_sentiment(doc):
    # find roots of all entities in the text
    ner_heads = {ent.root.idx: ent for ent in doc.ents}
    rule3_pairs = []
    for token in doc:
        children = token.children
        A = "999999"
        M = "999999"
        add_neg_pfx = False
        for child in children:
            if(child.dep_ == "nsubj" and not child.is_stop): # nsubj is nominal subject
                if child.idx in ner_heads:
                    A = ner_heads[child.idx].text
                else:
                    A = child.text
            if(child.dep_ == "acomp" and not child.is_stop): # acomp is adjectival complement
                M = child.text
            # example - 'this could have been better' -> (this, not better)
            if(child.dep_ == "aux" and child.tag_ == "MD"): # MD is modal auxiliary
                neg_prefix = "not"
                add_neg_pfx = True
            if(child.dep_ == "neg"): # neg is negation
                neg_prefix = child.text
                add_neg_pfx = True
        if (add_neg_pfx and M != "999999"):
            M = neg_prefix + " " + M
        if(A != "999999" and M != "999999"):
            rule3_pairs.append((A, M, sid.polarity_scores(M)['compound']))
    return rule3_pairs

print(find_sentiment(nlp("Air France is cool.")))
print(find_sentiment(nlp("I think Gabriel García Márquez is not boring.")))
print(find_sentiment(nlp("They say Central African Republic is really great. ")))

The output of this code will be what you need:

[('Air France', 'cool', 0.3182)]
[('Gabriel García Márquez', 'not boring', 0.2411)]
[('Central African Republic', 'great', 0.6249)]
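
The entity lookup can also be looked at in isolation. The following is only a minimal sketch under a few assumptions: `build_ner_heads` and `aspect_text` are hypothetical helper names that do not appear in the code above, the tokens are hand-built `SimpleNamespace` stand-ins rather than a real spaCy parse, and the dictionary is keyed on the token index `i` rather than the character offset `idx`.

```python
from types import SimpleNamespace


def build_ner_heads(ents):
    """Map the index of each entity's root token to the entity itself."""
    return {ent.root.i: ent for ent in ents}


def aspect_text(token, ner_heads):
    """Return the whole entity text if `token` is an entity root, else the token text."""
    ent = ner_heads.get(token.i)
    return ent.text if ent is not None else token.text


# Hand-built stand-ins (an assumption, not parser output) for "Air France is cool."
air = SimpleNamespace(i=0, text="Air")
france = SimpleNamespace(i=1, text="France")
cool = SimpleNamespace(i=3, text="cool")
air_france = SimpleNamespace(text="Air France", root=france)  # entity rooted at "France"

ner_heads = build_ner_heads([air_france])
print(aspect_text(france, ner_heads))  # entity root, prints: Air France
print(aspect_text(cool, ner_heads))    # not an entity root, prints: cool
```

With a real parse, `build_ner_heads(doc.ents)` should behave the same way, since spaCy's `Span` exposes `root` and `text` and `Token` exposes `i` and `text`.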

Enjoy!

David Dale answered Oct 24 '22 03:10