Extracting person names with named entity recognition (NER) in Python

Tags: python, nlp, nltk, stanford-nlp

I have a sentence from which I need to extract only the person names.

For example:

sentence = "Larry Page is an American business magnate and computer scientist who is the co-founder of Google, alongside Sergey Brin"

I used the code below to identify the named entities:

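from nltk import word_tokenize, pos_tag, ne_chunk
# Tokenize, POS-tag, then chunk named entities (needs NLTK's tokenizer, tagger and chunker data packages)
print(ne_chunk(pos_tag(word_tokenize(sentence))))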

The output I received was:

(S
  (PERSON Larry/NNP)
  (ORGANIZATION Page/NNP)
  is/VBZ
  an/DT
  (GPE American/JJ)
  business/NN
  magnate/NN
  and/CC
  computer/NN
  scientist/NN
  who/WP
  is/VBZ
  the/DT
  co-founder/NN
  of/IN
  (GPE Google/NNP)
  ,/,
  alongside/RB
  (PERSON Sergey/NNP Brin/NNP))
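A minimal sketch of why `ne_chunk` alone falls short here: NLTK's `nltk.chunk.tree2conlltags` flattens a tree like the one above into `(word, POS, IOB-tag)` triples, and collecting the `PERSON` runs from those triples yields only "Larry" and "Sergey Brin", because "Page" was chunked as `ORGANIZATION`. The helper name `persons_from_iob` is hypothetical, and the triples below are a hand-written, abridged excerpt of the tree above, not real tagger output.

```python
def persons_from_iob(triples):
    """Join consecutive PERSON tokens from (word, pos, iob) triples (hypothetical helper)."""
    names = []
    current = []
    for word, _pos, iob in triples:
        if iob == "B-PERSON":
            # A new PERSON chunk starts: flush the previous one, if any.
            if current:
                names.append(" ".join(current))
            current = [word]
        elif iob == "I-PERSON":
            # Continuation of the running PERSON chunk.
            current.append(word)
        else:
            # Any other tag ends the running PERSON chunk.
            if current:
                names.append(" ".join(current))
            current = []
    if current:
        names.append(" ".join(current))
    return names


# Abridged, hand-written excerpt of the ne_chunk tree above as IOB triples.
triples = [
    ("Larry", "NNP", "B-PERSON"),
    ("Page", "NNP", "B-ORGANIZATION"),
    ("is", "VBZ", "O"),
    ("alongside", "RB", "O"),
    ("Sergey", "NNP", "B-PERSON"),
    ("Brin", "NNP", "I-PERSON"),
]
print(persons_from_iob(triples))  # ['Larry', 'Sergey Brin']
```

The mislabelled "Page" is a limitation of NLTK's default chunker, not of the extraction step, which is why a stronger tagger such as Stanford's is worth trying.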

I want to extract all the person names:

Larry Page
Sergey Brin

To achieve this, I referred to this link and tried the following:

from nltk.tag.stanford import StanfordNERTagger
st = StanfordNERTagger('/usr/share/stanford-ner/classifiers/english.all.3class.distsim.crf.ser.gz','/usr/share/stanford-ner/stanford-ner.jar')

However, I keep getting this error:

LookupError: Could not find stanford-ner.jar jar file at /usr/share/stanford-ner/stanford-ner.jar

Where can I download this file?

As mentioned above, the result I expect, as a list or a dictionary, is:

Larry Page
Sergey Brin


asked Mar 20 '18 15:03

Doubt Dhanabalu

1 Answer

In Long

Please read these carefully:

https://stackoverflow.com/a/49345866/610569
Extract list of Persons and Organizations using Stanford NER Tagger in NLTK

Understand the solution, don't just copy and paste.

TL;DR

In terminal:

pip install -U nltk

wget http://nlp.stanford.edu/software/stanford-corenlp-full-2016-10-31.zip
unzip stanford-corenlp-full-2016-10-31.zip && cd stanford-corenlp-full-2016-10-31

java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer \
-preload tokenize,ssplit,pos,lemma,ner,parse,depparse \
-status_port 9000 -port 9000 -timeout 15000

In Python

from nltk.parse.corenlp import CoreNLPParser

def get_continuous_chunks(tagged_sent):
    continuous_chunk = []
    current_chunk = []

    for token, tag in tagged_sent:
        if tag != "O":
            current_chunk.append((token, tag))
        else:
            if current_chunk: # if the current chunk is not empty
                continuous_chunk.append(current_chunk)
                current_chunk = []
    # Flush the final current_chunk into the continuous_chunk, if any.
    if current_chunk:
        continuous_chunk.append(current_chunk)
    return continuous_chunk


# Newer NLTK releases no longer ship nltk.tag.stanford.CoreNLPNERTagger;
# CoreNLPParser with tagtype='ner' queries the CoreNLP server started above.
stner = CoreNLPParser(url='http://localhost:9000', tagtype='ner')
tagged_sent = stner.tag('Rami Eid is studying at Stony Brook University in NY'.split())

named_entities = get_continuous_chunks(tagged_sent)
named_entities_str_tag = [(" ".join([token for token, tag in ne]), ne[0][1]) for ne in named_entities]


print(named_entities_str_tag)

[out]:

[('Rami Eid', 'PERSON'), ('Stony Brook University', 'ORGANIZATION'), ('NY', 'LOCATION')]
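Since the question asks for the names as a list or a dictionary, here is a sketch that starts from the raw `(token, tag)` pairs the tagger returns. Unlike `get_continuous_chunks`, the hypothetical `get_chunks_by_tag` also closes a chunk when the tag changes, so adjacent entities of different types are not merged under the first token's tag. Two adjacent entities with the same tag are still merged, because plain CoreNLP NER tags carry no B/I boundary. The tagged input is hand-written to match the `[out]` above, not live server output.

```python
from collections import defaultdict


def get_chunks_by_tag(tagged_sent):
    """Group consecutive tokens that share the same non-'O' tag (hypothetical helper)."""
    chunks = []
    current = []
    for token, tag in tagged_sent:
        # Close the running chunk on an 'O' token or when the tag changes.
        if current and (tag == "O" or tag != current[-1][1]):
            chunks.append(current)
            current = []
        if tag != "O":
            current.append((token, tag))
    if current:
        chunks.append(current)
    return chunks


# Hand-written (token, tag) pairs in the shape returned by the tagger's .tag().
tagged_sent = [
    ("Rami", "PERSON"), ("Eid", "PERSON"), ("is", "O"), ("studying", "O"),
    ("at", "O"), ("Stony", "ORGANIZATION"), ("Brook", "ORGANIZATION"),
    ("University", "ORGANIZATION"), ("in", "O"), ("NY", "LOCATION"),
]

named_entities = [
    (" ".join(token for token, _ in chunk), chunk[0][1])
    for chunk in get_chunks_by_tag(tagged_sent)
]

# List form: only the person names.
persons = [name for name, tag in named_entities if tag == "PERSON"]

# Dictionary form: every entity string grouped under its tag.
entities_by_tag = defaultdict(list)
for name, tag in named_entities:
    entities_by_tag[tag].append(name)

print(persons)                # ['Rami Eid']
print(dict(entities_by_tag))  # {'PERSON': ['Rami Eid'], 'ORGANIZATION': ['Stony Brook University'], 'LOCATION': ['NY']}
```

For the question's sentence, `persons` would hold "Larry Page" and "Sergey Brin" only if the tagger labels all four tokens as `PERSON`.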

You might also find this helpful: Unpacking a list / tuple of pairs into two lists / tuples
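For the unpacking mentioned above, the usual idiom is `zip(*pairs)`, which transposes the `(entity, tag)` pairs into one tuple of entities and one tuple of tags. A short sketch using the `[out]` list from above; on an empty list, unpacking into two names raises `ValueError`, hence the guard.

```python
named_entities_str_tag = [
    ("Rami Eid", "PERSON"),
    ("Stony Brook University", "ORGANIZATION"),
    ("NY", "LOCATION"),
]

# zip(*pairs) transposes the pairs; guard against an empty list,
# where unpacking into two names would raise ValueError.
if named_entities_str_tag:
    entities, tags = zip(*named_entities_str_tag)
else:
    entities, tags = (), ()

print(list(entities))  # ['Rami Eid', 'Stony Brook University', 'NY']
print(list(tags))      # ['PERSON', 'ORGANIZATION', 'LOCATION']
```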

answered Oct 10 '22 17:10

alvas

