I have a sentence from which I need to extract only the person names.
For example:
sentence = "Larry Page is an American business magnate and computer scientist who is the co-founder of Google, alongside Sergey Brin"
I used the code below to identify the named entities:
from nltk import word_tokenize, pos_tag, ne_chunk
print(ne_chunk(pos_tag(word_tokenize(sentence))))
The output I received was:
(S
  (PERSON Larry/NNP)
  (ORGANIZATION Page/NNP)
  is/VBZ
  an/DT
  (GPE American/JJ)
  business/NN
  magnate/NN
  and/CC
  computer/NN
  scientist/NN
  who/WP
  is/VBZ
  the/DT
  co-founder/NN
  of/IN
  (GPE Google/NNP)
  ,/,
  alongside/RB
  (PERSON Sergey/NNP Brin/NNP))
I want to extract all the person names, such as:
Larry Page
Sergey Brin
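For reference, the PERSON chunks can be read off the tree that ne_chunk returns by walking its subtrees. The following is a minimal sketch, assuming nltk is installed. The tree is rebuilt by hand from the output above so that no tagger models are needed, and extract_persons is a hypothetical helper name, not part of NLTK.

```python
from nltk.tree import Tree

# Hand-built copy of the ne_chunk output shown above (an assumption made
# so the sketch runs without downloading any NLTK models).
tree = Tree("S", [
    Tree("PERSON", [("Larry", "NNP")]),
    Tree("ORGANIZATION", [("Page", "NNP")]),
    ("is", "VBZ"), ("an", "DT"),
    Tree("GPE", [("American", "JJ")]),
    ("business", "NN"), ("magnate", "NN"), ("and", "CC"),
    ("computer", "NN"), ("scientist", "NN"), ("who", "WP"),
    ("is", "VBZ"), ("the", "DT"), ("co-founder", "NN"), ("of", "IN"),
    Tree("GPE", [("Google", "NNP")]),
    (",", ","), ("alongside", "RB"),
    Tree("PERSON", [("Sergey", "NNP"), ("Brin", "NNP")]),
])


def extract_persons(tree):
    """Join the words of every subtree labelled PERSON."""
    return [
        " ".join(word for word, pos in subtree.leaves())
        for subtree in tree.subtrees(filter=lambda t: t.label() == "PERSON")
    ]


persons = extract_persons(tree)
print(persons)  # ['Larry', 'Sergey Brin']
```

With this particular tree the result is 'Larry' rather than 'Larry Page', because the chunker labelled 'Page' as an ORGANIZATION. Walking the tree cannot repair that; it only reports what the chunker produced.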
To achieve this, I referred to this link and tried the following:
from nltk.tag.stanford import StanfordNERTagger
st = StanfordNERTagger('/usr/share/stanford-ner/classifiers/english.all.3class.distsim.crf.ser.gz','/usr/share/stanford-ner/stanford-ner.jar')
However, I keep getting this error:
LookupError: Could not find stanford-ner.jar jar file at /usr/share/stanford-ner/stanford-ner.jar
Where can I download this file?
As mentioned above, the result I am expecting, in the form of a list or dictionary, is:
Larry Page
Sergey Brin
Named entity recognition (NER) is one of the most common data preprocessing tasks. It involves identifying key information in a text and classifying it into a set of predefined categories. An entity is essentially a thing that is consistently talked about or referred to in the text. NER is a subtask of NLP.
So first, we need to create entity categories, such as Name, Location, Event, and Organization, and feed an NER model relevant training data. Then, by tagging sample words and phrases with their corresponding entities, we teach the NER model to detect entities and categorize them.
nlp = pipeline("ner", model=model, tokenizer=tokenizer) example = """The Kashmir Files is a 2022 Indian…
Clearly, building or using an NER tagger requires a large amount of annotated data, and datasets like CoNLL-03 are limited to a few entity types: person, organization, location, and miscellaneous.
Please read these carefully:
Understand the solution, don't just copy and paste.
In terminal:
pip install -U nltk
wget http://nlp.stanford.edu/software/stanford-corenlp-full-2016-10-31.zip
unzip stanford-corenlp-full-2016-10-31.zip && cd stanford-corenlp-full-2016-10-31
java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer \
-preload tokenize,ssplit,pos,lemma,parse,depparse \
-status_port 9000 -port 9000 -timeout 15000
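In Python:
# Note: this import matches the NLTK release this answer was written for.
# Later releases are understood to expose the same tagger as
# nltk.parse.corenlp.CoreNLPParser(url='http://localhost:9000', tagtype='ner').
from nltk.tag.stanford import CoreNLPNERTagger

def get_continuous_chunks(tagged_sent):
    """Group consecutive tokens whose tag is not "O" into chunks."""
    continuous_chunk = []
    current_chunk = []
    for token, tag in tagged_sent:
        if tag != "O":
            current_chunk.append((token, tag))
        else:
            if current_chunk:  # if the current chunk is not empty
                continuous_chunk.append(current_chunk)
                current_chunk = []
    # Flush the final current_chunk into the continuous_chunk, if any.
    if current_chunk:
        continuous_chunk.append(current_chunk)
    return continuous_chunk

# Requires the CoreNLP server started above to be running on port 9000.
stner = CoreNLPNERTagger()
tagged_sent = stner.tag('Rami Eid is studying at Stony Brook University in NY'.split())
named_entities = get_continuous_chunks(tagged_sent)
# Join each chunk's tokens and label the chunk with its first token's tag.
named_entities_str_tag = [(" ".join([token for token, tag in ne]), ne[0][1]) for ne in named_entities]
print(named_entities_str_tag)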
[out]:
[('Rami Eid', 'PERSON'), ('Stony Brook University', 'ORGANIZATION'), ('NY', 'LOCATION')]
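To get only the person names, as the question asks, the (text, tag) pairs can be filtered on the PERSON tag. The sketch below is a variant of the grouping above that uses itertools.groupby, so adjacent entities with different tags are kept apart. The tagged input is written by hand to match the output shown above (an assumption, so the sketch runs without the CoreNLP server), and entities_by_tag is a hypothetical helper name.

```python
from itertools import groupby

# Hand-written (token, tag) pairs consistent with the output above;
# in practice these would come from stner.tag(...).
tagged_sent = [
    ("Rami", "PERSON"), ("Eid", "PERSON"), ("is", "O"), ("studying", "O"),
    ("at", "O"), ("Stony", "ORGANIZATION"), ("Brook", "ORGANIZATION"),
    ("University", "ORGANIZATION"), ("in", "O"), ("NY", "LOCATION"),
]


def entities_by_tag(tagged_sent):
    """Merge runs of consecutive tokens that share the same non-"O" tag."""
    entities = []
    for tag, group in groupby(tagged_sent, key=lambda pair: pair[1]):
        if tag != "O":
            entities.append((" ".join(token for token, _ in group), tag))
    return entities


entities = entities_by_tag(tagged_sent)
# Keep only the entities tagged as people.
persons = [text for text, tag in entities if tag == "PERSON"]
print(persons)  # ['Rami Eid']
```

Two different people mentioned back to back with no untagged token between them would still be merged into one entity, since the tags alone carry no boundary information.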
You might find this helpful too: Unpacking a list / tuple of pairs into two lists / tuples