
Python: finding similar names in a list of names

I am working with a dataset of company names that may contain near-duplicates (entries that are similar but not identical).

The list may contain, for example, company A, but also c.o.m.p.a.n.y A or comp A.

Is there a Python script, using NLP for example, that can find similar names in a dataset?

Thanks in advance

Amine asked Oct 25 '25 02:10



1 Answer

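You can use spaCy to compute the similarity between two texts. Note that this score is based on word vectors, so it reflects similarity in meaning rather than in spelling.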

import spacy

nlp = spacy.load("en_core_web_md")  # use a model that ships word vectors (md or lg), not the sm package
doc1 = nlp("Coca-Cola")
doc2 = nlp("Pepsi")

doc3 = nlp("Company Coca-Cola")
doc4 = nlp("Company Pepsi-Cola")


print(doc1, "<->", doc2, doc1.similarity(doc2))
print(doc3, "<->", doc4, doc3.similarity(doc4))

This prints the following similarities:

Coca-Cola <-> Pepsi 0.6684898494102074
Company Coca-Cola <-> Company Pepsi-Cola 0.934960639746236
PleSo answered Oct 26 '25 17:10
