 

Python: Semantic similarity score for Strings [duplicate]

Are there any libraries for computing semantic similarity scores for a pair of sentences ?

I'm aware of WordNet's semantic database, and how I can generate the score for 2 words, but I'm looking for libraries that do all pre-processing tasks like port-stemming, stop word removal, etc, on whole sentences and outputs a score for how related the two sentences are.

I found a work in progress that's written using the .NET framework that computes the score using an array of pre-processing steps. Is there any project that does this in python?

I'm not looking for the sequence of operations that would help me find the score (as is asked for here)
I'd love to implement each stage on my own, or glue functions from different libraries so that it works for sentence pairs, but I need this mostly as a tool to test inferences on data.


EDIT: I was considering using NLTK, computing the score for every pair of words across the two sentences, and then drawing inferences from the standard deviation of the results, but I don't know if that's a legitimate estimate of similarity. Besides, that would take a LOT of time for long strings.
Again, I'm looking for projects/libraries that already implement this intelligently. Something that lets me do this:

```python
import amazing_semsim_package

str1 = 'Birthday party ruined as cake explodes'
str2 = 'Grandma mistakenly bakes cake using gunpowder'

>>> similarity(str1, str2)
0.889
```
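Absent such a package, the pairwise idea from the edit above can be roughed out with the standard library alone. This is only a sketch: `difflib.SequenceMatcher` is a stand-in for a real semantic word-to-word score (e.g. a WordNet measure), and it aggregates by taking each word's best match rather than a standard deviation, which tends to be a more robust summary:

```python
from difflib import SequenceMatcher

def word_sim(w1, w2):
    # Stand-in for a semantic word-to-word score; swap in a
    # WordNet-based measure for genuinely semantic results.
    return SequenceMatcher(None, w1, w2).ratio()

def sentence_sim(s1, s2):
    # For each word in one sentence, take its best match in the other,
    # average those scores, and symmetrize over both directions.
    words1, words2 = s1.lower().split(), s2.lower().split()
    def one_way(a, b):
        return sum(max(word_sim(w, v) for v in b) for w in a) / len(a)
    return (one_way(words1, words2) + one_way(words2, words1)) / 2

str1 = 'Birthday party ruined as cake explodes'
str2 = 'Grandma mistakenly bakes cake using gunpowder'
print(sentence_sim(str1, str2))
```

With a surface-level word score this mostly rewards shared vocabulary ("cake" in both sentences); the aggregation structure, however, carries over unchanged once a semantic word measure is plugged in.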
asked Jun 10 '13 by user8472

People also ask

How do you check if two strings are similar in Python?

The simplest way to check if two strings are equal in Python is to use the == operator. And if you are looking for the opposite, then != is what you need. That's it!
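For reference, plain equality checking (no similarity scoring at all) looks like this:

```python
a = "cake"
b = "cake"

print(a == b)        # exact, case-sensitive comparison
print(a != "Cake")   # differs in case, so the strings are not equal
```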

How do you find the similarity of two strings?

The way to check the similarity between data points or groups is by calculating the distance between them. For textual data, likewise, we check the similarity between strings by calculating the distance between one text and another.
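A minimal distance-based example using only the standard library; note that this measures surface-level (character) similarity, not semantic similarity:

```python
from difflib import SequenceMatcher

# Ratio of matching characters between the two strings, in [0, 1].
score = SequenceMatcher(None, "cake explodes", "cake exploded").ratio()
print(score)
```

The two strings differ by a single trailing character, so the score is close to 1 even though "explodes" and "exploded" could mean quite different things in context, which is exactly the gap semantic measures try to close.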

How do you measure semantic similarity between words?

To calculate the semantic similarity between words and sentences, the proposed method follows an edge-based approach using a lexical database. The methodology can be applied in a variety of domains, and has been tested on both benchmark standards and mean human similarity datasets.


1 Answer

The best package I've seen for this is Gensim, found at the Gensim homepage. I've used it many times, and overall been very happy with its ease of use; it is written in Python, and has an easy-to-follow tutorial to get you started, which compares nine strings. It can be installed via pip, so installing it shouldn't be much hassle.

Which scoring algorithm you use depends heavily on the context of your problem, but I'd suggest starting off with the LSI functionality if you want something basic. (That's what the tutorial walks you through.)

If you go through the gensim tutorial, it will walk you through comparing two strings using the similarities module. This will let you see how your strings compare to each other, or to some other string, based on the text they contain.

If you're interested in the science behind how it works, check out this paper.

answered Oct 12 '22 by Justin Muller