Calculating the similarity of two lists

Tags:

algorithm

I have two lists:

For example: a = [1,8,3,9,4,9,3,8,1,2,3] and b = [1,8,1,3,9,4,9,3,8,1,2,3]

Both contain ints, and the ints carry no numeric meaning (e.g., 1 is not 'closer' to 3 than it is to 8).

I'm trying to devise an algorithm to calculate the similarity between two ORDERED lists. Ordered is the key word here, so I can't just take the set of each list and calculate their set_difference percentage. Numbers sometimes repeat (for example 3, 8, and 9 above), and I cannot ignore the repeats.
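To see concretely why a set-based measure fails here, a quick sketch: set_similarity is a hypothetical helper computing the Jaccard index of the two lists treated as sets. It scores the two example lists as identical, because both collapse to the same set {1, 2, 3, 4, 8, 9}.

```python
# Hypothetical helper, for illustration only: Jaccard index of the element sets.
def set_similarity(x, y):
    """Return len(X & Y) / len(X | Y) for the sets X, Y of elements."""
    sx, sy = set(x), set(y)
    return len(sx & sy) / len(sx | sy)


a = [1, 8, 3, 9, 4, 9, 3, 8, 1, 2, 3]
b = [1, 8, 1, 3, 9, 4, 9, 3, 8, 1, 2, 3]

print(set_similarity(a, b))                  # 1.0: order and repeats are discarded
print(set_similarity(a, list(reversed(a))))  # also 1.0, even though the order is reversed
```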

In the example above, the function would tell me that a and b are roughly 90% similar. How can I do that? Edit distance came to mind; I know how to use it with strings, but I'm not sure how to apply it to a list of ints. Thanks!
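Edit distance carries over from strings to lists unchanged, because the algorithm only ever compares elements for equality. Below is a minimal sketch of the standard dynamic-programming Levenshtein distance. The names edit_distance and edit_similarity are hypothetical, and dividing by the length of the longer list is just one assumed way to turn a distance into a score between 0 and 1.

```python
def edit_distance(x, y):
    """Levenshtein distance between two sequences (insert, delete, substitute; cost 1 each).

    Works on any sequences whose elements support ==, e.g. lists of ints.
    """
    # prev[j] is the distance between x[:i-1] and y[:j]; only two rows are kept.
    prev = list(range(len(y) + 1))
    for i, xi in enumerate(x, start=1):
        curr = [i]  # distance between x[:i] and the empty prefix of y
        for j, yj in enumerate(y, start=1):
            curr.append(min(
                prev[j] + 1,               # delete xi
                curr[j - 1] + 1,           # insert yj
                prev[j - 1] + (xi != yj),  # substitute (free when the elements are equal)
            ))
        prev = curr
    return prev[-1]


def edit_similarity(x, y):
    """Hypothetical normalization: 1 - distance / length of the longer list."""
    longest = max(len(x), len(y))
    if longest == 0:
        return 1.0  # two empty lists are treated as identical
    return 1 - edit_distance(x, y) / longest


a = [1, 8, 3, 9, 4, 9, 3, 8, 1, 2, 3]
b = [1, 8, 1, 3, 9, 4, 9, 3, 8, 1, 2, 3]
print(edit_distance(a, b))    # 1 (b is a with one extra 1 inserted)
print(edit_similarity(a, b))  # about 0.917
```

With this normalization the example lists score 11/12, close to the roughly 90% expected above; a different normalization (e.g. dividing by the combined length) would give a different number.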


asked Jul 15 '11 15:07

aerain

1 Answer

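You can use the difflib module. Its SequenceMatcher class compares any pair of sequences whose elements are hashable, so a list of ints works just as well as a string. From the documentation: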

ratio()
Return a measure of the sequences’ similarity as a float in the range [0, 1].

Applied to your lists, this gives:

    >>> import difflib
    >>> s1 = [1, 8, 3, 9, 4, 9, 3, 8, 1, 2, 3]
    >>> s2 = [1, 8, 1, 3, 9, 4, 9, 3, 8, 1, 2, 3]
    >>> sm = difflib.SequenceMatcher(None, s1, s2)
    >>> sm.ratio()
    0.9565217391304348
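The difflib documentation defines ratio() as 2.0*M / T, where T is the total number of elements in both sequences and M is the number of matches. The short sketch below recovers both numbers from get_matching_blocks() and lists the edit operations with get_opcodes(), using the same two lists.

```python
import difflib

s1 = [1, 8, 3, 9, 4, 9, 3, 8, 1, 2, 3]
s2 = [1, 8, 1, 3, 9, 4, 9, 3, 8, 1, 2, 3]
sm = difflib.SequenceMatcher(None, s1, s2)

# M: total number of matched elements over all matching blocks
# (the last block returned is a zero-size sentinel, so it adds nothing).
M = sum(block.size for block in sm.get_matching_blocks())
T = len(s1) + len(s2)  # total number of elements in both sequences

print(M, T)         # 11 23
print(2.0 * M / T)  # same value as sm.ratio()

# Opcodes describe how to turn s1 into s2.
for tag, i1, i2, j1, j2 in sm.get_opcodes():
    print(tag, s1[i1:i2], s2[j1:j2])
```

Here the opcodes are equal, insert, equal: the only difference is the extra 1 inserted at index 2 of s2, which is why the ratio is 22/23.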


answered Oct 17 '22 08:10

kraymer
