Python - difference between two strings

Tags: python, string, python-3.x, diff

I'd like to store a lot of words in a list. Many of these words are very similar. For example, I have the word afrykanerskojęzyczny and many words like afrykanerskojęzycznym, afrykanerskojęzyczni, nieafrykanerskojęzyczni. What is an efficient solution (fast, and giving a small diff size) to find the difference between two strings and restore the second string from the first one plus the diff?

asked Jul 28 '13 01:07

user2626682

2 Answers

You can use ndiff in the difflib module to do this. It has all the information necessary to convert one string into another string.

A simple example:

```python
import difflib

cases = [('afrykanerskojęzyczny', 'afrykanerskojęzycznym'),
         ('afrykanerskojęzyczni', 'nieafrykanerskojęzyczni'),
         ('afrykanerskojęzycznym', 'afrykanerskojęzyczny'),
         ('nieafrykanerskojęzyczni', 'afrykanerskojęzyczni'),
         ('nieafrynerskojęzyczni', 'afrykanerskojzyczni'),
         ('abcdefg', 'xac')]

for a, b in cases:
    print('{} => {}'.format(a, b))
    # i is the index within the ndiff output, which interleaves kept,
    # deleted and added characters; it is not an offset into a or b.
    for i, s in enumerate(difflib.ndiff(a, b)):
        if s[0] == ' ':
            continue  # character is unchanged
        elif s[0] == '-':
            print('Delete "{}" from position {}'.format(s[-1], i))
        elif s[0] == '+':
            print('Add "{}" to position {}'.format(s[-1], i))
    print()
```

prints:

```
afrykanerskojęzyczny => afrykanerskojęzycznym
Add "m" to position 20

afrykanerskojęzyczni => nieafrykanerskojęzyczni
Add "n" to position 0
Add "i" to position 1
Add "e" to position 2

afrykanerskojęzycznym => afrykanerskojęzyczny
Delete "m" from position 20

nieafrykanerskojęzyczni => afrykanerskojęzyczni
Delete "n" from position 0
Delete "i" from position 1
Delete "e" from position 2

nieafrynerskojęzyczni => afrykanerskojzyczni
Delete "n" from position 0
Delete "i" from position 1
Delete "e" from position 2
Add "k" to position 7
Add "a" to position 8
Delete "ę" from position 16

abcdefg => xac
Add "x" to position 0
Delete "b" from position 2
Delete "d" from position 4
Delete "e" from position 5
Delete "f" from position 6
Delete "g" from position 7
```
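One way to check the claim that the ndiff output is enough to rebuild a string is `difflib.restore`, which takes a delta produced by `ndiff` and returns either of the two original sequences. A minimal sketch, assuming the full delta (including the unchanged characters) is kept; the variable names are only illustrative:

```python
import difflib

a = 'afrykanerskojęzyczni'
b = 'nieafrykanerskojęzyczni'

# ndiff over two strings compares them character by character; each item
# looks like '  x' (kept), '- x' (only in a) or '+ x' (only in b).
delta = list(difflib.ndiff(a, b))

# restore() strips the two-character prefix and keeps the items that belong
# to the requested side: 1 for the first sequence, 2 for the second.
rebuilt_a = ''.join(difflib.restore(delta, 1))
rebuilt_b = ''.join(difflib.restore(delta, 2))

print(rebuilt_a == a, rebuilt_b == b)
```

Note that this delta is longer than either word, so it shows that the round trip works rather than giving a compact storage format.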

answered Oct 06 '22 13:10

dawg

I like the ndiff answer, but if you want to collect only the changes into a list, you could do something like:

```python
import difflib

case_a = 'afrykbnerskojęzyczny'
case_b = 'afrykanerskojęzycznym'

output_list = [li for li in difflib.ndiff(case_a, case_b) if li[0] != ' ']
```
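A list of only the changed characters no longer records where each change applies, so on its own it cannot rebuild the second string. If a small, restorable diff is the goal, one possible approach is to keep the non-equal opcodes from `difflib.SequenceMatcher.get_opcodes()` together with their offsets. The sketch below is an assumption about how that could look; `make_delta` and `apply_delta` are hypothetical helper names, not part of difflib:

```python
import difflib


def make_delta(a, b):
    """Return only the edits needed to turn a into b (hypothetical helper)."""
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    # Each entry is (start in a, end in a, replacement text taken from b).
    return [(i1, i2, b[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != 'equal']


def apply_delta(a, delta):
    """Rebuild b from a and a delta produced by make_delta."""
    parts = []
    pos = 0
    for i1, i2, text in delta:
        parts.append(a[pos:i1])  # unchanged stretch copied from a
        parts.append(text)       # inserted or replacing text ('' for a delete)
        pos = i2                 # skip the deleted or replaced stretch of a
    parts.append(a[pos:])        # whatever remains after the last edit
    return ''.join(parts)


base = 'afrykanerskojęzyczny'
words = ['afrykanerskojęzycznym', 'afrykanerskojęzyczni',
         'nieafrykanerskojęzyczni']

for word in words:
    delta = make_delta(base, word)
    assert apply_delta(base, delta) == word
    print(word, delta)
```

With this layout, only the base word and a short list of tuples per variant need to be stored; whether it is fast enough for a large word list would have to be measured.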

answered Oct 06 '22 11:10

Eric
