I'd like to store a lot of words in a list. Many of these words are very similar. For example I have word afrykanerskojęzyczny
and many of words like afrykanerskojęzycznym
, afrykanerskojęzyczni
, nieafrykanerskojęzyczni
. What is the effective (fast and giving small diff size) solution to find difference between two strings and restore second string from the first one and diff?
In python programming we can check whether strings are equal or not using the “==” or by using the “. __eq__” function. Example: s1 = 'String' s2 = 'String' s3 = 'string' # case sensitive equals check if s1 == s2: print('s1 and s2 are equal.
To find the difference between 2 Strings you can use the StringUtils class and the difference method. It compares the two Strings, and returns the portion where they differ.
The most straightforward method to determine if two strings are equal in Python is to use the == operator. This operator compares if two strings have the same value and returns True if they do. Else, it returns False .
You can use ndiff in the difflib module to do this. It has all the information necessary to convert one string into another string.
A simple example:
import difflib cases=[('afrykanerskojęzyczny', 'afrykanerskojęzycznym'), ('afrykanerskojęzyczni', 'nieafrykanerskojęzyczni'), ('afrykanerskojęzycznym', 'afrykanerskojęzyczny'), ('nieafrykanerskojęzyczni', 'afrykanerskojęzyczni'), ('nieafrynerskojęzyczni', 'afrykanerskojzyczni'), ('abcdefg','xac')] for a,b in cases: print('{} => {}'.format(a,b)) for i,s in enumerate(difflib.ndiff(a, b)): if s[0]==' ': continue elif s[0]=='-': print(u'Delete "{}" from position {}'.format(s[-1],i)) elif s[0]=='+': print(u'Add "{}" to position {}'.format(s[-1],i)) print()
prints:
afrykanerskojęzyczny => afrykanerskojęzycznym Add "m" to position 20 afrykanerskojęzyczni => nieafrykanerskojęzyczni Add "n" to position 0 Add "i" to position 1 Add "e" to position 2 afrykanerskojęzycznym => afrykanerskojęzyczny Delete "m" from position 20 nieafrykanerskojęzyczni => afrykanerskojęzyczni Delete "n" from position 0 Delete "i" from position 1 Delete "e" from position 2 nieafrynerskojęzyczni => afrykanerskojzyczni Delete "n" from position 0 Delete "i" from position 1 Delete "e" from position 2 Add "k" to position 7 Add "a" to position 8 Delete "ę" from position 16 abcdefg => xac Add "x" to position 0 Delete "b" from position 2 Delete "d" from position 4 Delete "e" from position 5 Delete "f" from position 6 Delete "g" from position 7
I like the ndiff answer, but if you want to spit it all into a list of only the changes, you could do something like:
import difflib case_a = 'afrykbnerskojęzyczny' case_b = 'afrykanerskojęzycznym' output_list = [li for li in difflib.ndiff(case_a, case_b) if li[0] != ' ']
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With