My goal is to detect that 2 string are same but in different order.
Example
"hello world my name is foobar" is the same as "my name is foobar world hello"
What i already tried is splitting both string into list and compare it within loop.
text = "hello world my name is foobar"
textSplit = text.split()
pattern = "foobar is my name world hello"
pattern = pattern.split()
count = 0
for substring in pattern:
if substring in textSplit:
count += 1
if (count == len(pattern)):
print ("same string detected")
It return what i intended, but is this actually correct and efficient way? Maybe there is another approach. Any suggestion of journal about that topic would be really nice.
Edit 1: Duplicate words are important
text = "fish the fish the fish fish fish"
pattern = "the fish"
It must return false
If you want to check that 2 sentences have the same words (with the same number of occurences), you could split the sentences in words and sort them:
>>> sorted("hello world my name is foobar".split())
['foobar', 'hello', 'is', 'my', 'name', 'world']
>>> sorted("my name is foobar world hello".split())
['foobar', 'hello', 'is', 'my', 'name', 'world']
You could define the check in a function:
def have_same_words(sentence1, sentence2):
return sorted(sentence1.split()) == sorted(sentence2.split())
print(have_same_words("hello world my name is foobar", "my name is foobar world hello"))
# True
print(have_same_words("hello world my name is foobar", "my name is foobar world hello"))
# True
print(have_same_words("hello", "hello hello"))
# False
print(have_same_words("hello", "holle"))
# False
If case isn't important, you could compare lowercase sentences:
def have_same_words(sentence1, sentence2):
return sorted(sentence1.lower().split()) == sorted(sentence2.lower().split())
print(have_same_words("Hello world", "World hello"))
# True
Note: you could also use collections.Counter
instead of sorted
. The complexity would be O(n)
instead of O(n.log(n))
, which isn't a big difference anyway. import collections
might take a longer time than sorting the strings:
from collections import Counter
def have_same_words(sentence1, sentence2):
return Counter(sentence1.lower().split()) == Counter(sentence2.lower().split())
print(have_same_words("Hello world", "World hello"))
# True
print(have_same_words("hello world my name is foobar", "my name is foobar world hello"))
# True
print(have_same_words("hello", "hello hello"))
# False
print(have_same_words("hello", "holle"))
# False
I think with your implementation then extra words in the text get ignored (maybe this was intended?).
Ie if text = "a b"
and pattern = "a"
then yours prints "same string detected"
The way I'd do it: Comparison where order doesn't matter makes me think of sets. So a solution with sets would be:
same = set(text.split()) == set(pattern.split())
Edit: Taking into account the repeated words edit to the question:
from collections import Counter
split_text = text.split()
split_pattern = pattern.split()
same = (Counter(split_text) == Counter(split_pattern))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With