There is a string like it
"dxabcabcyyyydxycxcxz"
and I want to merge it into
"dxabcydxycxz"
Other examples: "ddxddx" -> "dxdx" , "abbab" -> "abab".
The rule is that :
if (adjacent and same): merge
# Such as 'abc', they are same, so delete one of them
# Although 'dx' is same as 'dx', they are nonadjacent, so do not delete any of them
# If one character has been deleted, don't delete any substring, include it
I've done it in Python, but it's slow when applied to a long string.
# Original string
mystr = "dxabcabcyyyydxycxcxz"
str_len = len(mystr)
vis = [1] * str_len # Use a list to mark which char is deleted
# Enumerate the size of substring
for i in range(1,str_len):
# Enumerate the begin of the substring
for j in range(0, str_len):
offset = 2 #the size of sub-str + 1
current_sub_str = mystr[j:j+i]
s_begin = j+i*(offset-1)
s_end = j+(i*offset)
# Delete all of the same char
while((j+(i*offset) <= str_len) and current_sub_str == mystr[s_begin:s_end]
and 0 not in vis[s_begin:s_end] and 0 not in vis[j:j+i]):
vis[s_begin:s_end] = [0] * (s_end - s_begin) # If it was deleted, mark it as 0
offset += 1
s_begin = j + i * (offset - 1)
s_end = j + (i * offset)
res = []
for i in range(0,str_len):
if(vis[i]!=0): res.append(mystr[i])
print "".join(res)
Is there any faster way to solve it?
Update April 29, 2017
Sorry, it seems to be like an XY problem. On the other hand, it maybe not. There is the content I was coding for a web spider and got many 'tag-path's like those:
ul/li/a
ul/li/div/div/div/a/span
ul/li/div/div/div/a/span
ul/li/a
ul/li/ul/li/a
ul/li/ul/li/a
ul/li/ul/li/a
ul/li/ul/li/a
ul/li/ul/li/a
ul/li/ul/li/a
ul/li/ul/li/a
ul/li/ul/li/a
ul/li/a
ul/li/ul/li/a
ul/li/ul/li/a
ul/li/ul/li/a
ul/li/ul/li/a
ul/li/ul/li/a
ul/li/a
ul/li/ul/li/a
ul/li/ul/li/a
ul/li/ul/li/a
ul/li/ul/li/a
ul/li/ul/li/a
ul/li/ul/li/a
As you see, some of the 'tag-path's are identical, so I wanted to collapse them to find out if there is any other 'tag-path's with the same structure.
After collapsing, I get the 'tag-path' like this.
ul/li/a
ul/li/div/div/div/a/span
ul/li/a
ul/li/ul/li/a
ul/li/a
ul/li/ul/li/a
ul/li/a
ul/li/ul/li/a
This is only my idea and I didn't know whether it is suitable to do this way. (After trying, I chose another way to do it).
However there is an interesting question like an ACM question.
So, I simplify one 'tag-path' to a character and ask for help. Because I didn't do a fast way by myself. Actually, the question has many corner cases that I don't mind and thank all for helping me to complete it.
Thanks all.
Behold the power of regex:
>>> import re
>>> re.sub(r"(.+?)\1+", r"\1", "dxabcabcyyyydxycxcxz")
'dxabcydxycxz'
>>> re.sub(r"(.+?)\1+", r"\1", "ddxddx")
'dxdx'
>>> re.sub(r"(.+?)\1+", r"\1", "abbab")
'abab'
This looks for a sequence of 1 or more arbitrary characters (.+?)
(as a non-greedy match, so that it tries shorter sequences first), followed by 1 or more repetitions of the matched sequence \1+
, and replaces it all with just the matched sequence \1
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With