
How to find a duplicate string with Pattern Matching?

I have a string similar to this:

[13:41:25] [100:Devnull]: 01:41:20, 13:41:21> |Hunit:Player-3693-07420299:DevnullYour [Chimaera Shot] hit |Hunit:Creature-0-3693-1116-3-87318-0000881AC4:Dungeoneer's Training DummyDungeoneer's Training Dummy 33265 Nature. 

In case you wonder, it's from World of Warcraft.

I'd like to end up with something like this:

[13:41:25] [100:Devnull]: 01:41:20, 13:41:21> Your [Chimaera Shot] hit Dungeoneer's Training Dummy 33265 Nature. 

If you notice, "Dungeoneer's Training Dummy" is printed twice. I've managed to get rid of the first "|Hunit" portion with something like this:

str = "[13:41:25] [100:Devnull]: 01:41:20, 13:41:21> |Hunit:Player-3693-07420299:DevnullYour [Chimaera Shot] hit |Hunit:Creature-0-3693-1116-3-87318-0000881AC4:Dungeoneer's Training DummyDungeoneer's Training Dummy 33265 Nature."
str = string.gsub(str, "|Hunit:.*:.*Your", "Your")

Which returns this:

print(str)    -- => [13:41:25] [100:Devnull]: 01:41:20, 13:41:21> Your [Chimaera Shot] hit |Hunit:Creature-0-3693-1116-3-87318-0000881AC4:Dungeoneer's Training DummyDungeoneer's Training Dummy 33265 Nature.

I then add a second gsub:

str = string.gsub(str, "|Hunit:.*:", "")
print(str) -- => [13:41:25] [100:Devnull]: 01:41:20, 13:41:21> Your [Chimaera Shot] hit Dungeoneer's Training DummyDungeoneer's Training Dummy 33265 Nature.

But the name "Dungeoneer's Training Dummy" still appears twice, obviously.

How can I get rid of the duplicated string? In this case it is "Dungeoneer's Training Dummy", but it can be the name of any other target.

asked Mar 19 '15 20:03 by dev404



1 Answer

You can try something like this:

str = "[13:41:25] [100:Devnull]: 01:41:20, 13:41:21> Your [Chimaera Shot] hit Dungeoneer's Training DummyDungeoneer's Training Dummy 33265 Nature."
-- match the text that starts with 'hit', followed by a run of non-digits
-- (the target name), then one or more digits and the rest of the line.
-- these three pieces are "captured" into three strings,
-- which are then passed to the "replacement" function.
-- the value returned by the function replaces the matched text.
str = str:gsub("(hit%s+)([^%d]+)(%d+.+)", function(s1, s2, s3)
    local s = s2:gsub("%s+$", "") -- drop trailing spaces
    local half = math.floor(#s / 2) -- integer index of the midpoint
    if #s % 2 == 0 -- has an even number of characters
    and s:sub(1, half) -- first half (Lua strings are 1-indexed)
    == -- is the same
    s:sub(half + 1) -- as the second half
    then -- return the second half
      return s1..s:sub(half + 1)..' '..s3
    else
      return s1..s2..s3
    end
  end)
print(str)

This prints: [13:41:25] [100:Devnull]: 01:41:20, 13:41:21> Your [Chimaera Shot] hit Dungeoneer's Training Dummy 33265 Nature.

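This code extracts the name of the target (everything between "hit" and the first digit) and checks whether the name is the same text written twice. If it is not, or if the pattern does not match at all, the original string is returned unchanged. Note that this relies on the target name containing no digits.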
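If it helps, the two gsub calls from the question and the half-comparison above could be wrapped into one function. This is only a sketch: the names `dedupe_name` and `clean_line` are made up for illustration, and it assumes every line has the same shape as the sample (the word "hit" before the target, no digits in the target name, a number right after it).

```lua
-- Hypothetical helper: if `name` is the same text written twice
-- back to back, return one copy; otherwise return it unchanged.
local function dedupe_name(name)
  local half = math.floor(#name / 2)
  if #name % 2 == 0 and name:sub(1, half) == name:sub(half + 1) then
    return name:sub(1, half)
  end
  return name
end

-- Hypothetical helper: clean one combat-log line in three steps.
local function clean_line(line)
  -- Steps 1 and 2: the two gsub calls from the question.
  -- Assigning to a single variable keeps only the new string and
  -- drops the match count that gsub returns as a second value.
  line = line:gsub("|Hunit:.*:.*Your", "Your")
  line = line:gsub("|Hunit:.*:", "")
  -- Step 3: collapse a doubled target name between "hit" and the number.
  line = line:gsub("(hit%s+)([^%d]+)(%d+.+)", function(prefix, name, rest)
    local trimmed = name:gsub("%s+$", "") -- drop trailing spaces
    local deduped = dedupe_name(trimmed)
    if deduped == trimmed then
      return prefix .. name .. rest -- not a duplicate, leave as-is
    end
    return prefix .. deduped .. " " .. rest
  end)
  return line
end

local raw = "[13:41:25] [100:Devnull]: 01:41:20, 13:41:21> |Hunit:Player-3693-07420299:DevnullYour [Chimaera Shot] hit |Hunit:Creature-0-3693-1116-3-87318-0000881AC4:Dungeoneer's Training DummyDungeoneer's Training Dummy 33265 Nature."
print(clean_line(raw))
-- [13:41:25] [100:Devnull]: 01:41:20, 13:41:21> Your [Chimaera Shot] hit Dungeoneer's Training Dummy 33265 Nature.
```

Keeping the duplicate check in its own function makes it easy to swap in a different rule later, for example if target names with digits turn up.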

answered Oct 19 '22 05:10 by Paul Kulchenko