I would like a regex pattern that matches the smileys ":)" and ":(". It should also match repeated smileys like ":) :)" and ":) :(", but reject invalid syntax like ":( (".
This is what I have so far, but it also matches ":( (":
bool( re.match("(:\()",str) )
I may be missing something obvious here, and I'd like some help with this seemingly simple task.
Python has a module named re to work with regular expressions. To use it, we need to import the module. The module defines several functions and constants to work with RegEx.
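As a quick illustration of the module in use, the sketch below reproduces the behaviour described in the question. The test strings are chosen for illustration only; the point is that re.match anchors at the start of the string but does not require the pattern to consume all of it.

```python
import re

pattern = r"(:\()"  # the pattern from the question, written as a raw string

# re.match only checks the start of the string, so trailing text is ignored.
m = re.match(pattern, ":( (")
print(bool(m))       # True
print(m.group(0))    # the matched prefix, ':('
print(m.span())      # (0, 2)

# re.fullmatch (Python 3.4+) requires the whole string to match the pattern.
print(bool(re.fullmatch(pattern, ":( (")))  # False
print(bool(re.fullmatch(pattern, ":(")))    # True
```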
I think it finally "clicked" exactly what you're asking about here. Take a look at the below:
import re
smiley_pattern = r'^(:\(|:\))+$'  # matches only strings made up entirely of ":)" and ":("

def test_match(s):
    # Report whether the whole string s is a run of smileys.
    print('Value: %s; Result: %s' % (
        s,
        'Matches!' if re.match(smiley_pattern, s) else "Doesn't match."
    ))
should_match = [
    ':)',    # Single smile
    ':(',    # Single frown
    ':):)',  # Two smiles
    ':(:(',  # Two frowns
    ':):(',  # Mix of a smile and a frown
]
should_not_match = [
    '',       # Empty string
    ':(foo',  # Extraneous characters appended
    'foo:(',  # Extraneous characters prepended
    ':( :(',  # Space between frowns
    ':( (',   # Extraneous characters and space appended
    ':((',    # Extraneous duplicate of final character appended
]
print('The following should all match:')
for x in should_match:
    test_match(x)
print('')  # Newline for output clarity
print('The following should all not match:')
for x in should_not_match:
    test_match(x)
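The problem with your original code is that your regex, (:\(), only describes a single frown, and re.match only requires a match at the start of the string, so whatever follows the first ":(" is ignored. Let's break it down.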
The outside parentheses are a "grouping". They're what you'd reference if you were going to do a string replacement, and they're used to apply regex operators to groups of characters at once. So, you're really saying:
    (      begin a group
    :\(    ... do regex stuff ...
    )      end the group
The : isn't a regex reserved character, so it's just a colon. The \ is, and it means "the following character is literal, not a regex operator". This is called an "escape sequence". Fully parsed into English, your regex says:
    (     begin a group
    :     a colon character
    \(    a left parenthesis character
    )     end the group
The regex I used is slightly more complex, but not bad. Let's break it down: ^(:\(|:\))+$
^ and $ mean "the beginning of the line" and "the end of the line" respectively. Now we have:
    ^             beginning of line
    (:\(|:\))+    ... do regex stuff ...
    $             end of line
... so it only matches things that comprise the entire line, not things that simply occur in the middle of the string.
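The effect of the anchors can be checked directly. This is a minimal sketch; the names unanchored and anchored are made up for the comparison, and re.fullmatch (Python 3.4+) is shown as an alternative to writing ^ and $ by hand.

```python
import re

unanchored = r"(:\(|:\))+"   # hypothetical name: the pattern without ^ and $
anchored = r"^(:\(|:\))+$"   # hypothetical name: the pattern used in this answer

# Without the trailing $, re.match accepts any string that merely starts with a smiley.
print(bool(re.match(unanchored, ":( (")))   # True
print(bool(re.match(anchored, ":( (")))     # False

# re.search scans the whole string, so the leading ^ matters there too.
print(bool(re.search(unanchored, "foo:(")))  # True
print(bool(re.search(anchored, "foo:(")))    # False

# One subtlety: $ also matches just before a trailing newline.
print(bool(re.match(anchored, ":)\n")))        # True
print(bool(re.fullmatch(unanchored, ":)\n")))  # False
```

If a trailing newline must be rejected as well, re.fullmatch with the unanchored pattern is the stricter choice.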
We know that ( and ) denote a grouping. + means "one or more of these". Now we have:
    ^          beginning of line
    (          start a group
    :\(|:\)    ... do regex stuff ...
    )          end the group
    +          match one or more of this
    $          end of line
Finally, there's the | (pipe) operator. It means "or". So, applying what we know from above about escaping characters, we're ready to complete the translation:
    ^     beginning of line
    (     start a group
    :     a colon character
    \(    a left parenthesis character
    |     or
    :     a colon character
    \)    a right parenthesis character
    )     end the group
    +     match one or more of this
    $     end of line
I hope this helps. If not, let me know and I'll be happy to edit my answer with a reply.
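The question also mentions space-separated input such as ":) :)", which the pattern in this answer deliberately rejects. If that form should be accepted, one possible variation (an assumption about the intent, not part of the answer's code) is to allow optional whitespace between smileys but nowhere else. The names spaced_pattern and is_smiley_run are hypothetical.

```python
import re

# A smiley, followed by zero or more repetitions of (optional whitespace + smiley).
# The character class [()] stands for "either parenthesis", so no escaping is needed.
spaced_pattern = re.compile(r":[()](?:\s*:[()])*")

def is_smiley_run(s):
    """Return True if s is only ':)' / ':(' with optional whitespace between them."""
    return spaced_pattern.fullmatch(s) is not None

for s in [":)", ":) :)", ":) :(", ":):(", ":( (", ":((", ""]:
    print(repr(s), is_smiley_run(s))
```

Using fullmatch here plays the role of the ^ and $ anchors, so a stray "(" after the last smiley still causes the whole string to be rejected.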