I'm trying to detect laughing words like "hahahaha" and "lolololol" in a string.
Currently I'm using the following regex:
^((.*?)|)(\b[ha]|\b[lo])(.*?)$
However, this doesn't quite work for my purposes. It does match laughing words, but it also matches words totally unrelated to laughter, such as 'kill', because it simply looks for any word that contains the letters l, o, h, a.
How can I detect laughing words (like "hahaha" or "lololol") in a string?
In Python, I did it this way:
import re
# Replace every laughing word of the "haha" family with a <laugh> marker.
print(re.sub(r"\b(?:a{0,2}h{1,2}a{0,2}){2,}h?\b", "<laugh>", "hahahahha! I love laughing"))
# prints: <laugh>! I love laughing
Try this pattern:
\b(?:a*(?:ha)+h?|(?:l+o+)+l+)\b
Or, better, if your regex flavour supports atomic groups and possessive quantifiers:
\b(?>a*+(?:ha)++h?|(?:l+o+)++l+)\b
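As a rough sketch of how the two patterns above could be used from Python: the standard re module accepts atomic groups and possessive quantifiers from Python 3.11 onward, so the snippet falls back to the plain pattern on older versions. The names PLAIN, POSSESSIVE and find_laughs are made up for this illustration.

```python
import re
import sys

# The two patterns from the answer, unchanged.
PLAIN = r"\b(?:a*(?:ha)+h?|(?:l+o+)+l+)\b"
POSSESSIVE = r"\b(?>a*+(?:ha)++h?|(?:l+o+)++l+)\b"

# Assumption: Python's re module only understands (?>...) and *+ / ++
# from version 3.11; older versions get the plain pattern instead.
pattern = re.compile(POSSESSIVE if sys.version_info >= (3, 11) else PLAIN)


def find_laughs(text):
    """Return every laughing word found in text (hypothetical helper)."""
    # Neither pattern has a capturing group, so findall returns whole matches.
    return pattern.findall(text)


print(find_laughs("hahaha lol, I love laughing ahahah"))
# ['hahaha', 'lol', 'ahahah']
```

Both forms should find the same words; the possessive one only stops the engine from backtracking into repetitions that could never yield a different match.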
\b(a*ha+h[ha]*|o?l+o+l+[ol]*)\b
Matches:
hahahah
haha
lol
loll
loool
looooool
lolololol
lolololololo
ahaha
aaaahahahahahaha
Does not match:
looo
oool
oooo
llll
ha
l
o
lo
ol
ah
aah
aha
kill
lala
haunt
hauha
louol
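The two lists above can be checked mechanically. The following is a minimal sketch, assuming each listed word is tested on its own as a whole string; is_laugh and the list names are illustrative, not part of the answer.

```python
import re

# Pattern from the answer, unchanged.
LAUGH = re.compile(r"\b(a*ha+h[ha]*|o?l+o+l+[ol]*)\b")

SHOULD_MATCH = [
    "hahahah", "haha", "lol", "loll", "loool", "looooool",
    "lolololol", "lolololololo", "ahaha", "aaaahahahahahaha",
]
SHOULD_NOT_MATCH = [
    "looo", "oool", "oooo", "llll", "ha", "l", "o", "lo", "ol",
    "ah", "aah", "aha", "kill", "lala", "haunt", "hauha", "louol",
]


def is_laugh(word):
    """True if the whole word is a laughing word (hypothetical helper)."""
    return LAUGH.fullmatch(word) is not None


# Collect any word whose result disagrees with the list it appears in.
wrongly_rejected = [w for w in SHOULD_MATCH if not is_laugh(w)]
wrongly_accepted = [w for w in SHOULD_NOT_MATCH if is_laugh(w)]
print(wrongly_rejected, wrongly_accepted)
```

If both printed lists are empty, the pattern agrees with the examples given.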
To keep it simple, because the solutions posted may be overly complicated for what you want to do: if the only things you count as "laughing words" are ha, haha, etc. and lol, lolol, lololol, etc., then the following regular expression will be sufficient:
\b((ha)+|l(ol)+)\b
This assumes a regex dialect in which \b represents a word boundary, which you seem to be using.
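For illustration, here is a small Python sketch of this simple approach. It uses a grouped form of the expression, with the alternation wrapped in a non-capturing group so that each \b applies to both branches; without the group, the leading \b would bind only to the ha branch and the trailing \b only to the lol branch, and the ha at the start of "haunt" would be accepted. SIMPLE and laugh_words are hypothetical names.

```python
import re

# Grouped variant: (?:...) keeps both word boundaries tied to both branches.
SIMPLE = re.compile(r"\b(?:(?:ha)+|l(?:ol)+)\b")


def laugh_words(text):
    """Return the laughing words found in text (hypothetical helper)."""
    return [m.group(0) for m in SIMPLE.finditer(text)]


print(laugh_words("ha, haha and lololol but not haunt or kill"))
# ['ha', 'haha', 'lololol']
```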