I'm making a program to parse Twitch chat, and I'm wondering if there's a way to use a regex to turn the following into the desired result:
"f o o b a r" into "foobar"
So far, the code I have is /(?:(\w)\s){3,}/g and this works to an extent, but consider the following situation:
In "FrankerZ R I O T FrankerZ" it captures "T" (the last letter in "R I O T") and selects "Z R I O T".
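To make the problem reproducible, here is a minimal sketch of what the original pattern does on that input. The variable names are only illustrative, and the g flag is dropped so that exec() exposes the capture group.

```javascript
// The pattern from the question, without the g flag so exec() returns the capture group.
var originalPattern = /(?:(\w)\s){3,}/;
var attempt = originalPattern.exec("FrankerZ R I O T FrankerZ");

// The match starts at the trailing "Z" of "FrankerZ" because nothing
// requires the first letter to stand alone.
console.log(JSON.stringify(attempt[0])); // "Z R I O T "

// A repeated capture group only keeps its last repetition.
console.log(attempt[1]); // T
```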
What I want is to detect a single letter with a space before and after it, and to match only when there are at least 3 of those in a row (so "test a b test" isn't selected as "ab"; it should only capture when there are 3+).
Any help? Thanks!
Try this pattern: /(?:\b\w(?:\s|$)){3,}/g
This uses the word boundary metacharacter \b so you get a proper whole-word match instead of the partial match you saw with FrankerZ. Also, the \s|$ bit addresses the last letter being lost when no space comes after it, e.g., the "T" in "R I O T".
Example:
var inputs = [
    "R I",
    "R I O T",
    "FrankerZ R I O T FrankerZ",
    "f o o b a r"
];
// \b\w followed by (?:\s|$) matches a one-character word; {3,} requires at least three in a row.
var re = /(?:\b\w(?:\s|$)){3,}/g;
inputs.forEach(function(s) {
    var match = s.match(re);
    if (match) {
        // Strip the whitespace from the first run of spaced-out letters.
        var result = match[0].replace(/\s/g, '');
        console.log('Original: ' + s);
        console.log('Result: ' + result);
    } else {
        console.log('No match: ' + s);
    }
});
Demo: JSBin
EDIT: updated to cover 3+ single letters and example of no match.
Here is a good reference on how to replace using matched groups: "Javascript replace with reference to matched group?"
So you could do:
'string'.replace(/(\s|^)((?:\w\s){2,}\w)(\s|$)/g, function(a, b, c, d) {
    return b + c.replace(/\s/g, '') + d;
});
See demo
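To see how this replace-with-callback approach behaves on the inputs from the question, here is a small sketch. The function name collapseSpacedLetters and the parameter names are my own, not part of any API.

```javascript
// Hypothetical helper wrapping the replace call above.
function collapseSpacedLetters(text) {
    // before / after: the whitespace (or string edge) around the run,
    // letters: the run of single characters separated by whitespace.
    return text.replace(/(\s|^)((?:\w\s){2,}\w)(\s|$)/g, function (whole, before, letters, after) {
        return before + letters.replace(/\s/g, '') + after;
    });
}

console.log(collapseSpacedLetters("FrankerZ R I O T FrankerZ")); // FrankerZ RIOT FrankerZ
console.log(collapseSpacedLetters("f o o b a r"));               // foobar
console.log(collapseSpacedLetters("test a b test"));             // test a b test (only 2 letters, unchanged)
```

Because the surrounding whitespace is captured and put back, the rest of the message stays intact and only the spaced-out run is collapsed.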
Thank you to Sam Burns for suggesting the use of \b. What worked for me was:
/\b((?:\w ?\b){3,})/g
This would select the following:
H Y P E from "FrankerZ H Y P E FrankerZ", and
f o o b a r (which doesn't begin or end with a space character; that case was giving me issues as well).
Specifying the literal space character " " instead of \s was also important for avoiding line breaks and other cases, since I only wanted to check for the space character in the first place.
To remove the spaces, I'll simply call .replace(/ /g, "") on the match (note that .replace(" ", "") with a string pattern only removes the first space) to get the exact result I wanted. Thanks again for everyone's help :)
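For anyone landing here later, this is roughly how the pieces fit together. It is a sketch rather than my exact code: extractSpacedWords is just a name I picked, and it uses a global regex for the space removal because a string pattern passed to replace only replaces the first occurrence.

```javascript
// Hypothetical helper: returns every spaced-out word found in a chat message.
function extractSpacedWords(text) {
    // With the g flag, match() returns all full matches (or null if there are none).
    var runs = text.match(/\b((?:\w ?\b){3,})/g) || [];
    return runs.map(function (run) {
        // / /g removes every literal space, including a trailing one picked up by the match.
        return run.replace(/ /g, "");
    });
}

console.log(extractSpacedWords("FrankerZ H Y P E FrankerZ")); // [ 'HYPE' ]
console.log(extractSpacedWords("f o o b a r"));               // [ 'foobar' ]
console.log(extractSpacedWords("test a b test"));             // []
```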