
RegExp detect multiple single-letter instances in a row?

I'm writing a program to parse Twitch chat, and I'm wondering whether I can use a regex to turn the following into the desired result:

"f o o b a r" into "foobar"

So far I have /(?:(\w)\s){3,}/g, which works to an extent, but consider the following situation:

"FrankerZ R I O T FrankerZ" captures only "T" (the last letter in "R I O T") and matches "Z R I O T", because the final "Z" of "FrankerZ" is also a word character followed by a space.

What I want is to detect a single letter with a space before and after it, and to match only when there are at least 3 of those in a row (so "test a b test" isn't turned into "ab").
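
To make the problem concrete, here is a small sketch of what the pattern from the question does with that input. The variable names are only illustrative.

```javascript
// The pattern from the question; variable names here are illustrative only.
var original = /(?:(\w)\s){3,}/g;
var found = original.exec('FrankerZ R I O T FrankerZ');

// found[0] is the whole match; found[1] is the last value taken by the repeated group.
console.log(JSON.stringify(found[0])); // "Z R I O T "
console.log(JSON.stringify(found[1])); // "T"
```

A repeated capture group keeps only its final iteration, which is why just "T" is captured, and nothing in the pattern stops the run from starting inside "FrankerZ".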

Any help? Thanks!

Flipybitz asked Jul 29 '15 00:07



3 Answers

Try this pattern: /(?:\b\w(?:\s|$)){3,}/g

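This uses the word boundary metacharacter \b so that each single letter has to be a whole word, which avoids the partial match on the trailing "Z" of FrankerZ. The (?:\s|$) part lets the last letter match when it is followed by the end of the string instead of a space, e.g., the "T" in "R I O T". Note that \w also matches digits and underscores; use [A-Za-z] instead if only letters should count.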

Example:

var inputs = [
  "R I",
  "R I O T",
  "FrankerZ R I O T FrankerZ",
  "f o o b a r"
];

var re = /(?:\b\w(?:\s|$)){3,}/g;

inputs.forEach(function(s) {
  var match = s.match(re);
  if (match) {
    var result = match[0].replace(/\s/g, '');
    console.log('Original: ' + s);
    console.log('Result: ' + result);
  } else {
    console.log('No match: ' + s);
  }
});

Demo: JSBin

EDIT: updated to cover 3+ single letters and example of no match.

Ahmad Mageed answered Oct 22 '22 17:10


Here is a good reference on replacing with matched groups: "Javascript replace with reference to matched group?"

So you could do:

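'string'.replace(/(\s|^)((?:\w\s){2,}\w)(\s|$)/g, function(match, before, letters, after) {
  // Keep the surrounding whitespace, strip the spaces between the single letters
  return before + letters.replace(/\s/g, '') + after;
});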

See demo
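
As a rough usage sketch, the same pattern can be wrapped in a function and tried on the inputs from the question. The name collapseSpacedLetters is hypothetical and not part of the answer.

```javascript
// Hypothetical helper name; the pattern is the one from the answer above.
function collapseSpacedLetters(text) {
  return text.replace(/(\s|^)((?:\w\s){2,}\w)(\s|$)/g, function (match, before, letters, after) {
    // Keep the delimiters on both sides and remove the whitespace inside the run.
    return before + letters.replace(/\s/g, '') + after;
  });
}

console.log(collapseSpacedLetters('FrankerZ R I O T FrankerZ')); // FrankerZ RIOT FrankerZ
console.log(collapseSpacedLetters('f o o b a r'));               // foobar
console.log(collapseSpacedLetters('test a b test'));             // test a b test
```

Because the run must be delimited by whitespace or the string boundaries on both sides, the "Z" at the end of "FrankerZ" cannot start a match, and two single letters in a row are left alone.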

maraca answered Oct 22 '22 17:10


Thank you to Sam Burns for suggesting the use of \b. What worked for me was:

/\b((?:\w ?\b){3,})/g

This would select the following:

The H Y P E run from "FrankerZ H Y P E FrankerZ", and all of "f o o b a r" (a string that doesn't begin or end with a space character, which was giving me issues as well).

Specifying the literal space " " character instead of \s was also important: \s matches line breaks and other whitespace too, and I only wanted to check for the plain space character in the first place.
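
A quick sketch of that difference, assuming a hypothetical \s variant of the pattern for comparison (only the literal-space version is the one used here):

```javascript
// Only the literal-space pattern is from the answer; the \s variant is hypothetical.
var literalSpace = /\b((?:\w ?\b){3,})/;   // separator must be a plain space
var anyWhitespace = /\b((?:\w\s?\b){3,})/; // comparison only: any whitespace separator

var acrossLines = 'a\nb\nc'; // three one-letter lines joined by newlines

console.log(anyWhitespace.test(acrossLines)); // true: \s accepts the newlines
console.log(literalSpace.test(acrossLines));  // false: a newline is not a space
console.log(literalSpace.test('a b c'));      // true
```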

To strip the spaces from a match, I'll use .replace(/ /g, "") (a plain .replace(" ", "") only removes the first space) to get the exact result I wanted. Thanks again for everyone's help :)
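
Putting the pieces together, a minimal sketch might look like the following. The function name joinSpacedLetters is hypothetical, and re-adding a trailing space is an assumption about the desired output, since the pattern can include the space after the last letter in the match.

```javascript
// Pattern from this answer; the function name and trailing-space handling are assumptions.
var spacedRun = /\b((?:\w ?\b){3,})/g;

function joinSpacedLetters(text) {
  return text.replace(spacedRun, function (run) {
    // A global pattern removes every space; a string pattern would remove only the first.
    var joined = run.replace(/ /g, '');
    // The match may end with a space; put it back so the next word stays separated.
    return / $/.test(run) ? joined + ' ' : joined;
  });
}

console.log(joinSpacedLetters('FrankerZ H Y P E FrankerZ')); // FrankerZ HYPE FrankerZ
console.log(joinSpacedLetters('f o o b a r'));               // foobar
console.log(joinSpacedLetters('test a b test'));             // test a b test
```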

Flipybitz answered Oct 22 '22 17:10