
What is the regex to match this string?

Consider these sentences:

apple is 2kg
apple banana mango is 2kg
apple apple apple is 6kg
banana banana banana is 6kg

Given that "apple", "banana", and "mango" are the only fruits, what regex would extract the fruit name(s) that appear at the start of each sentence?

I wrote this regex (https://regex101.com/r/fY8bK1/1):

^(apple|mango|banana) is (\d+)kg$  

but this only matches if a single fruit is in the sentence.

How do I extract all the fruit names?

The expected output, for all 4 sentences, should be:

apple, 2
apple banana mango, 2
apple apple apple, 6
banana banana banana, 6

An SO User asked Sep 16 '25 00:09



1 Answer

You can use grouping like this:

^((?:apple|mango|banana)(?:\s+(?:apple|mango|banana))*) is (\d+)kg$

See regex demo

Each (?:...) is a non-capturing group nested inside the outer capturing (...) group, so the match exposes only two captures: the whole fruit list in group 1 and the number in group 2.

The ((?:apple|mango|banana)(?:\s+(?:apple|mango|banana))*) group matches:

  • (?:apple|mango|banana) - any one of the alternatives, separated by the alternation operator |. If you plan to match whole words only, put \b at both ends of the subpattern.
  • (?:\s+(?:apple|mango|banana))* matches 0 or more sequences of...
    • \s+ - 1 or more whitespace
    • (?:apple|mango|banana) - any of the alternatives.
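Because the alternation appears twice, it can be generated from a list of names so the two copies stay in sync. This is a minimal sketch, not part of the original answer; the helper name buildFruitRegex is hypothetical:

```javascript
// Hypothetical helper: builds the answer's pattern from a list of fruit names.
function buildFruitRegex(fruits) {
  // Escape regex metacharacters in case a name contains any.
  const alt = fruits
    .map(f => f.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'))
    .join('|');
  // One fruit, then zero or more whitespace-separated fruits, then " is <digits>kg".
  return new RegExp('^((?:' + alt + ')(?:\\s+(?:' + alt + '))*) is (\\d+)kg$');
}

const fruitRe = buildFruitRegex(['apple', 'mango', 'banana']);
const fruitMatch = fruitRe.exec('apple banana mango is 2kg');
console.log(fruitMatch[1] + ', ' + fruitMatch[2]); // apple banana mango, 2
console.log(fruitRe.test('appleapple is 2kg'));    // false
```

The regex is built without the g flag, so repeated test() or exec() calls on it do not carry lastIndex state between inputs.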

Snippet:

// The g and m flags let exec() walk the input one line at a time.
const re = /^((?:apple|mango|banana)(?:\s+(?:apple|mango|banana))*) is (\d+)kg$/gm;
const str = 'apple is 2kg\napple banana mango is 2kg\napple apple apple is 6kg\nbanana banana banana is 6kg';
let m;

while ((m = re.exec(str)) !== null) {
    // m[1] is the fruit list, m[2] is the weight in kg.
    document.write(m[1] + ", " + m[2] + "<br/>");
}

// Names glued together without whitespace are rejected, so this writes "false".
document.write("<b>appleapple is 2kg</b> matched: " +
     /^((?:apple|mango|banana)(?:\s+(?:apple|mango|banana))*) is (\d+)kg$/.test("appleapple is 2kg"));
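Group 1 holds the fruit list as a single string. If the individual names are needed, it can be split on whitespace afterwards. A rough sketch, assuming an environment that supports String.prototype.matchAll (ES2020); the names lineRe, text, and rows are illustrative only:

```javascript
// Same pattern as above; the m flag makes ^ and $ match at every line.
const lineRe = /^((?:apple|mango|banana)(?:\s+(?:apple|mango|banana))*) is (\d+)kg$/gm;
const text = 'apple is 2kg\napple banana mango is 2kg\napple apple apple is 6kg';

const rows = [];
for (const hit of text.matchAll(lineRe)) {
  // Split the captured fruit list into separate names; convert the weight to a number.
  rows.push({ fruits: hit[1].split(/\s+/), kg: Number(hit[2]) });
}

console.log(JSON.stringify(rows[1].fruits)); // ["apple","banana","mango"]
console.log(rows[2].kg);                     // 6
```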
Wiktor Stribiżew answered Sep 18 '25 14:09
