Consider these sentences:
apple is 2kg
apple banana mango is 2kg
apple apple apple is 6kg
banana banana banana is 6kg
Given that "apple", "banana", and "mango" are the only fruits, what regex would extract the fruit name(s) that appear at the start of each sentence?
I wrote this regex (https://regex101.com/r/fY8bK1/1):
^(apple|mango|banana) is (\d+)kg$
but this only matches if a single fruit is in the sentence.
How do I extract all the fruit names?
The expected output, for all 4 sentences, should be:
apple, 2
apple banana mango, 2
apple apple apple, 6
banana banana banana, 6
You can use grouping like this:
^((?:apple|mango|banana)(?:\s+(?:apple|mango|banana))*) is (\d+)kg$
See regex demo
The (?:...) is a non-capturing group nested inside the capturing (...) group, so the inner groups do not add extra entries to the output.
The ((?:apple|mango|banana)(?:\s+(?:apple|mango|banana))*) group matches:
(?:apple|mango|banana) - any value from the alternative list delimited with the alternation operator |. If you plan to match whole words only, put \b at both ends of the subpattern.
(?:\s+(?:apple|mango|banana))* - 0 or more sequences of:
\s+ - 1 or more whitespace characters
(?:apple|mango|banana) - any of the alternatives.
Snippet:
var re = /^((?:apple|mango|banana)(?:\s+(?:apple|mango|banana))*) is (\d+)kg$/gm;
var str = 'apple is 2kg\napple banana mango is 2kg\napple apple apple is 6kg\nbanana banana banana is 6kg';
var m;
while ((m = re.exec(str)) !== null) {
  document.write(m[1] + ", " + m[2] + "<br/>");
}
document.write("<b>appleapple is 2kg</b> matched: " +
/^((?:apple|mango|banana)(?:\s+(?:apple|mango|banana))*) is (\d+)kg$/.test("appleapple is 2kg"));
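Because the alternation is repeated twice in the pattern, it can be easier to build the regex from a list of names. The following is only a sketch of that idea: the helper name extractFruits and the shape of the returned objects are hypothetical, and it assumes the fruit names contain no regex metacharacters (otherwise they would need escaping first). It also relies on String.prototype.matchAll, which needs a reasonably modern JavaScript engine.

```javascript
// Hypothetical helper: builds the same pattern as above from a list of names,
// so the alternation is written only once.
// Assumption: the names contain no regex metacharacters.
function extractFruits(text, fruits) {
  const alt = "(?:" + fruits.join("|") + ")";
  // Group 1: one or more fruit names separated by whitespace; group 2: the weight.
  const re = new RegExp("^(" + alt + "(?:\\s+" + alt + ")*) is (\\d+)kg$", "gm");
  const results = [];
  for (const m of text.matchAll(re)) {
    // Split the captured run of names into individual fruit names.
    results.push({ fruits: m[1].split(/\s+/), weight: Number(m[2]) });
  }
  return results;
}

const sample = "apple is 2kg\napple banana mango is 2kg\napple apple apple is 6kg\nbanana banana banana is 6kg";
const extracted = extractFruits(sample, ["apple", "mango", "banana"]);
for (const item of extracted) {
  console.log(item.fruits.join(" ") + ", " + item.weight);
}
```

Splitting group 1 on whitespace yields the individual names, which a single regex cannot return as separate captures because a repeated capturing group only keeps its last iteration.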