
Efficient regex to split ingredient measures and units

Background

Hand-crafted ingredient lists could resemble:

180-200g/6-7oz flour
3-5g sugar
6g to 7g sugar
2 1/2 tbsp flour
3/4 cup flour

Problem

The items must be normalized as follows:

180 to 200 g / 6 to 7 oz flour
3 to 5 g sugar
6 g to 7 g sugar
2 1/2 tbsp flour
3/4 cup flour

Code

Here is what I have so far:

text = text.replaceAll( "([0-9])-([0-9])", "$1 to $2" );
text = text.replaceAll( "([^0-9])/([0-9])", "$1 / $2" );
return text.replaceAll( "([0-9])([^0-9 /])", "$1 $2" );

Question

What is the most efficient regex to split the data?

Thank you!

asked Dec 07 '25 13:12 by Dave Jarvis


2 Answers

Here's a one-liner using nothing but look-arounds to insert a space:

text = text.replaceAll("(?=-)|(?<=-)|(?<=[^\\d ])(?=/)|(?<=\\d/?)(?=[^\\d /])|(?<=\\D/)(?=\\d)", " ");

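This inserts the missing spaces for all your cases. Note that it keeps the hyphen, padded as " - ", rather than replacing it with "to". Here's some testing code: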

public class SplitTest {
    // A space is inserted at every zero-width position these look-arounds match.
    private static final String REGEX = "(?=-)|(?<=-)|(?<=[^\\d ])(?=/)|(?<=\\d/?)(?=[^\\d /])|(?<=\\D/)(?=\\d)";

    public static void main(String[] args) {
        String[] inputs = { "180-200g/6-7oz flour", "3-5g sugar", "6g to 7g sugar", "2 1/2 tbsp flour", "3/4 cup flour" };
        String[] outputs = { "180 - 200 g / 6 - 7 oz flour", "3 - 5 g sugar", "6 g to 7 g sugar", "2 1/2 tbsp flour", "3/4 cup flour" };

        for (int i = 0; i < inputs.length; i++) {
            String output = inputs[i].replaceAll(REGEX, " ");

            // Report only mismatches; no output means every case passed.
            if (!output.equals(outputs[i])) {
                System.out.println("Failed with input: " + inputs[i]);
                System.out.println("Expected: " + outputs[i]);
                System.out.println("  Actual: " + output);
            }
        }
    }
}

Output is nothing, as expected.

If tests fail, this will help you see where it went wrong.
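
One way to get the "to" wording from the question is to run the question's hyphen replacement first and then apply only the look-arounds that deal with units and slashes. What follows is a minimal sketch under a few assumptions, not a definitive implementation: the class name IngredientNormalizer and the method normalize are hypothetical names introduced here, and the two hyphen alternatives are dropped from the pattern because the first step already handles hyphens. The patterns are compiled once, since String.replaceAll compiles its regex on every call.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not part of the answer above): pairs the question's
// hyphen-to-"to" step with the unit/slash look-arounds, compiling each regex once.
final class IngredientNormalizer {
    // A hyphen between two digits, e.g. "3-5".
    private static final Pattern RANGE = Pattern.compile("(\\d)-(\\d)");

    // Zero-width positions where a single space is missing.
    private static final Pattern GAP = Pattern.compile(
          "(?<=[^\\d ])(?=/)"         // before a "/" that follows a unit: "g/" -> "g /"
        + "|(?<=\\d/?)(?=[^\\d /])"   // between a number and its unit: "200g" -> "200 g"
        + "|(?<=\\D/)(?=\\d)");       // after a "/" that follows a unit: "g/6" -> "g/ 6"

    static String normalize(String text) {
        String ranged = RANGE.matcher(text).replaceAll("$1 to $2");
        return GAP.matcher(ranged).replaceAll(" ");
    }

    public static void main(String[] args) {
        String[] inputs = { "180-200g/6-7oz flour", "3-5g sugar", "6g to 7g sugar", "2 1/2 tbsp flour", "3/4 cup flour" };
        for (String input : inputs) {
            System.out.println(normalize(input));
        }
    }
}
```

For the five sample inputs this prints the normalized lines listed in the question, with fractions such as "2 1/2" and "3/4" left untouched. Whether two precompiled passes beat the single one-liner in practice would need measuring on real data.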

answered Dec 10 '25 02:12 by Bohemian


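You can use \b to insert spaces at word boundaries. Be aware that \b does not match between a digit and a letter (both are word characters), so "200g" stays joined, and that it matches on both sides of "/", so "1/2" becomes "1 / 2":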

return text.replaceAll( "([0-9])-([0-9])",  "$1 to $2" )
           .replaceAll( "\\b", " ")
           .replaceAll( " {2,}", " ")
           .trim();
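
To see what \b does with the sample inputs, and one possible variation that splits only where a digit meets a letter, here is a minimal sketch. The class name BoundarySketch and both method names are hypothetical. The second method is an assumed alternative rather than anything proposed in this answer, and it only recognizes ASCII letters as units.

```java
// Hypothetical demo class; only wordBoundary() reproduces the chain shown above.
final class BoundarySketch {
    // The \b chain from the answer, unchanged.
    static String wordBoundary(String text) {
        return text.replaceAll("([0-9])-([0-9])", "$1 to $2")
                   .replaceAll("\\b", " ")
                   .replaceAll(" {2,}", " ")
                   .trim();
    }

    // Assumed variation: add a space only where a digit is followed by a letter,
    // and pad "/" only when it sits between a unit and a number.
    static String digitLetterBoundary(String text) {
        return text.replaceAll("([0-9])-([0-9])", "$1 to $2")
                   .replaceAll("(?<=[0-9])(?=[A-Za-z])", " ")
                   .replaceAll("(?<=[A-Za-z])/(?=[0-9])", " / ");
    }

    public static void main(String[] args) {
        String[] inputs = { "3-5g sugar", "2 1/2 tbsp flour" };
        for (String input : inputs) {
            // Prints "3 to 5g sugar | 3 to 5 g sugar"
            // and    "2 1 / 2 tbsp flour | 2 1/2 tbsp flour".
            System.out.println(wordBoundary(input) + " | " + digitLetterBoundary(input));
        }
    }
}
```

The variation needs no whitespace clean-up pass because each replacement adds a space only where none exists yet.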

answered Dec 10 '25 01:12 by Tomalak


