
Advanced parsing of numeric ranges from string

Tags: java, regex

I'm using Java to parse strings input by the user, representing either single numeric values or ranges. The user can input the following string:

10-19

The intention is to use the whole numbers from 10 through 19: 10, 11, 12, ..., 19.

The user can also specify a list of numbers:

10,15,19

Or a combination of the above:

10-19,25,33

Is there a convenient method, perhaps based on regular expressions, to perform this parsing? Or must I split the string using String.split() and then manually handle the special characters (',' and '-' in this case)?

dzisner asked Oct 22 '12 12:10


2 Answers

This is how I would go about it:

  1. Split the input using , as the delimiter.
  2. If a piece matches the regular expression ^(\\d+)-(\\d+)$ (written here as a Java string literal), then I know I have a range. I would then extract the two numbers and create my range (it is a good idea to check that the first number is not greater than the second, because you never know...), and act accordingly.
  3. If a piece matches the regular expression ^\\d+$, I know I have only one number, so I have a single value, and I would act accordingly.
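
The three steps above can be sketched roughly as follows. This is only an illustration, not the answerer's code: the class name `RangeSplitSketch`, the method `parse`, and the choice to throw `IllegalArgumentException` on bad input are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper illustrating the split-then-match approach.
class RangeSplitSketch {
    // The two patterns from steps 2 and 3.
    private static final Pattern RANGE = Pattern.compile("^(\\d+)-(\\d+)$");
    private static final Pattern SINGLE = Pattern.compile("^\\d+$");

    static List<Integer> parse(String input) {
        List<Integer> result = new ArrayList<>();
        // Step 1: split on commas. The limit -1 keeps trailing empty tokens,
        // so an input such as "1,2," is rejected instead of silently accepted.
        for (String token : input.split(",", -1)) {
            Matcher range = RANGE.matcher(token);
            if (range.matches()) {
                // Step 2: a range; expand it after checking the order.
                int from = Integer.parseInt(range.group(1));
                int to = Integer.parseInt(range.group(2));
                if (from > to) {
                    throw new IllegalArgumentException("Descending range: " + token);
                }
                for (int n = from; n <= to; n++) {
                    result.add(n);
                }
            } else if (SINGLE.matcher(token).matches()) {
                // Step 3: a single number.
                result.add(Integer.parseInt(token));
            } else {
                throw new IllegalArgumentException("Not a number or range: \"" + token + "\"");
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 25, 33]
        System.out.println(parse("10-19,25,33"));
    }
}
```

Whether a descending range such as 19-10 should be rejected, swapped, or counted downwards depends on the application; rejecting it is just the choice made in this sketch.
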
npinti answered Sep 23 '22 15:09


This tested (and fully commented) regex solution meets the OP's requirements:

Java regex solution

// TEST.java 20121024_0700
import java.util.regex.*;
public class TEST {
    public static boolean isValidIntRangeInput(String text) {
        Pattern re_valid = Pattern.compile(
            "# Validate comma separated integers/integer ranges.\n" +
            "^             # Anchor to start of string.         \n" +
            "[0-9]+        # Integer of 1st value (required).   \n" +
            "(?:           # Range for 1st value (optional).    \n" +
            "  -           # Dash separates range integer.      \n" +
            "  [0-9]+      # Range integer of 1st value.        \n" +
            ")?            # Range for 1st value (optional).    \n" +
            "(?:           # Zero or more additional values.    \n" +
            "  ,           # Comma separates additional values. \n" +
            "  [0-9]+      # Integer of extra value (required). \n" +
            "  (?:         # Range for extra value (optional).  \n" +
            "    -         # Dash separates range integer.      \n" +
            "    [0-9]+    # Range integer of extra value.      \n" +
            "  )?          # Range for extra value (optional).  \n" +
            ")*            # Zero or more additional values.    \n" +
            "$             # Anchor to end of string.           ", 
            Pattern.COMMENTS);
        Matcher m = re_valid.matcher(text);
        return m.matches();
    }
    public static void printIntRanges(String text) {
        Pattern re_next_val = Pattern.compile(
            "# extract next integers/integer range value.    \n" +
            "([0-9]+)      # $1: 1st integer (Base).         \n" +
            "(?:           # Range for value (optional).     \n" +
            "  -           # Dash separates range integer.   \n" +
            "  ([0-9]+)    # $2: 2nd integer (Range)         \n" +
            ")?            # Range for value (optional). \n" +
            "(?:,|$)       # End on comma or string end.", 
            Pattern.COMMENTS);
        Matcher m = re_next_val.matcher(text);
        String msg;
        int i = 0;
        while (m.find()) {
            msg = "  value["+ ++i +"] ibase="+ m.group(1);
            if (m.group(2) != null) {
                msg += " range="+ m.group(2);
            }
            System.out.println(msg);
        }
    }
    public static void main(String[] args) {
        String[] arr = new String[] 
                { // Valid inputs:
                    "1", 
                    "1,2,3",
                    "1-9",
                    "1-9,10-19,20-199",
                    "1-8,9,10-18,19,20-199",
                  // Invalid inputs:
                    "A", 
                    "1,2,",
                    "1 - 9",
                    " ",
                    ""
                };
        // Loop through all test input strings:
        int i = 0;
        for (String s : arr) {
            String msg = "String["+ ++i +"] = \""+ s +"\" is ";
            if (isValidIntRangeInput(s)) {
                // Valid input line
                System.out.println(msg +"valid input. Parsing...");
                printIntRanges(s);
            } else {
                // Match attempt failed
                System.out.println(msg +"NOT valid input.");
            } 
        }
    }
}

Output:

String[1] = "1" is valid input. Parsing...
  value[1] ibase=1
String[2] = "1,2,3" is valid input. Parsing...
  value[1] ibase=1
  value[2] ibase=2
  value[3] ibase=3
String[3] = "1-9" is valid input. Parsing...
  value[1] ibase=1 range=9
String[4] = "1-9,10-19,20-199" is valid input. Parsing...
  value[1] ibase=1 range=9
  value[2] ibase=10 range=19
  value[3] ibase=20 range=199
String[5] = "1-8,9,10-18,19,20-199" is valid input. Parsing...
  value[1] ibase=1 range=8
  value[2] ibase=9
  value[3] ibase=10 range=18
  value[4] ibase=19
  value[5] ibase=20 range=199
String[6] = "A" is NOT valid input.
String[7] = "1,2," is NOT valid input.
String[8] = "1 - 9" is NOT valid input.
String[9] = " " is NOT valid input.
String[10] = "" is NOT valid input.

Note that this solution only demonstrates how to validate an input line and how to extract the value components from it. It does not check that, for range values, the second integer is larger than the first. That check, however, could easily be added.
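
One way the missing ordering check might be added is sketched below. It reuses the extraction pattern from printIntRanges, written without the free-spacing comments; the class `RangeOrderCheckSketch`, the method `extractRanges`, and the `{start, end}` pair representation are hypothetical names chosen for this example. It assumes the input has already passed a validation step like isValidIntRangeInput, and it assumes that a range with equal endpoints (such as 5-5) is acceptable and only a descending range should be rejected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper adding an ordering check to the extraction loop.
class RangeOrderCheckSketch {
    // Group 1: first integer. Group 2: optional second integer of a range.
    private static final Pattern NEXT_VAL =
            Pattern.compile("([0-9]+)(?:-([0-9]+))?(?:,|$)");

    // Returns {start, end} pairs; a single value yields start == end.
    static List<int[]> extractRanges(String text) {
        List<int[]> ranges = new ArrayList<>();
        Matcher m = NEXT_VAL.matcher(text);
        while (m.find()) {
            int start = Integer.parseInt(m.group(1));
            int end = (m.group(2) != null) ? Integer.parseInt(m.group(2)) : start;
            // The added logic check: reject ranges that run backwards.
            if (end < start) {
                throw new IllegalArgumentException(
                        "Range end " + end + " is smaller than start " + start);
            }
            ranges.add(new int[] {start, end});
        }
        return ranges;
    }

    public static void main(String[] args) {
        // prints 1..8, 9..9 and 10..18 on separate lines
        for (int[] r : extractRanges("1-8,9,10-18")) {
            System.out.println(r[0] + ".." + r[1]);
        }
    }
}
```
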

Edit:2012-10-24 07:00 Fixed index i to count from zero.

ridgerunner answered Sep 25 '22 15:09