Help on a better way to parse digits from a String in Java

I have a string which contains digits and letters. I wish to split the string into contiguous chunks of digits and contiguous chunks of letters.

Consider the String "34A312O5M444123A".

I would like to output: ["34", "A", "312", "O", "5", "M", "444123", "A"]

I have code which works and looks like:

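import java.util.ArrayList;
import java.util.List;

// Splits str into alternating runs of digits and runs of non-digits.
List<String> digitsAsElements(String str){
  List<String> output = new ArrayList<String>();
  StringBuilder chunk = new StringBuilder();

  for (int i = 0; i < str.length(); i++){
    char cChar = str.charAt(i);

    // A switch between digit and non-digit closes the current chunk.
    if (chunk.length() > 0
        && Character.isDigit(cChar) != Character.isDigit(chunk.charAt(0))){
      output.add(chunk.toString());
      chunk = new StringBuilder();
    }

    chunk.append(cChar);
  }

  // Flush the last chunk, which the loop never closes on its own.
  if (chunk.length() > 0)
    output.add(chunk.toString());

  return output;
}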

I considered splitting str twice, once to get an array containing all the number chunks and once to get an array containing all the letter chunks, and then merging the results. I shied away from this as it would harm readability.

I have intentionally avoided solving this with a regex pattern as I find regex patterns to be a major impediment to readability.

  • Debuggers don't handle them well.
  • They interrupt the flow of someone reading source code.
  • Over time, regexes grow organically and become monsters.
  • They are deeply non-intuitive.

My questions are:

  • How could I improve the readability of the above code?
  • Is there a better way to do this? Is there a Util class that solves this problem elegantly?
  • Where do you draw the line between using a regEx and coding something similar to what I've written above?
  • How do you increase the readability/maintainability of regExes?

Ethan Heilman asked Jun 04 '09 19:06


2 Answers

For this particular task I'd always use a regex instead of hand-writing something similar. The code you have given above is, at least to me, less readable than a simple regular expression (which would be (\d+|[^\d]+) in this case, as far as I can see).
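To make the comparison concrete, here is a minimal sketch of how that expression could be applied with java.util.regex. The class name ChunkSplitter is made up for illustration; the method name is borrowed from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, used only for this illustration.
class ChunkSplitter {
    // Either a run of digits or a run of non-digits.
    private static final Pattern CHUNK = Pattern.compile("\\d+|[^\\d]+");

    static List<String> digitsAsElements(String str) {
        List<String> output = new ArrayList<>();
        Matcher matcher = CHUNK.matcher(str);
        // Each successful find() yields the next contiguous chunk.
        while (matcher.find()) {
            output.add(matcher.group());
        }
        return output;
    }

    public static void main(String[] args) {
        // prints [34, A, 312, O, 5, M, 444123, A]
        System.out.println(digitsAsElements("34A312O5M444123A"));
    }
}
```

Because the two alternatives together cover every character, each character of the input ends up in exactly one chunk.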

You may want to avoid writing regular expressions that exceed a few lines. Those can be and usually are unreadable and hard to understand, but so is the code they can be replaced with! Parsers are almost never pretty and you're usually better off reading the original grammar than trying to make sense of the generated (or handwritten) parser. Same goes (imho) for regexes which are just a concise description of a regular grammar.

So, in general I'd say banning regexes in favor of code like you've given in your question sounds like a terribly stupid idea. Regular expressions are just a tool, nothing less, nothing more. If something else does a better job of text parsing (say, a real parser, some substring magic, etc.) then use it. But don't throw away possibilities just because you feel uncomfortable with them – others may have fewer problems coping with them and all people are able to learn.

EDIT: Updated regex after comment by mmyers.

Joey answered Oct 21 '22 09:10


For a utility class, check out java.util.Scanner. There are a number of options in there as to how you might go about solving your problem. I have a few comments on your questions.

Debuggers don't handle them (regular expressions) well

Whether a regex works or not depends on what's in your data. There are some nice plugins you can use to help you build a regex, like QuickREx for Eclipse. Does a debugger actually help you write the right parser for your data?

They interrupt the flow of someone reading source code.

I guess it depends on how comfortable you are with them. Personally, I'd rather read a reasonable regex than 50 more lines of string parsing code, but maybe that's a personal thing.

Over time, regexes grow organically and become monsters.

I guess they might, but that's probably a problem with the code they live in becoming unfocussed. If the complexity of the source data is increasing, you probably need to keep an eye on whether you need a more expressive solution (maybe a parser generator like ANTLR).

They are deeply non-intuitive.

They're a pattern matching language. I would say they're pretty intuitive in that context.

How could I improve the readability of the above code?

Not sure, apart from using a regex.

Is there a better way to do this? A Util class that solves this problem elegantly.

Mentioned above, java.util.Scanner.
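As a rough sketch of one of those options, Scanner.findInLine can pull out one chunk at a time. This assumes the input is a single line, since findInLine stops at a line separator. The class name ScannerChunks is invented for this example, and Scanner still takes a pattern here, so it does not avoid regexes altogether.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

// Hypothetical class name, used only for this illustration.
class ScannerChunks {
    // A run of digits or a run of non-digits.
    private static final Pattern CHUNK = Pattern.compile("\\d+|\\D+");

    // Assumes 'line' contains no line separators.
    static List<String> digitsAsElements(String line) {
        List<String> output = new ArrayList<>();
        try (Scanner scanner = new Scanner(line)) {
            String chunk;
            // findInLine returns null once nothing is left on the line.
            while ((chunk = scanner.findInLine(CHUNK)) != null) {
                output.add(chunk);
            }
        }
        return output;
    }

    public static void main(String[] args) {
        System.out.println(digitsAsElements("34A312O5M444123A"));
    }
}
```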

Where do you draw the line between using a regEx and coding something similar to what I've written above?

Personally I use regex for anything reasonably simple.

How do you increase the readability/maintainability of regExes?

Think carefully before extending, and take extra care to comment up the code and the regex in detail so that it's clear what you're doing.
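One way to comment the regex itself, sketched here as a suggestion rather than a prescription, is the Pattern.COMMENTS flag, which lets whitespace and # comments live inside the pattern. The class and constant names below are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, used only for this illustration.
class DocumentedChunkPattern {
    // In COMMENTS mode, whitespace in the pattern is ignored and
    // '#' starts a comment that runs to the end of the line.
    static final Pattern DIGITS_OR_NON_DIGITS = Pattern.compile(
          "  \\d+     # a run of one or more digits\n"
        + "| [^\\d]+  # or a run of one or more non-digits\n",
        Pattern.COMMENTS);

    static List<String> chunks(String str) {
        List<String> output = new ArrayList<>();
        Matcher matcher = DIGITS_OR_NON_DIGITS.matcher(str);
        while (matcher.find()) {
            output.add(matcher.group());
        }
        return output;
    }
}
```

Giving the pattern a descriptive constant name and keeping one alternative per line means a later extension shows up as a small, reviewable change.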

brabster answered Oct 21 '22 09:10