
Do Java regular expressions offer any performance benefit?

In Java, suppose we do pattern matching with a regular expression, e.g. take an input string and use a regular expression to find out whether it is numeric, and throw an exception if it is not. In this case, I understand that using a regex makes the code less verbose than taking each character of the string, checking whether it is a digit, and throwing an exception if it is not.

But I was under the assumption that a regex also makes the process more efficient. Is this true? I cannot find any evidence on this point. How does a regex do the match behind the scenes? Is it not also iterating over the string and checking each character one by one?

Victor asked Aug 09 '12 01:08




2 Answers

Just for fun, I ran the micro-benchmark below. The results of the last run (i.e. after JVM warm-up / JIT compilation), in milliseconds, are shown here; they were fairly consistent from one run to the next:

regex with numbers 123
chars with numbers 33
parseInt with numbers 33
regex with words 123
chars with words 34
parseInt with words 733

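In other words, the character loop is very efficient. Integer.parseInt is as fast as the character loop if the string is a number, but far slower if it is not. The regex costs the same in both cases, roughly four times the character loop, which puts it between parseInt's best and worst cases.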

Conclusion

If you need to parse a string into a number and expect most inputs to be valid numbers, Integer.parseInt is the best solution (efficient and readable). The penalty for non-numeric strings should be small overall as long as they are infrequent.
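If non-numeric input turns out to be common, one option is to keep Integer.parseInt but run a cheap character pre-check first, so that most bad inputs are rejected without an exception being created. The following is only a minimal sketch and was not part of the measurements above; the names SafeParse, looksLikeDigits and parseOrDefault are hypothetical, and it deliberately handles unsigned decimal digits only.

```java
final class SafeParse {

    // Returns true only if s is non-empty and consists solely of ASCII digits.
    static boolean looksLikeDigits(String s) {
        if (s == null || s.isEmpty()) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') {
                return false;
            }
        }
        return true;
    }

    // Parses s as an int, or returns fallback when s is not a valid int.
    static int parseOrDefault(String s, int fallback) {
        if (!looksLikeDigits(s)) {
            return fallback; // cheap rejection, no exception is created
        }
        try {
            return Integer.parseInt(s);
        } catch (NumberFormatException e) {
            // Still possible: all digits but outside the int range.
            return fallback;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseOrDefault("124", -1));         // prints 124
        System.out.println(parseOrDefault("124x", -1));        // prints -1
        System.out.println(parseOrDefault("99999999999", -1)); // prints -1
    }
}
```

The try/catch stays in place because a digits-only string can still overflow an int; the pre-check only removes the common case of stray non-digit characters.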

PS: my regex may not be optimal; feel free to comment.

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class TestNumber {

    private final static List<String> numbers = new ArrayList<>();
    private final static List<String> words = new ArrayList<>();

    public static void main(String[] args) {
        long start;

        // Build one million numeric strings and one million non-numeric strings.
        for (int i = 0; i < 1000000; i++) {
            numbers.add(String.valueOf(i));
            words.add(String.valueOf(i) + "x");
        }

        // Repeat the measurements so that the later runs reflect JIT-compiled code.
        // Elapsed times are printed in milliseconds.
        for (int i = 0; i < 5; i++) {
            start = System.nanoTime();
            regex(numbers);
            System.out.println("regex with numbers " + (System.nanoTime() - start) / 1000000);
            start = System.nanoTime();
            chars(numbers);
            System.out.println("chars with numbers " + (System.nanoTime() - start) / 1000000);
            start = System.nanoTime();
            exception(numbers);
            System.out.println("parseInt with numbers " + (System.nanoTime() - start) / 1000000);

            start = System.nanoTime();
            regex(words);
            System.out.println("regex with words " + (System.nanoTime() - start) / 1000000);
            start = System.nanoTime();
            chars(words);
            System.out.println("chars with words " + (System.nanoTime() - start) / 1000000);
            start = System.nanoTime();
            exception(words);
            System.out.println("parseInt with words " + (System.nanoTime() - start) / 1000000);
        }
    }

    private static int regex(List<String> list) {
        int sum = 0;
        // Compile the pattern once, outside the loop, and reuse it for every string.
        Pattern p = Pattern.compile("[0-9]+");
        for (String s : list) {
            sum += (p.matcher(s).matches() ? 1 : 0);
        }
        return sum;
    }

    private static int chars(List<String> list) {
        int sum = 0;

        for (String s : list) {
            boolean isNumber = true;
            for (char c : s.toCharArray()) {
                if (c < '0' || c > '9') {
                    isNumber = false;
                    break;
                }
            }
            if (isNumber) {
                sum++;
            }
        }
        return sum;
    }

    private static int exception(List<String> list) {
        int sum = 0;

        for (String s : list) {
            try {
                Integer.parseInt(s);
                sum++;
            } catch (NumberFormatException e) {
                // Not a number: intentionally ignored, only valid integers are counted.
            }
        }
        return sum;
    }
}
assylias answered Oct 06 '22 07:10


I don't have a technical answer yet, but I could write some code and see. I don't think regular expressions are the way to go for converting a string to a number. In many instances they can be more efficient, but if the expression is written poorly it will be slow.
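One common way a regex check ends up slower than it needs to be in Java is recompiling the pattern on every call: String.matches(regex) compiles the expression each time it is invoked, whereas a Pattern held in a static final field is compiled once and reused. Here is a small sketch of the two styles; the class and method names are made up for illustration, and no timings are claimed.

```java
import java.util.regex.Pattern;

final class DigitRegex {

    // Compiled once when the class is initialized and reused for every call.
    private static final Pattern DIGITS = Pattern.compile("[0-9]+");

    // Recompiles "[0-9]+" on each invocation via String.matches.
    static boolean isNumericRecompiling(String s) {
        return s.matches("[0-9]+");
    }

    // Reuses the precompiled pattern; matches() requires the whole string to match.
    static boolean isNumericPrecompiled(String s) {
        return DIGITS.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isNumericRecompiling("124"));  // prints true
        System.out.println(isNumericPrecompiled("124"));  // prints true
        System.out.println(isNumericPrecompiled("124x")); // prints false
    }
}
```

Both methods return the same answers; they differ only in how often the pattern is compiled.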

May I ask, however, why you aren't just using Integer.parseInt("124")? It throws a NumberFormatException if the string is not a valid integer. You should be able to handle that, and it leaves the detection of a number up to core Java.
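As a rough sketch of what that could look like for the case in the question (reject non-numeric input with an exception), the helper below wraps Integer.parseInt and rethrows with a message naming the offending input. The names ParseIntDemo and requireInt, and the choice of IllegalArgumentException, are assumptions made for illustration.

```java
final class ParseIntDemo {

    // Returns the parsed value, or rethrows with a message naming the bad input.
    static int requireInt(String input) {
        try {
            return Integer.parseInt(input);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Not a valid integer: " + input, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(requireInt("124")); // prints 124
        try {
            requireInt("12a4");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // prints Not a valid integer: 12a4
        }
    }
}
```

Note that Integer.parseInt also accepts a leading sign, so "-5" and "+5" parse successfully, whereas a digits-only check would reject them.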

Michael answered Oct 06 '22 08:10