Efficient parsing of integers from substrings in Java

Question

AFAIK there is no efficient way in the standard Java libraries to parse an integer from a substring without actually allocating a new string containing the substring.

I'm in a situation where I'm parsing millions of integers from strings, and I don't particularly want to create new strings for every substring. The copying is overhead I don't need.

Given a string s, I'd like a method like:

parseInteger(s, startOffset, endOffset)

with semantics like:

Integer.parseInt(s.substring(startOffset, endOffset))

Now, I know I can write this reasonably trivially like this:

public static int parse(String s, int start, int end) {
    long result = 0;
    boolean foundMinus = false;

    while (start < end) {
        char ch = s.charAt(start);
        if (ch == ' ')
            /* ok */;
        else if (ch == '-') {
            if (foundMinus)
                throw new NumberFormatException();
            foundMinus = true;
        } else if (ch < '0' || ch > '9')
            throw new NumberFormatException();
        else
            break;
        ++start;
    }

    if (start == end)
        throw new NumberFormatException();

    while (start < end) {
        char ch = s.charAt(start);
        if (ch < '0' || ch > '9')
            break;
        result = result * 10 + (int) ch - (int) '0';
        ++start;
    }

    while (start < end) {
        char ch = s.charAt(start);
        if (ch != ' ')
            throw new NumberFormatException();
        ++start;
    }
    if (foundMinus)
        result *= -1;
    if (result < Integer.MIN_VALUE || result > Integer.MAX_VALUE)
        throw new NumberFormatException();
    return (int) result;
}

But that's not the point. I'd rather get this from a tested, supported third-party library. For example, parsing longs and dealing properly with Long.MIN_VALUE is slightly subtle, and I cheat above by parsing ints into longs. And the above still has an overflow issue if the parsed integer is bigger than Long.MAX_VALUE.

Is there any such library?

My searching has turned up little.
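The Long.MIN_VALUE subtlety mentioned above can be sketched as follows. This is a minimal sketch, not code from any library: the class name RangeLongParser and its method are hypothetical. It accumulates the value as a negative number, because the negative range of long is one larger than the positive range, and it checks for overflow before every multiplication and subtraction. It assumes ASCII digits and an optional leading sign only; unlike the method above, it does not skip spaces.

```java
// Hypothetical helper for illustration; not part of the JDK or any library.
final class RangeLongParser {

    /** Parses a decimal long from s[start, end) without creating a substring. */
    static long parseLong(CharSequence s, int start, int end) {
        if (start < 0 || start > end || end > s.length()) {
            throw new IndexOutOfBoundsException("bad range " + start + ".." + end);
        }
        if (start == end) {
            throw new NumberFormatException("empty range");
        }

        int i = start;
        boolean negative = false;
        char first = s.charAt(i);
        if (first == '-' || first == '+') {
            negative = (first == '-');
            i++;
            if (i == end) {
                throw new NumberFormatException("sign without digits");
            }
        }

        // Accumulate negatively so that Long.MIN_VALUE is representable.
        long limit = negative ? Long.MIN_VALUE : -Long.MAX_VALUE;
        long multMin = limit / 10; // smallest value that can still be multiplied by 10
        long result = 0;

        while (i < end) {
            int digit = s.charAt(i++) - '0';
            if (digit < 0 || digit > 9) {
                throw new NumberFormatException("not a digit at index " + (i - 1));
            }
            if (result < multMin) {
                throw new NumberFormatException("overflow");
            }
            result *= 10;
            if (result < limit + digit) {
                throw new NumberFormatException("overflow");
            }
            result -= digit;
        }
        return negative ? result : -result;
    }
}
```

For example, RangeLongParser.parseLong("x=-42;", 2, 5) returns -42, and the full text of Long.MIN_VALUE parses without overflow, while a value one past Long.MAX_VALUE is rejected with a NumberFormatException.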

Dariusz · Accepted Answer

Have you profiled your app? Have you located the source of your problem?

Since Strings are immutable, there is a good chance that very little memory is required and very few operations are performed to create a substring.

Unless you are really experiencing problems with memory, garbage collection, etc. just use the substring method. Don't seek complex solutions to problems you do not have.

Besides, if you implement something on your own, you may lose more than you gain in terms of efficiency. Your code does a lot and is quite complex, whereas you can be fairly certain that the default implementation is relatively fast and error-free.

Thomas · Answer

Don't worry too much about the objects unless you experience actual performance problems. Use a current JVM; performance and memory overhead are improved continually.

You can have a look at the "ByteString" from Google protocol buffers if you want to have a substring sharing the underlying string:

https://developers.google.com/protocol-buffers/docs/reference/java/com/google/protobuf/ByteString#substring%28int,%20int%29
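On the point about using a current JVM: from Java 9 onward, the standard library has an overload Integer.parseInt(CharSequence s, int beginIndex, int endIndex, int radix), with a matching Long.parseLong overload, that parses a range directly. The sketch below assumes Java 9 or later; the class name RangeParseDemo, the method parseField, and the sample line are made up for illustration. Note that, like Integer.parseInt(String), it rejects surrounding spaces rather than skipping them.

```java
// Illustrative wrapper; requires Java 9 or later.
final class RangeParseDemo {

    /** Parses the decimal int in s[start, end) without creating a substring. */
    static int parseField(String s, int start, int end) {
        return Integer.parseInt(s, start, end, 10);
    }

    public static void main(String[] args) {
        String line = "id=12345;qty=-7";
        System.out.println(parseField(line, 3, 8));   // prints 12345
        System.out.println(parseField(line, 13, 15)); // prints -7
    }
}
```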

Fernando Miguélez · Answer

I could not resist measuring the improvement of your method:

package test;

public class TestIntParse {

    static final int MAX_NUMBERS = 10000000;
    static final int MAX_ITERATIONS = 100;

    public static void main(String[] args) {
        long timeAvoidNewStrings = 0;
        long timeCreateNewStrings = 0;

        for (int i = 0; i < MAX_ITERATIONS; i++) {
            timeAvoidNewStrings += test(true);
            timeCreateNewStrings += test(false);
        }

        System.out.println("Average time method 'AVOID new strings': " + (timeAvoidNewStrings / MAX_ITERATIONS) + " ms");
        System.out.println("Average time method 'CREATE new strings': " + (timeCreateNewStrings / MAX_ITERATIONS) + " ms");
    }

    static long test(boolean avoidStringCreation) {
        long start = System.currentTimeMillis();

        for (int i = 0; i < MAX_NUMBERS; i++) {
            // Parenthesize the product: "(int) Math.random() * 100000" casts first and is always 0.
            String value = Integer.toString((int) (Math.random() * 100000));
            int intValue = avoidStringCreation ? parse(value, 0, value.length()) : parse2(value, 0, value.length());
            String value2 = Integer.toString(intValue);
            if (!value2.equals(value)) {
                System.err.println("Error at iteration " + i + (avoidStringCreation ? " without" : " with") + " string creation: " + value + " != " + value2);
            }
        }

        return System.currentTimeMillis() - start;
    }

    public static int parse2(String s, int start, int end) {
        return Integer.valueOf(s.substring(start, end));
    }

    public static int parse(String s, int start, int end) {
        long result = 0;
        boolean foundMinus = false;

        while (start < end) {
            char ch = s.charAt(start);
            if (ch == ' ')
                /* ok */;
            else if (ch == '-') {
                if (foundMinus)
                    throw new NumberFormatException();
                foundMinus = true;
            } else if (ch < '0' || ch > '9')
                throw new NumberFormatException();
            else
                break;
            ++start;
        }

        if (start == end)
            throw new NumberFormatException();

        while (start < end) {
            char ch = s.charAt(start);
            if (ch < '0' || ch > '9')
                break;
            result = result * 10 + ch - '0';
            ++start;
        }

        while (start < end) {
            char ch = s.charAt(start);
            if (ch != ' ')
                throw new NumberFormatException();
            ++start;
        }
        if (foundMinus)
            result *= -1;
        if (result < Integer.MIN_VALUE || result > Integer.MAX_VALUE)
            throw new NumberFormatException();
        return (int) result;
    }

}

The results:

Average time method 'AVOID new strings': 432 ms
Average time method 'CREATE new strings': 500 ms

Your method is roughly 14% faster, and presumably lighter on memory, though considerably more complex (and error-prone). From my point of view your approach does not pay off, though it might in your case.
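These numbers should be read with some caution. The listing above times Integer.toString and the round-trip check together with the parsing, and it calls substring(0, value.length()), which returns the same String object rather than a copy. A slightly more careful comparison might look like the sketch below. All names here are hypothetical, the digits-only parseRange is a simplified stand-in for the hand-written method, and a harness such as JMH would be a better choice for serious measurements.

```java
import java.util.Random;

// Hypothetical timing sketch; numbers from it are only indicative.
final class ParseBenchSketch {

    // Simplified stand-in for a hand-written parser: digits only,
    // no sign, whitespace, or overflow handling.
    static int parseRange(String s, int start, int end) {
        int result = 0;
        for (int i = start; i < end; i++) {
            result = result * 10 + (s.charAt(i) - '0');
        }
        return result;
    }

    static int parseViaSubstring(String s, int start, int end) {
        return Integer.parseInt(s.substring(start, end));
    }

    // Inputs look like "v=1234;" so the number is a true inner substring.
    static String[] makeInputs(int count, long seed) {
        Random random = new Random(seed);
        String[] inputs = new String[count];
        for (int i = 0; i < count; i++) {
            inputs[i] = "v=" + random.nextInt(100000) + ";";
        }
        return inputs;
    }

    // Returns a checksum so the parsed values are actually used.
    static long sum(String[] inputs, boolean avoidSubstring) {
        long sum = 0;
        for (String s : inputs) {
            int end = s.length() - 1; // exclude the trailing ';'
            sum += avoidSubstring ? parseRange(s, 2, end) : parseViaSubstring(s, 2, end);
        }
        return sum;
    }

    public static void main(String[] args) {
        String[] inputs = makeInputs(1_000_000, 42L); // generated outside the timed region

        long warmUp = 0;
        for (int i = 0; i < 5; i++) { // let the JIT compile both paths first
            warmUp += sum(inputs, true) + sum(inputs, false);
        }

        for (boolean avoid : new boolean[] {true, false}) {
            long t0 = System.nanoTime();
            long checksum = sum(inputs, avoid);
            long micros = (System.nanoTime() - t0) / 1_000;
            System.out.println((avoid ? "range parse: " : "substring:   ")
                    + micros + " us, checksum " + checksum);
        }
        System.out.println("warm-up checksum " + warmUp);
    }
}
```

Both variants should print the same checksum; only the timings are expected to differ, and they will vary by JVM and hardware.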

Tags: java, string, int, parsing

Barry Kelly
