Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Garbage friendly alternative to substring()

I have a pipe delimited file which I parse to get system options. The environment is sensitive to heap allocation and we are trying to avoid garbage collection.

Below is the code I am using to parse the pipe delimited string. This function is called about 35000 times. I was wondering if there was a better approach that doesn't create as much memory churn.

static int countFields(String s) {
    int n = 1;
    for (int i = 0; i < s.length(); i++)
        if (s.charAt(i) == '|')
            n++;

    return n;
}

static String[] splitFields(String s) {
    String[] l = new String[countFields(s)];

    for (int pos = 0, i = 0; i < l.length; i++) {
        int end = s.indexOf('|', pos);
        if (end == -1)
            end = s.length();
        l[i] = s.substring(pos, end);
        pos = end + 1;
    }

    return l;
}

EDIT 1, about java version:

For business reasons we are stuck at JDK 1.6.0_25.

EDIT 2 about String and String[] usage:

The String[] is used to perform the system setup logic. Basically, if String[0].equals("true") then enable debugging. That's the usage pattern

EDIT 3 about garbage collected objects:

The input String and the String[] are eventually GC'd. The input String is a single line from system setup file which is GC'd after the entire file is processed and the String[] is GC'd after the entire line has been processed.

EDIT - SOLUTION:

This is a combo of Peter Lawrey and zapl's solutions. Also, this class is NOT thread safe.

public class DelimitedString {

    private static final Field EMPTY = new Field("");

    private char delimiter = '|';
    private String line = null;
    private Field field = new Field();

    public DelimitedString() { }

    public DelimitedString(char delimiter) {
        this.delimiter = delimiter;
    }

    public void set(String line) {
        this.line = line;
    }

    public int length() {
        int numberOfFields = 0;
        if (line == null)
            return numberOfFields;

        int idx = line.indexOf(delimiter);
        while (idx >= 0) {
            numberOfFields++;
            idx = line.indexOf(delimiter, idx + 1);
        }
        return ++numberOfFields;
    }

    public Field get(int fieldIndex) {
        if (line == null)
            return EMPTY;

        int currentField = 0;
        int startIndex = 0;
        while (currentField < fieldIndex) {
            startIndex = line.indexOf(delimiter, startIndex);

            // not enough fields
            if (startIndex < 0)
                return EMPTY;

            startIndex++;
            currentField++;
        }

        int endIndex = line.indexOf(delimiter, startIndex);
        if (endIndex == -1)
            endIndex = line.length();

        fieldLength = endIndex - startIndex;
        if (fieldLength == 0)
            return EMPTY;

        // Populate field
        for (int i = 0; i < fieldLength; i++) {
            char c = line.charAt(startIndex + i);
            field.bytes[i] = (byte) c;
        }
        field.fieldLength = fieldLength;
        return field;
    }

    @Override
    public String toString() {
        return new String(line + " current field = " + field.toString());
    }

    public static class Field {

        // Max size of a field
        private static final int DEFAULT_SIZE = 1024;

        private byte[] bytes = null;
        private int fieldLength = Integer.MIN_VALUE;

        public Field() {
            bytes = new byte[DEFAULT_SIZE];
            fieldLength = Integer.MIN_VALUE;
        }

        public Field(byte[] bytes) {
            set(bytes);
        }

        public Field(String str) {
            set(str.getBytes());
        }

        public void set(byte[] str) {
            int len = str.length;
            bytes = new byte[len];
            for (int i = 0; i < len; i++) {
                byte b = str[i];
                bytes[i] = b;
            }
            fieldLength = len;
        }

        public char charAt(int i) {
            return (char) bytes[i];
        }

        public byte[] getBytes() {
            return bytes;
        }

        public int length() {
            return fieldLength;
        }

        public short getShort() {
            return (short) readLong();
        }

        public int getInt() {
            return (int) readLong();
        }

        public long getLong() {
            return readLong();
        }

        @Override
        public String toString() {
            return (new String(bytes, 0, fieldLength));
        }

        // Code taken from Java class Long method parseLong()
        public long readLong() {
            int radix = 10;
            long result = 0;
            boolean negative = false;
            int i = 0, len = fieldLength;
            long limit = -Long.MAX_VALUE;
            long multmin;
            int digit;

            if (len > 0) {
                char firstChar = (char) bytes[0];
                if (firstChar < '0') { // Possible leading "-"
                    if (firstChar == '-') {
                        negative = true;
                        limit = Long.MIN_VALUE;
                    } else
                        throw new NumberFormatException("Invalid leading character.");

                    if (len == 1) // Cannot have lone "-"
                        throw new NumberFormatException("Negative sign without trailing digits.");
                    i++;
                }
                multmin = limit / radix;
                while (i < len) {
                    // Accumulating negatively avoids surprises near MAX_VALUE
                    digit = Character.digit(bytes[i++], radix);
                    if (digit < 0)
                        throw new NumberFormatException("Single digit is less than zero.");
                    if (result < multmin)
                        throw new NumberFormatException("Result is less than limit.");

                    result *= radix;
                    if (result < limit + digit)
                        throw new NumberFormatException("Result is less than limit plus new digit.");

                    result -= digit;
                }
            } else {
                throw new NumberFormatException("Called readLong with a length <= 0. len=" + len);
            }
            return negative ? result : -result;
        }
    }
}
like image 387
Justin Avatar asked Dec 02 '13 19:12

Justin


4 Answers

I would do something like this.

public static void main(String[] args) throws IOException {
    BufferedReader br = new BufferedReader(new FileReader("inputfile"));
    StringBuilder sb = new StringBuilder();
    do {
        boolean flag = readBoolean(br, sb);
        long val = readLong(br, sb);
        process(flag, val);
    } while (nextLine(br));
    br.close();
}

private static void process(boolean flag, long val) {
    // do something.
}

public static boolean readBoolean(BufferedReader br, StringBuilder sb) throws IOException {
    readWord(br, sb);
    return sb.length() == 4
            && sb.charAt(0) == 't'
            && sb.charAt(1) == 'r'
            && sb.charAt(2) == 'u'
            && sb.charAt(3) == 'e';
}

public static long readLong(BufferedReader br, StringBuilder sb) throws IOException {
    readWord(br, sb);
    long val = 0;
    boolean neg = false;
    for (int i = 0; i < sb.length(); i++) {
        char ch = sb.charAt(i);
        if (ch == '-')
            neg = !neg;
        else if (ch >= '0' && ch <= '9')
            val = val * 10 + ch - '0';
        else
            throw new NumberFormatException();
    }
    return neg ? -val : val;
}

public static boolean nextLine(BufferedReader br) throws IOException {
    while (true) {
        int ch = br.read();
        if (ch < 0) return false;
        if (ch == '\n') return true;
    }
}

public static void readWord(BufferedReader br, StringBuilder sb) throws IOException {
    sb.setLength(0);
    while (true) {
        br.mark(1);
        int ch = br.read();
        switch (ch) {
            case -1:
                throw new EOFException();
            case '\n':
                br.reset();
            case '|':
                return;
            default:
                sb.append((char) ch);
        }
    }
}

This is waaaaay more complicated, but creates very little garbage. In fact the StringBuilder can be recycled. ;)

Note: this doesn't create a String or a String[]

like image 58
Peter Lawrey Avatar answered Oct 06 '22 01:10

Peter Lawrey


Basically, if String[0].equals("true") then enable debugging.

You can get rid of the creation of the array and substrings by comparing directly to the input string. That's not avoiding to create the input string like Peter Lawrey's solution does but could be less work to change (although I doubt that).

public static boolean fieldMatches(String line, int fieldIndex, String other) {
    int currentField = 0;
    int startIndex = 0;
    while (currentField < fieldIndex) {
        startIndex = line.indexOf('|', startIndex);

        // not enough fields
        if (startIndex < 0)
            return false;

        startIndex++;
        currentField++;
    }

    int start = startIndex;
    int end = line.indexOf('|', startIndex);
    if (end == -1) {
        end = line.length();
    }
    int fieldLength = end - start;

    // make sure both strings have the same length
    if (fieldLength != other.length())
        return false;

    // regionMatches does not allocate objects
    return line.regionMatches(start, other, 0, fieldLength);
}

public static void main(String[] args) {
    String line = "Config|true"; // from BufferedReader
    System.out.println(fieldMatches(line, 0, "Config"));
    System.out.println(fieldMatches(line, 1, "true"));
    System.out.println(fieldMatches(line, 1, "foobar"));
    System.out.println(fieldMatches(line, 2, "thereisnofield"));
}

Output

true
true
false
false
like image 31
zapl Avatar answered Oct 06 '22 01:10

zapl


Just an idea. Don't split anything. Do the opposite - append them (e.g. in some StringBuilder) and keep them in one big String or actually StringBuilder would be better. Strings in it can be separated by | and (what's currently) String arrays with say #.

Then just return indices from the method splitFields - the start index and the end index (of what's currently your String[]).

Just throwing out some idea here. Not sure of your exact use cases scenario, depends what you do with the return value you get back.

Of course, you will need to manage that big StringBuilder yourself and remove data from it when you don't need that data any more, otherwise it will eventually grow too large.

Even if this idea is not directly applicable, I hope you get my point - I think you need some pool or memory area or something like that which you would manage yourself.

like image 22
peter.petrov Avatar answered Oct 06 '22 01:10

peter.petrov


Since the biggest "offenders" here are the strings created as the result of parsing the input, followed by arrays of strings created once per call, and because it appears from one of your comments that you do not need all substrings at once, you could make an object that feeds you strings one at a time, reusing the same StringBuilder object as it goes.

Here is a skeletal class showing how this can be done:

class Splitter {
    private String s="";
    private int pos = 0;
    public void setString(String newS) {
        s = newS;
        pos = 0;
    }
    boolean tryGetNext(StringBuilder result) {
        result.delete(0, result.length());
        // Check if we have anything to return
        if (pos == s.length()) {
            return false;
        }
        // Go through the string starting at pos, adding characters to result
        // until you hit the pipe '|'
        // At that point stop and return
        while (...) {
            ...
        }
        return true;
    }
}

Now you can use this class as follows:

StringBuilder sb = new StringBuilder(MAX_LENGTH);
Splitter splitter = new Splitter();
for (String s: sourceOfStringsToBeSplit) {
    sb.setString(s);
    while (splitter.tryGetNext(sb)) {
        ... // Use the string from sb
    }
}

If the sizing of the internal buffer of sb is done correctly, this code will create three objects during the entire run - the Splitter, the StringBuilder, and the character array inside the StringBuilder. You need to be careful when you use StringBuilder to not create additional objects, though - specifically, you need to avoid calling toString() on it. For example, instead of comparing the StringBuilder for equality to a fixed string like this

if (sb.toString().equals(targetString)) {
    ...
}

you should write

if (targetString.length() == sb.length() && sb.indexOf(targetString) == 0) {
    ...
}
like image 30
Sergey Kalinichenko Avatar answered Oct 05 '22 23:10

Sergey Kalinichenko