I'm using a simple Scanner to parse a string, but I've discovered that if it is called often enough I get OutOfMemoryError. This code runs in the constructor of an object that is built repeatedly for an array of strings:
Edit: Here's the full constructor for more info; not much else happens with the Scanner outside of the try/finally block.
public Header(String headerText) {
    char[] charArr;
    charArr = headerText.toCharArray();
    // Check that all characters are printable characters
    if (charArr.length > 0 && !commonMethods.isPrint(charArr)) {
        throw new IllegalArgumentException(headerText);
    }
    // Check for header suffix
    Scanner sc = new Scanner(headerText);
    MatchResult res;
    try {
        sc.findInLine("(\\D*[a-zA-Z]+)(\\d*)(\\D*)");
        res = sc.match();
    } finally {
        sc.close();
    }
    if (res.group(1) == null || res.group(1).isEmpty()) {
        throw new IllegalArgumentException("Missing header keyword found"); // Empty header to store
    } else {
        mnemonic = res.group(1).toLowerCase(); // Store header
    }
    if (res.group(2) == null || res.group(2).isEmpty()) {
        suffix = -1;
    } else {
        try {
            suffix = Integer.parseInt(res.group(2)); // Store suffix if it exists
        } catch (NumberFormatException e) {
            throw new NumberFormatException(headerText);
        }
    }
    if (res.group(3) == null || res.group(3).isEmpty()) {
        isQuery = false;
    } else {
        if (res.group(3).equals("?")) {
            isQuery = true;
        } else {
            throw new IllegalArgumentException(headerText);
        }
    }
    // If command was of the form *ABC, reject suffixes and prefixes
    if (mnemonic.contains("*") && suffix != -1) {
        throw new IllegalArgumentException(headerText);
    }
}
A profiler memory snapshot shows the read(char) method invoked by Scanner.findInLine() allocating massive amounts of memory as I scan through a few hundred thousand strings; after a few seconds it has already allocated over 38 MB.
I would have thought that calling close() on the Scanner after using it in the constructor would leave the old object eligible for garbage collection, but somehow it remains reachable and the read method accumulates gigabytes of data until the heap fills up.
Can anybody point me in the right direction?
You haven't posted all your code, but given that you are scanning for the same regex repeatedly, it would be much more efficient to compile a static Pattern beforehand and use it for the Scanner's findInLine:
static Pattern p = Pattern.compile("(\\D*[a-zA-Z]+)(\\d*)(\\D*)");
and in the constructor:
sc.findInLine(p);
This may or may not be the source of the OOM issue, but it will definitely make your parsing a bit faster.
Related: java.util.regex - importance of Pattern.compile()?
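For illustration, here's a minimal sketch of how those two pieces fit together in your constructor. HEADER_PATTERN is just an illustrative field name, and I've omitted everything except the mnemonic handling; the rest of your group logic stays the same:

import java.util.Scanner;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;

public class Header {
    // Compiled once, shared by every Header instance
    private static final Pattern HEADER_PATTERN =
            Pattern.compile("(\\D*[a-zA-Z]+)(\\d*)(\\D*)");

    private final String mnemonic;

    public Header(String headerText) {
        Scanner sc = new Scanner(headerText);
        MatchResult res;
        try {
            sc.findInLine(HEADER_PATTERN); // reuse the shared Pattern instead of recompiling
            res = sc.match();
        } finally {
            sc.close();
        }
        mnemonic = res.group(1).toLowerCase();
        // ... remaining group handling as in your original constructor ...
    }
}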
Update: after you posted more of your code, I see some other issues. If you're calling this constructor repeatedly, it means you are probably tokenizing or breaking up the input beforehand. Why create a new Scanner to parse each line? They are expensive; you should be using the same Scanner to parse the entire file, if possible. Using one Scanner with a precompiled Pattern will be much faster than what you are doing now, which is creating a new Scanner and a new Pattern for each line you are parsing.
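As a rough sketch of that idea, you could skip the per-string Scanner entirely and reuse one Matcher built from the precompiled Pattern. The class and method names below are illustrative, not from your code, and this assumes your input has already been split into individual header strings:

import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HeaderParser {
    private static final Pattern HEADER_PATTERN =
            Pattern.compile("(\\D*[a-zA-Z]+)(\\d*)(\\D*)");

    static void parseAll(List<String> headerLines) {
        Matcher m = HEADER_PATTERN.matcher(""); // created once, reused for every string
        for (String line : headerLines) {
            m.reset(line);                      // re-point the matcher at the next string
            if (m.lookingAt()) {
                String keyword = m.group(1);    // same groups as in your constructor
                String suffix  = m.group(2);
                String query   = m.group(3);
                // ... build the Header (or equivalent) from the groups ...
            }
        }
    }
}

This way the only per-string allocations are the group substrings themselves, rather than a fresh Scanner (and its internal buffer) for every header.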