I've noticed that a Java String reuses its internal char array to avoid allocating a new one when a method such as substring() creates a new String instance. String has several non-public constructors for this purpose, which accept a char array and two ints as a range.
Until today, though, I had not realized that split() also reuses the char array of the original String. I am reading a very long line from a file, splitting it on "," and keeping only one small column for real use. Because every piece of the line secretly holds a reference to the long char array, I ran into an OutOfMemoryError very quickly.
Here is the example code:
ArrayList<String> test = new ArrayList<String>(3000000);
BufferedReader origReader = new BufferedReader(new FileReader(new File(
        "G:\\filewithlongline.txt")));
String line = origReader.readLine(); // the first line is read and discarded
int i = 0;
while ((line = origReader.readLine()) != null) {
    String name = line.split(",")[0]; // split() takes a String, not a char
    test.add(name);
    i++;
    if (i % 100000 == 0) {
        System.out.println(name);
    }
}
System.out.println(test.size());
Is there any standard method in the JDK to make sure that every String produced by split() is a real deep copy rather than a shallow copy?
For now I am using a very ugly workaround to force the creation of a new String instance:
ArrayList<String> test = new ArrayList<String>(3000000);
BufferedReader origReader = new BufferedReader(new FileReader(new File(
        "G:\\filewithlongline.txt")));
String line = origReader.readLine(); // the first line is read and discarded
int i = 0;
while ((line = origReader.readLine()) != null) {
    String name = line.split(",")[0] + " ".trim(); // force creating a String instance
    test.add(name);
    i++;
    if (i % 100000 == 0) {
        System.out.println(name);
    }
}
System.out.println(test.size());
The simplest approach is to create a new String directly. This is one of the rare cases where it's a good idea.
String name = new String(line.split(",")[0]); // split() takes a String, so use "," rather than ','
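For context, here is a minimal sketch of how the whole loop might look with the copy constructor. The class name `FirstColumnCopy` and the helper `firstColumns` are made up for illustration, and unlike the question's code this version does not skip the first line. As far as I know, from JDK 7u6 onward substring() and split() already copy their characters, so the extra `new String(...)` should only matter on older runtimes and is harmless on newer ones.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class FirstColumnCopy {

    // Hypothetical helper: collects the first comma-separated field of every line.
    static List<String> firstColumns(BufferedReader reader) throws IOException {
        List<String> names = new ArrayList<String>();
        String line;
        while ((line = reader.readLine()) != null) {
            // A limit of 2 stops splitting after the first comma and keeps
            // trailing empty strings, so index 0 always exists.
            String first = line.split(",", 2)[0];
            // The copy constructor detaches the field from the long line.
            names.add(new String(first));
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader reader =
                new BufferedReader(new StringReader("alice,1,2\nbob,3,4\n"));
        System.out.println(firstColumns(reader)); // prints [alice, bob]
    }
}
```

Passing a limit of 2 also avoids splitting the rest of a very long line into pieces that are thrown away immediately.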
An alternative is to parse the file yourself.
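// Assumes every line contains at least one comma.
do {
    StringBuilder name = new StringBuilder();
    int ch;
    while ((ch = origReader.read()) >= 0 && ch != ',' && ch >= ' ') {
        name.append((char) ch);
    }
    if (ch < 0 && name.length() == 0) {
        break; // end of file: do not add a spurious empty entry
    }
    test.add(name.toString());
} while (origReader.readLine() != null); // skip the rest of the line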
String has a copy constructor you can use for this purpose.
final String name = new String(line.substring(0, line.indexOf(',')));
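One thing to watch with the indexOf() variant: if a line has no comma, indexOf() returns -1 and substring(0, -1) throws a StringIndexOutOfBoundsException. A small guarded helper could look like the sketch below. The names `FirstField` and `firstField` are made up for illustration, and treating a comma-free line as a single field is an assumption about what you want.

```java
public class FirstField {

    // Hypothetical helper: returns a detached copy of the text before the
    // first comma, or of the whole line when it contains no comma.
    static String firstField(String line) {
        int comma = line.indexOf(',');
        String head = (comma < 0) ? line : line.substring(0, comma);
        return new String(head); // the copy constructor drops any shared backing array
    }

    public static void main(String[] args) {
        System.out.println(firstField("alice,1,2")); // prints alice
        System.out.println(firstField("bob"));       // prints bob
    }
}
```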
... or, as Peter suggested, read only up to the first comma.
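// Assumes every line contains at least one comma.
final StringBuilder buf = new StringBuilder();
do {
    int ch;
    while ((ch = origReader.read()) >= 0 && ch != ',') {
        buf.append((char) ch);
    }
    if (ch < 0 && buf.length() == 0) {
        break; // end of file: do not add a spurious empty entry
    }
    test.add(buf.toString());
    buf.setLength(0); // reuse the buffer for the next line
} while (origReader.readLine() != null); // skip the rest of the line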
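If some lines may have no comma at all, mixing read() with readLine() gets awkward, because the character loop can run past the end of the line. A single pass over the characters avoids that. The following is only a sketch: the names `FirstFieldReader` and `firstFields` are made up, and it assumes lines end in "\n" or "\r\n" (a bare "\r" is not treated as a line break).

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class FirstFieldReader {

    // Hypothetical helper: returns the first comma-separated field of each line
    // without ever building a String for the full line.
    static List<String> firstFields(Reader in) throws IOException {
        List<String> fields = new ArrayList<String>();
        StringBuilder buf = new StringBuilder();
        boolean inFirstField = true;    // false once the first comma of a line is seen
        boolean lineHasContent = false; // true if the current line has any characters
        int ch;
        while ((ch = in.read()) >= 0) {
            if (ch == '\n') {
                // End of line: store the field and reset the per-line state.
                fields.add(buf.toString());
                buf.setLength(0);
                inFirstField = true;
                lineHasContent = false;
            } else if (ch != '\r') { // carriage returns are dropped
                lineHasContent = true;
                if (ch == ',') {
                    inFirstField = false;
                } else if (inFirstField) {
                    buf.append((char) ch);
                }
            }
        }
        if (lineHasContent) {
            fields.add(buf.toString()); // last line without a trailing newline
        }
        return fields;
    }

    public static void main(String[] args) throws IOException {
        Reader in = new StringReader("alice,1,2\r\nbob\ncarol,9");
        System.out.println(firstFields(in)); // prints [alice, bob, carol]
    }
}
```

For a real file you would pass a BufferedReader so that the per-character read() calls stay cheap.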