I have a java server app that download CSV file and parse it. The parsing can take from 5 to 45 minutes, and happens each hour.This method is a bottleneck of the app so it's not premature optimization. The code so far:
client.executeMethod(method);
InputStream in = method.getResponseBodyAsStream(); // this is http stream
String line;
String[] record;
reader = new BufferedReader(new InputStreamReader(in), 65536);
try {
// read the header line
line = reader.readLine();
// some code
while ((line = reader.readLine()) != null) {
// more code
line = line.replaceAll("\"\"", "\"NULL\"");
// Now remove all of the quotes
line = line.replaceAll("\"", "");
if (!line.startsWith("ERROR"){
//bla bla
continue;
}
record = line.split(",");
//more error handling
// build the object and put it in HashMap
}
//exceptions handling, closing connection and reader
Is there any existing library that would help me to speed up things? Can I improve existing code?
Have you seen Apache Commons CSV?
split
Bear in mind is that split
only returns a view of the data, meaning that the original line
object is not eligible for garbage collection whilst there is a reference to any of its views. Perhaps making a defensive copy will help? (Java bug report)
It also is not reliable in grouping escaped CSV columns containing commas
Take a look at opencsv.
This blog post, opencsv is an easy CSV parser, has example usage.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With