I have noticed that using java.util.Scanner
is very slow when reading large files (in my case, CSV files).
I want to change the way I am currently reading files, to improve performance. Below is what I have at the moment. Note that I am developing for Android:
InputStreamReader inputStreamReader;
try {
    inputStreamReader = new InputStreamReader(context.getAssets().open("MyFile.csv"));
    Scanner inputStream = new Scanner(inputStreamReader);
    inputStream.nextLine(); // Ignore the first (header) line
    while (inputStream.hasNext()) {
        String data = inputStream.nextLine(); // Get a whole line
        String[] line = data.split(","); // Split the line into a string array
        if (line.length > 1) {
            // Do stuff, e.g.:
            String value = line[1];
        }
    }
    inputStream.close();
} catch (IOException e) {
    e.printStackTrace();
}
Using Traceview, I found that the main performance bottlenecks are java.util.Scanner.nextLine() and java.util.Scanner.hasNext().
I've looked at other questions (such as this one) and come across some CSV readers, like Apache Commons CSV, but they don't seem to have much documentation on how to use them, and I'm not sure how much faster they would be.
I have also heard about using FileReader and BufferedReader in answers like this one, but again, I do not know whether the improvement would be significant.
My file is about 30,000 lines long. With the code above, it takes at least a minute to read values from around line 600 onward, so I have not timed how long it takes to read values beyond line 2,000; sometimes the app becomes unresponsive and crashes while reading.
Although I could simply change parts of my code and test for myself, I would like to know whether there are any faster alternatives I have not mentioned, or whether I should just use FileReader and BufferedReader. Would it be faster to split the huge file into smaller files and choose which one to read depending on what information I want to retrieve? Preferably, I would also like to know why the fastest method is the fastest (i.e. what makes it fast).
uniVocity-parsers has the fastest CSV parser you'll find (2x faster than OpenCSV, 3x faster than Apache Commons CSV), with many unique features.
Here's a simple example of how to use it:
CsvParserSettings settings = new CsvParserSettings(); // many options here, have a look at the tutorial
CsvParser parser = new CsvParser(settings);
// parses all rows in one go
List<String[]> allRows = parser.parseAll(new FileReader(new File("your/file.csv")));
To make the process faster, you can select the columns you are interested in:
settings.selectFields("Column X", "Column A", "Column Y");
Normally, you should be able to parse 4 million rows in around 2 seconds. With column selection, the speed improves by roughly 30%.
It is even faster if you use a RowProcessor. There are many out-of-the-box implementations for converting rows to objects, POJOs, etc. The documentation explains all of the available features. It works like this:
// let's get the values of all columns using a column processor
ColumnProcessor rowProcessor = new ColumnProcessor();
settings.setRowProcessor(rowProcessor);
//the parse() method will submit all rows to the row processor
parser.parse(new FileReader(new File("/examples/example.csv")));
//get the result from your row processor:
Map<String, List<String>> columnValues = rowProcessor.getColumnValuesAsMapOfNames();
We also built a simple speed comparison project here.
Your code is fine for loading big files. However, when an operation may take longer than expected, it is good practice to run it in a background task rather than on the UI thread, to keep the app responsive.
The AsyncTask class helps with that:
private class LoadFilesTask extends AsyncTask<String, Integer, Long> {
    @Override
    protected Long doInBackground(String... str) {
        long lineNumber = 0;
        InputStreamReader inputStreamReader;
        try {
            inputStreamReader = new InputStreamReader(context.getAssets().open(str[0]));
            Scanner inputStream = new Scanner(inputStreamReader);
            inputStream.nextLine(); // Ignore the first line
            while (inputStream.hasNext()) {
                lineNumber++;
                String data = inputStream.nextLine(); // Get a whole line
                String[] line = data.split(","); // Split the line into a string array
                if (line.length > 1) {
                    // Do stuff, e.g.:
                    String value = line[1];
                }
            }
            inputStream.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return lineNumber;
    }

    // If you need to show progress, call publishProgress(...) from
    // doInBackground; this method then runs on the UI thread.
    @Override
    protected void onProgressUpdate(Integer... progress) {
        setYourCustomProgressPercent(progress[0]);
    }

    // Triggered at the end of the process, i.e. when loading has finished
    @Override
    protected void onPostExecute(Long result) {
        showDialog("File Loaded: " + result + " lines");
    }
}
...and execute it as:
new LoadFilesTask().execute("MyFile.csv");
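AsyncTask is Android-specific, but the underlying pattern (do the slow read on a worker thread, then collect the result) can be sketched in plain Java with an ExecutorService. The class and method names here are illustrative, not from the question:

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BackgroundLoad {
    // Counts data lines (everything after the header) on a worker thread,
    // leaving the calling thread free, like AsyncTask#doInBackground.
    static Future<Long> countLinesAsync(ExecutorService pool, Reader source) {
        return pool.submit(() -> {
            long lineNumber = 0;
            try (BufferedReader reader = new BufferedReader(source)) {
                reader.readLine(); // skip the header
                while (reader.readLine() != null) {
                    lineNumber++;
                }
            }
            return lineNumber;
        });
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<Long> result = countLinesAsync(pool, new StringReader("header\na,b\nc,d\n"));
        // get() blocks until the background read is done, playing the role
        // of onPostExecute delivering the result.
        System.out.println("File Loaded: " + result.get() + " lines");
        pool.shutdown();
    }
}
```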