I'm trying to load large CSV files (typically 200-600 MB) efficiently in Java, using as little memory as possible while keeping access fast. Currently, the program uses a List of String arrays. This operation was previously handled by a Lua program that used a table for each CSV row and a table to hold each "row" table.
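For context, a minimal sketch of that loading approach (hypothetical code; the real program may use a proper CSV parser instead of a plain comma split, and may handle quoting and headers):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class CsvLoader {
    // Reads every line of the file and stores it as a String[] in one large List.
    // Naive split on commas; quoted fields are not handled.
    public static List<String[]> load(String path) throws IOException {
        List<String[]> rows = new ArrayList<String[]>();
        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            String line;
            while ((line = reader.readLine()) != null) {
                rows.add(line.split(","));
            }
        }
        return rows;
    }
}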
Below is an example of the memory differences and load times:
If I remember correctly, duplicate items in a Lua table exist as references to a single actual value. I suspect that in the Java example the List is holding a separate copy of each duplicate value, which may account for the larger memory usage.
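A small illustration of that suspicion (the values here are made up; the point is that two equal strings produced separately, e.g. by parsing, are distinct objects on the heap):

// Two equal strings created separately occupy memory twice.
String a = new String("foo");
String b = new String("foo");
System.out.println(a == b);      // false: separate instances, separate memory
System.out.println(a.equals(b)); // true: identical contents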
Below is some background on the data within the CSV files:
Below are some examples of what may be required of the loaded data:
My question: is there a collection that will require less memory to hold the data, yet still offer features to easily and quickly search/sort it?
One easy solution: keep a HashMap in which you put a reference to every unique string, and have the ArrayList hold only references to the unique strings already in the HashMap. Something like this:
private final HashMap<String, String> hashMap = new HashMap<String, String>();

// Returns the canonical instance for the given string, storing it on first sight.
public String getUniqueString(String ns) {
    String oldValue = hashMap.get(ns);
    if (oldValue != null) { // I assume there are no null strings inside the CSV
        return oldValue;
    }
    hashMap.put(ns, ns);
    return ns;
}
Simple usage:
// 'a' is an instance of the class that declares getUniqueString
List<String> s = Arrays.asList("Pera", "Zdera", "Pera", "Kobac", "Pera", "Zdera", "rus");
List<String> finS = new ArrayList<String>();
for (String er : s) {
    String ns = a.getUniqueString(er); // duplicates now share the same String instance
    finS.add(ns);
}
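Applied to the CSV case from the question, each cell value can be passed through getUniqueString before it is stored, so duplicate values across rows share a single String instance. A sketch, assuming a is the object holding the HashMap above and parsedCsvRows is whatever hypothetical source produces the raw String[] rows:

List<String[]> rows = new ArrayList<String[]>();
for (String[] rawRow : parsedCsvRows) {
    String[] row = new String[rawRow.length];
    for (int i = 0; i < rawRow.length; i++) {
        row[i] = a.getUniqueString(rawRow[i]); // reuse the canonical instance for duplicates
    }
    rows.add(row);
}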