Java

Question

I'm trying to load large CSV files (typically 200-600 MB) efficiently in Java, using as little memory as possible while keeping access fast. Currently, the program stores the data in a List of String arrays. This was previously handled by a Lua program that used one table per CSV row and an outer table to hold the row tables.

Below is an example of the memory differences and load times:

CSV file - 232 MB
Lua - 549 MB in memory - 157 seconds to load
Java - 1,378 MB in memory - 12 seconds to load

If I remember correctly, duplicate items in a Lua table exist as references to a single value. I suspect that in the Java version the List holds a separate String object for each duplicate value, which may explain the larger memory usage.
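To illustrate the suspicion above, here is a minimal sketch showing that fields produced by String.split are separate objects even when their contents are equal. The class name DuplicateStringDemo and the naive comma split (no quoted-field handling) are assumptions for illustration, not the actual loader.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DuplicateStringDemo {
    // Splits each CSV line on commas (naive: no quoted-field handling).
    static List<String[]> parse(List<String> lines) {
        List<String[]> rows = new ArrayList<>(lines.size());
        for (String line : lines) {
            rows.add(line.split(","));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String[]> rows = parse(Arrays.asList("1,red", "2,red"));
        String a = rows.get(0)[1];
        String b = rows.get(1)[1];
        System.out.println(a.equals(b)); // true: same characters
        System.out.println(a == b);      // false: two distinct String objects
    }
}
```

Each "red" here is its own object, so a file with many repeated values pays the per-String overhead once per occurrence.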

Below is some background on the data within the CSV files:

Each field consists of a String.
Specific fields within each row take one of a fixed set of Strings (e.g. field 3 could be "red", "green", or "blue").
There are many duplicate Strings within the content.

Below are some examples of what may be required of the loaded data:

Search through all Strings for those that match a given String, and return the matches.
Display matches in a GUI table (sortable by field).
Alter or replace Strings.
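For reference, the search and sort requirements above could look roughly like this against a List of String arrays. This is only a sketch: the class RowQueries and its method names are hypothetical, and "match" is assumed to mean a substring match.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class RowQueries {
    // Returns every field that contains needle (assumed meaning of "match").
    static List<String> findMatches(List<String[]> rows, String needle) {
        List<String> matches = new ArrayList<>();
        for (String[] row : rows) {
            for (String field : row) {
                if (field.contains(needle)) {
                    matches.add(field);
                }
            }
        }
        return matches;
    }

    // Sorts the rows in place by the String in the given column.
    static void sortByField(List<String[]> rows, int column) {
        rows.sort(Comparator.comparing((String[] row) -> row[column]));
    }
}
```

Both operations are linear scans or in-place sorts over the existing list, so they add no significant memory of their own.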

My question - Is there a collection that will require less memory to hold the data yet still offer features to easily and quickly search/sort the data?

Igor · Accepted Answer

One easy solution: keep a HashMap that holds a reference to each unique String. The ArrayList then stores references to those unique Strings instead of separate copies.

Something like:

// requires: import java.util.HashMap;
private final HashMap<String, String> hashMap = new HashMap<String, String>();

// Returns the stored instance equal to ns, storing ns itself on first sight.
public String getUniqueString(String ns) {
    String oldValue = hashMap.get(ns);
    if (oldValue != null) { // I suppose there will be no null strings inside the CSV
        return oldValue;
    }
    hashMap.put(ns, ns);
    return ns;
}

Simple usage:

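// requires: import java.util.ArrayList; import java.util.Arrays; import java.util.List;
// "a" is an instance of the class that declares getUniqueString
List<String> s = Arrays.asList("Pera", "Zdera", "Pera", "Kobac", "Pera", "Zdera", "rus");
List<String> finS = new ArrayList<String>();
for (String er : s) {
    String ns = a.getUniqueString(er); // duplicates resolve to the first instance seen
    finS.add(ns);
}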
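Applied to CSV rows, the same idea might look like the sketch below. The class name CsvStringPool, its methods, and the naive comma split (no quoted-field handling) are assumptions for illustration; Map.putIfAbsent is used as a one-call equivalent of the get/put pair above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CsvStringPool {
    // Maps each distinct String to the single instance that is kept.
    private final Map<String, String> pool = new HashMap<>();

    // Returns the pooled instance equal to s, adding s if it is new.
    String unique(String s) {
        String existing = pool.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    // Splits each line on commas (naive: ignores quoting) and pools every field.
    List<String[]> loadRows(List<String> lines) {
        List<String[]> rows = new ArrayList<>(lines.size());
        for (String line : lines) {
            String[] fields = line.split(",");
            for (int i = 0; i < fields.length; i++) {
                fields[i] = unique(fields[i]);
            }
            rows.add(fields);
        }
        return rows;
    }

    int distinctCount() {
        return pool.size();
    }

    public static void main(String[] args) {
        CsvStringPool p = new CsvStringPool();
        List<String[]> rows = p.loadRows(Arrays.asList("1,red", "2,red", "3,blue"));
        System.out.println(rows.get(0)[1] == rows.get(1)[1]); // true: both rows share one "red"
        System.out.println(p.distinctCount()); // 5: "1", "2", "3", "red", "blue"
    }
}
```

Once loading is finished, the pool itself can be dropped if no further lookups are needed; the rows keep the shared instances alive.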

Java - how to efficiently store a large amount of String arrays

Tags:

csv

lua

user1816198
