Android Inserting words into ArrayList, out of memory

I have two files: one dictionary containing words of length 3 to 6 and another containing words of length 7. The words are stored in text files, separated by newlines. This method loads a file and inserts its words into an ArrayList, which I store in an Application class.

The file sizes are 386KB and 380KB, and each contains fewer than 200k words.

private void loadDataIntoDictionary(String filename) throws Exception {
    Log.d(TAG, "loading file: " + filename);
    AssetFileDescriptor descriptor = getAssets().openFd(filename);
    FileReader fileReader = new FileReader(descriptor.getFileDescriptor());
    BufferedReader bufferedReader = new BufferedReader(fileReader);
    String word = null;

    int i = 0;

    MyApp appState = ((MyApp)getApplicationContext());

    while ((word = bufferedReader.readLine()) != null) {
        appState.addToDictionary(word);
        word = null;
        i++;
    }
    Log.d(TAG, "added " + i + " words to the dictionary");

    bufferedReader.close();
}

The program crashes on an emulator running 2.3.3 with a 64MB SD card. The errors below are reported in logcat. The heap grows past 24MB, and I then see "clamp target GC heap from 25.XXX to 24.000 MB".

GC_FOR_MALLOC freed 0K, 12% free, external 1657k/2137K, paused 208ms.
GC_CONCURRENT freed XXK, 14% free
Out of memory on a 24-byte allocation and then FATAL EXCEPTION, memory exhausted.

How can I load these files without getting such a large heap?

Inside MyApp:

private ArrayList<String> dictionary = new ArrayList<String>();
public void addToDictionary(String word) {
    dictionary.add(word);
}
user1781570 asked Nov 13 '22 19:11


1 Answer

Irrespective of any other problems/bugs, ArrayList can be wasteful for this kind of storage: when a growing ArrayList runs out of space, it allocates a larger backing array (typically 1.5 to 2 times the old size, depending on the implementation) and copies the references across. So a large fraction of the backing array can end up as unused slack, nearly half in the doubling case. If you can pre-size a storage array or ArrayList to the correct size, you may get a significant saving.
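
A minimal sketch of the pre-sizing idea, assuming the word count is roughly known in advance. The class name `PresizedLoader` and the `expectedWords` parameter are hypothetical and not part of the question's code:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;

public final class PresizedLoader {
    private PresizedLoader() {}

    /**
     * Reads one word per line into a list whose backing array is allocated once.
     * expectedWords is a caller-supplied estimate (hypothetical parameter).
     */
    public static ArrayList<String> load(Reader source, int expectedWords) throws IOException {
        ArrayList<String> words = new ArrayList<String>(expectedWords);
        BufferedReader reader = new BufferedReader(source);
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                words.add(line);
            }
        } finally {
            reader.close();
        }
        // Release any unused slots if the estimate was too high.
        words.trimToSize();
        return words;
    }
}
```

On Android the `Reader` could, for example, be an `InputStreamReader` wrapped around `getAssets().open(filename)`. If the estimate is too low the list still grows as usual, so the sketch only helps when the count is close.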

Also (with paranoid data-cleansing hat on) make sure that there's no extra whitespace in your input files - you can use String.trim() on each word if necessary, or clean up the input files first. But I don't think this can be a significant problem given the file sizes you mention.
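
The cleansing step could look something like this sketch. `WordCleaner` is a hypothetical helper, and treating whitespace-only lines as "no word" is an assumption about what the dictionary should contain:

```java
public final class WordCleaner {
    private WordCleaner() {}

    /** Returns the trimmed word, or null if the line holds only whitespace. */
    public static String clean(String line) {
        String trimmed = line.trim();
        return trimmed.isEmpty() ? null : trimmed;
    }
}
```

The loading loop would then add a word only when `clean(line)` returns a non-null value.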

I'd expect the text itself to take less than 2MB to store (remember that Java uses UTF-16 internally, so each character typically takes 2 bytes). On top of that there's maybe 1.5MB of overhead for the String object references, another 1.5MB for the String lengths, and possibly the same again for each of the offset and hash code fields (take a look at String.java). 24MB of heap still sounds a little excessive, but it's not far off if you are also getting the near-doubling effect of an unlucky ArrayList re-size.
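
To make that arithmetic concrete, here is a rough back-of-envelope estimate in code. Every size constant is an assumption for a generic 32-bit VM with a String that holds a char[] reference plus offset, count and hash fields. Real figures differ between VMs and versions, so treat the result as an order of magnitude only:

```java
public final class StringFootprintEstimate {
    private StringFootprintEstimate() {}

    // Assumed sizes in bytes for a 32-bit VM (not measured values).
    static final int OBJECT_HEADER = 8;
    static final int ARRAY_HEADER = 12;
    static final int REFERENCE = 4;
    static final int INT = 4;

    /** Rounds up to the next multiple of 8, a common allocation granularity. */
    static long align8(long n) {
        return (n + 7) & ~7L;
    }

    /** Estimated bytes for one word of the given length held in an ArrayList. */
    public static long bytesPerWord(int chars) {
        // String object: header + char[] reference + offset, count, hash.
        long stringObject = align8(OBJECT_HEADER + REFERENCE + 3 * INT);
        // Backing char[]: header + 2 bytes per UTF-16 code unit.
        long charArray = align8(ARRAY_HEADER + 2L * chars);
        // One reference slot in the ArrayList's backing array.
        long listSlot = REFERENCE;
        return stringObject + charArray + listSlot;
    }

    /** Estimated total bytes for wordCount words of the same length. */
    public static long estimate(long wordCount, int chars) {
        return wordCount * bytesPerWord(chars);
    }

    public static void main(String[] args) {
        // Upper bound from the question: two files of up to 200k words each.
        System.out.println(estimate(400000, 6) + " bytes");
    }
}
```

Under these assumptions the per-word cost is several times the size of the raw characters, and that is before counting any slack in the list's backing array.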

In fact, rather than speculate, how about a test? The following code, run with -Xmx24M gets to about 560,000 6-character Strings before stalling (on a Java SE 7 JVM, 64-bit). It eventually crawls up to around 580,000 (with much GC thrashing, I imagine).

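    // Deliberately runs until the heap is exhausted; the last number printed
    // approximates how many Strings fit.
    ArrayList<String> list = new ArrayList<String>();
    int x = 0;
    while (true)
    {
        list.add(new String("123456"));  // new String(...) creates a distinct object per entry
        if (++x % 1000 == 0) System.out.println(x);
    }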

So I don't think there's a bug in your code - storing large numbers of small Strings is just not very efficient in Java - for the test above it takes over 7 bytes per character because of all the overheads (which may differ between 32-bit and 64-bit machines, incidentally, and depend on JVM settings too)!

You might get slightly better results by storing an array of byte arrays, rather than ArrayList of Strings. There are also more efficient data structures for storing strings, such as Tries.
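
As a sketch of the byte-array suggestion, assuming the words are plain ASCII (any other character would be encoded as `?`): the class below keeps each word as a `byte[]`, sorts them once, and answers lookups with a binary search. `ByteDictionary` and its methods are hypothetical names, and a trie is not shown here.

```java
import java.nio.charset.Charset;
import java.util.Arrays;
import java.util.Comparator;

public final class ByteDictionary {
    // Assumption: dictionary words contain only ASCII characters.
    private static final Charset ASCII = Charset.forName("US-ASCII");

    // Lexicographic order on unsigned byte values, shorter prefix first.
    private static final Comparator<byte[]> BYTE_ORDER = new Comparator<byte[]>() {
        @Override
        public int compare(byte[] a, byte[] b) {
            int n = Math.min(a.length, b.length);
            for (int i = 0; i < n; i++) {
                int diff = (a[i] & 0xff) - (b[i] & 0xff);
                if (diff != 0) {
                    return diff;
                }
            }
            return a.length - b.length;
        }
    };

    private final byte[][] words;

    /** Encodes each word as one byte per character and sorts the result. */
    public ByteDictionary(String[] source) {
        words = new byte[source.length][];
        for (int i = 0; i < source.length; i++) {
            words[i] = source[i].getBytes(ASCII);
        }
        Arrays.sort(words, BYTE_ORDER);
    }

    /** Returns true if the word is present, using binary search. */
    public boolean contains(String word) {
        return Arrays.binarySearch(words, word.getBytes(ASCII), BYTE_ORDER) >= 0;
    }

    public int size() {
        return words.length;
    }
}
```

For example, `new ByteDictionary(new String[] {"dog", "cat"}).contains("cat")` returns true. To actually save memory when loading, the words would need to be encoded line by line rather than first collected as Strings; the `String[]` constructor just keeps the sketch short.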

DNA answered Nov 15 '22 10:11