Reading a file larger than 2GB into memory in Java

Tags: java, io

Since ByteArrayInputStream is limited to 2GB, is there any alternate solution that allows me to store the whole contents of a 2.3GB (and possibly larger) file into an InputStream to be read by Stax2?

Current code:

    XMLInputFactory xmlInputFactory = XMLInputFactory.newInstance();
    XMLStreamReader xmlStreamReader = xmlInputFactory.createXMLStreamReader(in); //ByteArrayInputStream????
    try
    {
        SchemaFactory factory = SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");

        Schema schema = factory.newSchema(new StreamSource(schemaInputStream));
        Validator validator = schema.newValidator();
        validator.validate(new StAXSource(xmlStreamReader));
    }
    finally
    {
        xmlStreamReader.close();
    }

For performance reasons, the variable 'in' must not be read from disk. I have plenty of RAM.

asked Oct 01 '14 11:10 by usr-local-ΕΨΗΕΛΩΝ




1 Answer

The whole point of StAX2 is that you do not need to read the file into memory. You can just supply the source and let the StAX XMLStreamReader pull the data as it needs it.

What additional constraints do you have that you are not showing in your question?

If you have lots of memory, and you want to get good performance, just wrap your InputStream with a large byte buffer, and let the buffer do the buffering for you:

// 4 MiB buffer on the stream that feeds the XML parser ('in' in the question)
InputStream buffered = new BufferedInputStream(in, 1024 * 1024 * 4);
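
To show the buffered stream in context, here is a minimal sketch of the whole validation reading straight from the file. The class name StreamingValidation, the method validate, and the parameters xmlFile and xsdFile are made up for illustration and are not from the question; it assumes both the XML and the schema are ordinary files on disk.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.XMLConstants;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.stax.StAXSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical helper class, not part of the original question or answer.
final class StreamingValidation {

    // Validates xmlFile against xsdFile without loading the XML into memory.
    static void validate(Path xmlFile, Path xsdFile)
            throws IOException, XMLStreamException, SAXException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(xsdFile.toFile());

        // 4 MiB buffer between the file and the parser
        try (InputStream in = new BufferedInputStream(Files.newInputStream(xmlFile), 4 * 1024 * 1024)) {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
            try {
                // The validator pulls events from the reader as it goes
                schema.newValidator().validate(new StAXSource(reader));
            } finally {
                reader.close();
            }
        }
    }
}
```

A call such as StreamingValidation.validate(Path.of("big.xml"), Path.of("schema.xsd")) returns normally for a valid document and throws a SAXException for an invalid one. The file names are placeholders.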

An alternative to solving this in Java is to create a RAM disk and store the file on it. That moves the problem out of Java, where the basic limitation is that a single array can hold only slightly fewer than Integer.MAX_VALUE elements.
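
If the data really has to sit on the Java heap, note that the array limit applies per array, so the contents can be spread over several byte arrays and replayed as one stream. The following is only a sketch of that idea, built from the standard ByteArrayInputStream and SequenceInputStream classes. The class name ChunkedMemoryStream, the method loadIntoMemory, and the chunkSize parameter are hypothetical, and InputStream.readNBytes requires Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper class, not part of the original question or answer.
final class ChunkedMemoryStream {

    // Copies everything from source into byte arrays of at most chunkSize
    // bytes each and returns one InputStream that replays them in order.
    static InputStream loadIntoMemory(InputStream source, int chunkSize) throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<InputStream> chunks = new ArrayList<>();
        while (true) {
            byte[] chunk = new byte[chunkSize];
            // Blocks until the array is full or the source is exhausted
            int filled = source.readNBytes(chunk, 0, chunkSize);
            if (filled > 0) {
                chunks.add(new ByteArrayInputStream(chunk, 0, filled));
            }
            if (filled < chunkSize) {
                break; // a short read means the end of the source was reached
            }
        }
        return new SequenceInputStream(Collections.enumeration(chunks));
    }
}
```

With a chunk size such as 1 << 30 (1 GiB), a 2.3GB file ends up in three arrays. The JVM heap has to be large enough to hold all of them, which usually means raising -Xmx.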

answered Oct 16 '22 02:10 by rolfl