How to read a large file from Amazon S3?

I have a program that reads a text file from Amazon S3, but the file is around 400 MB. I have increased my heap size, but I'm still getting the Java heap space error, so I'm not sure whether my code is correct. I'm using the AWS SDK for Java and Guava to handle the file stream.

Please help


        // Download the object and grab its content stream
        S3Object object = s3Client.getObject(new GetObjectRequest(bucketName, folder + filename));
        final InputStream objectData = object.getObjectContent();

        InputSupplier<InputStreamReader> supplier = CharStreams.newReaderSupplier(new InputSupplier<InputStream>() {
            @Override
            public InputStream getInput() throws IOException {
                return objectData;
            }
        }, Charsets.UTF_8);

        // Read the entire object into a single String
        String content = CharStreams.toString(supplier);
        objectData.close();

        return content;

I use these options for my JVM: -Xms512m -Xmx2g. I use Ant to run the main program, so I add the same options to ANT_OPTS as well, but it's still not working.
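One way to verify which heap limit is actually in effect (Ant can fork a separate JVM for the program it runs, and a forked JVM does not inherit ANT_OPTS) is to print it at startup:

    // Prints the effective max heap so you can confirm -Xmx reached this JVM
    System.out.println("Max heap: "
            + Runtime.getRuntime().maxMemory() / (1024 * 1024) + " MB");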

asked Apr 18 '13 by toy



2 Answers

The point of InputSupplier (though you should be using ByteSource and CharSource these days) is that you never have access to the InputStream from the outside, so you never have to remember whether to close it.

If you're using an old version of Guava before ByteSource and CharSource were introduced, then this should be

    InputSupplier<InputStreamReader> supplier = CharStreams.newReaderSupplier(
        new InputSupplier<InputStream>() {
            @Override
            public InputStream getInput() throws IOException {
                // Open the stream lazily inside the supplier so Guava can
                // open and close it for you
                S3Object object = s3Client.getObject(
                    new GetObjectRequest(bucketName, folder + filename));
                return object.getObjectContent();
            }
        }, Charsets.UTF_8);
    String content = CharStreams.toString(supplier);

If you're using Guava 14, then this can be done more fluently as

    new ByteSource() {
      @Override public InputStream openStream() throws IOException {
        S3Object object = s3Client.getObject(
            new GetObjectRequest(bucketName, folder + filename));
        return object.getObjectContent();
      }
    }.asCharSource(Charsets.UTF_8).read(); // opens, reads, and closes the stream for you

That said: your file might be 400 MB, but Java Strings are stored as UTF-16, which can easily double the memory consumption. You either need a lot more memory, or you need to find a way to avoid keeping the whole file in memory at once.
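For example, here is a minimal sketch of streaming the object line by line instead of materializing it as one String (assuming the file is line-oriented; s3Client, bucketName, folder, and filename are the same as in the question):

    S3Object object = s3Client.getObject(new GetObjectRequest(bucketName, folder + filename));
    BufferedReader reader = new BufferedReader(
            new InputStreamReader(object.getObjectContent(), Charsets.UTF_8));
    try {
        String line;
        while ((line = reader.readLine()) != null) {
            // process one line at a time; only this line is held in memory
        }
    } finally {
        reader.close();
    }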

answered by Louis Wasserman


Rather than loading the whole file into memory, you can read it in parts, so the entire file is never held in memory at once. This avoids running out of heap space when memory is limited.

GetObjectRequest rangeObjectRequest = new GetObjectRequest(bucketName, key);
rangeObjectRequest.setRange(0, 1000); // retrieve the first 1000 bytes
S3Object objectPortion = s3Client.getObject(rangeObjectRequest);
InputStream objectData = objectPortion.getObjectContent();

// Loop over successive ranges, appending each chunk to a local file,
// so the whole object is never held in memory at once.
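A sketch of what that loop might look like (assumptions: the same AWS SDK v1 client as above, an arbitrary 1 MB chunk size, and a hypothetical local output path):

    long objectSize = s3Client.getObjectMetadata(bucketName, key).getContentLength();
    long chunkSize = 1024 * 1024; // 1 MB per range request (arbitrary choice)

    OutputStream out = new FileOutputStream("local-copy.dat"); // hypothetical path
    try {
        for (long start = 0; start < objectSize; start += chunkSize) {
            long end = Math.min(start + chunkSize, objectSize) - 1; // setRange is inclusive
            GetObjectRequest req = new GetObjectRequest(bucketName, key);
            req.setRange(start, end);
            InputStream in = s3Client.getObject(req).getObjectContent();
            try {
                ByteStreams.copy(in, out); // Guava helper; a manual read/write loop works too
            } finally {
                in.close();
            }
        }
    } finally {
        out.close();
    }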

answered by pravinbhogil