How to read a large file from Amazon S3?

I have a program that reads a text file from Amazon S3, but the file is around 400 MB. I have increased my heap size, but I'm still getting the Java heap space error, so I'm not sure whether my code is correct. I'm using the AWS SDK for Java and Guava to handle the file stream.

Please help


        // Download the object and grab its content stream
        S3Object object = s3Client.getObject(new GetObjectRequest(bucketName, folder + filename));
        final InputStream objectData = object.getObjectContent();

        InputSupplier<InputStreamReader> supplier = CharStreams.newReaderSupplier(new InputSupplier<InputStream>() {
            @Override
            public InputStream getInput() throws IOException {
                return objectData;
            }
        }, Charsets.UTF_8);

        // Read the entire object into a single String
        String content = CharStreams.toString(supplier);
        objectData.close();

        return content;

I use these options for my JVM: -Xms512m -Xmx2g. I use Ant to run the main program, so I add the same options to ANT_OPTS as well, but it's still not working.
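One way to verify which heap limit is actually in effect (Ant can fork a separate JVM for the program it runs, and a forked JVM does not inherit ANT_OPTS) is to print it at startup:

    // Prints the effective max heap so you can confirm -Xmx reached this JVM
    System.out.println("Max heap: "
            + Runtime.getRuntime().maxMemory() / (1024 * 1024) + " MB");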

asked Apr 18 '13 by toy



2 Answers

The point of InputSupplier (though you should be using ByteSource and CharSource these days) is that you never have access to the InputStream from the outside, so you never have to remember whether to close it.

If you're using an old version of Guava before ByteSource and CharSource were introduced, then this should be

    InputSupplier<InputStreamReader> supplier = CharStreams.newReaderSupplier(
        new InputSupplier<InputStream>() {
            @Override
            public InputStream getInput() throws IOException {
                // Open the stream lazily inside the supplier so Guava can
                // open and close it for you
                S3Object object = s3Client.getObject(
                    new GetObjectRequest(bucketName, folder + filename));
                return object.getObjectContent();
            }
        }, Charsets.UTF_8);
    String content = CharStreams.toString(supplier);

If you're using Guava 14, then this can be done more fluently as

    new ByteSource() {
      @Override public InputStream openStream() throws IOException {
        S3Object object = s3Client.getObject(
            new GetObjectRequest(bucketName, folder + filename));
        return object.getObjectContent();
      }
    }.asCharSource(Charsets.UTF_8).read(); // opens, reads, and closes the stream for you

That said: your file might be 400 MB, but Java Strings are stored as UTF-16, which can easily double the memory consumption. You either need a lot more memory, or you need to find a way to avoid keeping the whole file in memory at once.
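For example, here is a minimal sketch of streaming the object line by line instead of materializing it as one String (assuming the file is line-oriented; s3Client, bucketName, folder, and filename are the same as in the question):

    S3Object object = s3Client.getObject(new GetObjectRequest(bucketName, folder + filename));
    BufferedReader reader = new BufferedReader(
            new InputStreamReader(object.getObjectContent(), Charsets.UTF_8));
    try {
        String line;
        while ((line = reader.readLine()) != null) {
            // process one line at a time; only this line is held in memory
        }
    } finally {
        reader.close();
    }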

answered by Louis Wasserman


Rather than loading the whole file into memory, you can read it in parts, so the entire file is never held in memory at once. This avoids running out of heap space when memory is limited.

GetObjectRequest rangeObjectRequest = new GetObjectRequest(bucketName, key);
rangeObjectRequest.setRange(0, 1000); // retrieve the first 1000 bytes
S3Object objectPortion = s3Client.getObject(rangeObjectRequest);
InputStream objectData = objectPortion.getObjectContent();

// Loop over successive ranges, appending each chunk to a local file,
// so the whole object is never held in memory at once.
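A sketch of what that loop might look like (assumptions: the same AWS SDK v1 client as above, an arbitrary 1 MB chunk size, and a hypothetical local output path):

    long objectSize = s3Client.getObjectMetadata(bucketName, key).getContentLength();
    long chunkSize = 1024 * 1024; // 1 MB per range request (arbitrary choice)

    OutputStream out = new FileOutputStream("local-copy.dat"); // hypothetical path
    try {
        for (long start = 0; start < objectSize; start += chunkSize) {
            long end = Math.min(start + chunkSize, objectSize) - 1; // setRange is inclusive
            GetObjectRequest req = new GetObjectRequest(bucketName, key);
            req.setRange(start, end);
            InputStream in = s3Client.getObject(req).getObjectContent();
            try {
                ByteStreams.copy(in, out); // Guava helper; a manual read/write loop works too
            } finally {
                in.close();
            }
        }
    } finally {
        out.close();
    }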

answered by pravinbhogil