I have a program that reads a text file from Amazon S3, but the file is around 400 MB. I have increased my heap size, but I'm still getting a Java heap space error, so I'm not sure whether my code is correct. I'm using the AWS SDK for Java and Guava to handle the file stream.
Please help.
S3Object object = s3Client.getObject(new GetObjectRequest(bucketName, folder + filename));
final InputStream objectData = object.getObjectContent();
InputSupplier<InputStreamReader> supplier = CharStreams.newReaderSupplier(
    new InputSupplier<InputStream>() {
      @Override
      public InputStream getInput() throws IOException {
        return objectData;
      }
    }, Charsets.UTF_8);
String content = CharStreams.toString(supplier);
objectData.close();
return content;
I use these options for my JVM: -Xms512m -Xmx2g. I run the main program with Ant, so I include the JVM options in ANT_OPTS as well, but it's still not working.
The point of InputSupplier -- though you should be using ByteSource and CharSource these days -- is that you should never have access to the InputStream from the outside, so you never have to remember whether to close it.

If you're using an old version of Guava, before ByteSource and CharSource were introduced, then this should be
InputSupplier<InputStreamReader> supplier = CharStreams.newReaderSupplier(
    new InputSupplier<InputStream>() {
      @Override
      public InputStream getInput() throws IOException {
        S3Object object = s3Client.getObject(
            new GetObjectRequest(bucketName, folder + filename));
        return object.getObjectContent();
      }
    }, Charsets.UTF_8);
String content = CharStreams.toString(supplier);
If you're using Guava 14, then this can be done more fluently as
String content = new ByteSource() {
  @Override
  public InputStream openStream() throws IOException {
    S3Object object = s3Client.getObject(
        new GetObjectRequest(bucketName, folder + filename));
    return object.getObjectContent();
  }
}.asCharSource(Charsets.UTF_8).read();
That said: your file might be 400 MB, but Java Strings are stored as UTF-16, which can easily double the memory consumption. You may either need a lot more memory, or you need to figure out a way to avoid keeping the whole file in memory at once.
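For example, here is a minimal sketch of processing the object line by line with a plain BufferedReader instead of materializing one big String (assuming the file can be handled a line at a time; process(line) is a hypothetical per-line handler, not part of the original code):

S3Object object = s3Client.getObject(new GetObjectRequest(bucketName, folder + filename));
try (BufferedReader reader = new BufferedReader(
    new InputStreamReader(object.getObjectContent(), Charsets.UTF_8))) {
  String line;
  while ((line = reader.readLine()) != null) {
    process(line); // hypothetical handler; only one line is held in memory at a time
  }
}

This way the heap only ever holds a single line (plus the reader's buffer), regardless of the object's size.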
Rather than reading the whole file into memory, you can read it in parts, so the entire file never has to be in memory at once. That way you won't run into memory issues on a limited heap.
GetObjectRequest rangeObjectRequest = new GetObjectRequest(bucketName, key);
rangeObjectRequest.setRange(0, 999); // retrieve the first 1000 bytes (the range is inclusive)
S3Object objectPortion = s3Client.getObject(rangeObjectRequest);
InputStream objectData = objectPortion.getObjectContent();
// Now loop: read each range from S3 and append it to a local file,
// so the whole content is never in memory at once.
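A minimal sketch of such a loop, under a few assumptions (the object's size comes from getObjectMetadata; the 1 MB chunk size and the local file name are arbitrary illustrative choices):

long size = s3Client.getObjectMetadata(bucketName, key).getContentLength();
long chunkSize = 1024 * 1024; // 1 MB per request; tune as needed
try (OutputStream out = new FileOutputStream("local-copy.txt")) {
  for (long start = 0; start < size; start += chunkSize) {
    long end = Math.min(start + chunkSize, size) - 1; // setRange end is inclusive
    GetObjectRequest request = new GetObjectRequest(bucketName, key);
    request.setRange(start, end);
    S3Object portion = s3Client.getObject(request);
    InputStream in = portion.getObjectContent();
    try {
      ByteStreams.copy(in, out); // Guava; streams this chunk straight to disk
    } finally {
      in.close();
    }
  }
}

Each iteration holds at most one chunk's worth of data in memory, and the chunks are appended to the local file in order.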