
How to download GZip file from S3?

I have looked at both "AWS S3 Java SDK - Download file help" and "Working with Zip and GZip files in Java".

While they provide ways to download and deal with files from S3 and GZipped files respectively, these do not help in dealing with a GZipped file located in S3. How would I do this?

Currently I have:

try {
    AmazonS3 s3Client = new AmazonS3Client(
            new ProfileCredentialsProvider());
    String URL = downloadURL.getPrimitiveJavaObject(arg0[0].get());
    S3Object fileObj = s3Client.getObject(getBucket(URL), getFile(URL));
    BufferedReader fileIn = new BufferedReader(new InputStreamReader(
            fileObj.getObjectContent()));
    String fileContent = "";
    String line = fileIn.readLine();
    while (line != null){
        fileContent += line + "\n";
        line = fileIn.readLine();
    }
    fileObj.close();
    return fileContent;
} catch (IOException e) {
    e.printStackTrace();
    return "ERROR IOEXCEPTION";
}

Clearly, I am not handling the compressed nature of the file, and my output is:

����sU�3204�50�5010�20�24��L,(���O�V�M-.NLOU�R�U�����<s��<#�^�.wߐX�%w���������}C=�%�J3��.�����둚�S�ᜑ���ZQ�T�e��#sr�cdN#瘐:&�
S�BǔJ����P�<��

However, I cannot implement the example from the second question above, because the file is not stored locally; it has to be downloaded from S3.

What should I do?

asked Jul 01 '15 17:07 by ylun.ca


3 Answers

I solved the issue by wrapping the S3 object's content stream in a GZIPInputStream and reading it with a Scanner instead of a BufferedReader.

The scanner takes the GZIPInputStream and reads the unzipped file line by line:

fileObj = s3Client.getObject(new GetObjectRequest(oSummary.getBucketName(), oSummary.getKey()));
fileIn = new Scanner(new GZIPInputStream(fileObj.getObjectContent()));
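
For reference, here is a minimal, self-contained sketch of the same idea using a BufferedReader. It does not call S3: an in-memory gzipped byte array stands in for `fileObj.getObjectContent()`, and the class name `GzipLineReader`, the helper `gzip`, and the UTF-8 charset are assumptions made for illustration only.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipLineReader {

    // Decompresses a gzip stream and returns its lines.
    // Assumes the decompressed content is UTF-8 text.
    static List<String> readGzipLines(InputStream compressed) throws IOException {
        List<String> lines = new ArrayList<>();
        // try-with-resources closes the reader and the wrapped streams.
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(compressed), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // Hypothetical helper: gzips a string in memory to stand in for an S3 object body.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // With the AWS SDK, this stream would be fileObj.getObjectContent().
        InputStream fakeS3Body = new ByteArrayInputStream(gzip("first\nsecond\n"));
        for (String line : readGzipLines(fakeS3Body)) {
            System.out.println(line);
        }
    }
}
```

Collecting lines into a list (or a StringBuilder) also avoids the repeated string concatenation used in the question's loop.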
answered Sep 19 '22 11:09 by ylun.ca


You have to use GZIPInputStream to read a GZIP file:

    AmazonS3 s3Client = AmazonS3ClientBuilder.standard()
            .withCredentials(new ProfileCredentialsProvider())
            .build();
    String URL = downloadURL.getPrimitiveJavaObject(arg0[0].get());
    S3Object fileObj = s3Client.getObject(getBucket(URL), getFile(URL));

    byte[] buffer = new byte[1024];
    int n;
    // Decompress the S3 object on the fly and write it, re-compressed, to temp.gz.
    // try-with-resources closes both streams even if the copy fails.
    try (BufferedInputStream bufferedInputStream = new BufferedInputStream(
                 new GZIPInputStream(fileObj.getObjectContent()));
         GZIPOutputStream gzipOutputStream = new GZIPOutputStream(
                 new FileOutputStream("temp.gz"))) {
        while ((n = bufferedInputStream.read(buffer)) != -1) {
            // Write only the n bytes actually read, not the whole buffer.
            gzipOutputStream.write(buffer, 0, n);
        }
    }

Please try this approach to download a GZip file from S3.
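
If the goal is a decompressed copy on disk rather than a re-compressed one, the copy loop can be handed to `Files.copy`. The following is only a sketch under stated assumptions: it uses an in-memory gzipped byte array in place of `fileObj.getObjectContent()`, and the names `GzipToFile`, `decompressTo`, and `gzip` are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipToFile {

    // Decompresses a gzip stream into the target file and returns the number of bytes written.
    static long decompressTo(InputStream compressed, Path target) throws IOException {
        try (InputStream in = new GZIPInputStream(compressed)) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    // Hypothetical helper: gzips a string in memory to stand in for an S3 object body.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("s3-object", ".txt");
        // With the AWS SDK, the first argument would be fileObj.getObjectContent().
        long written = decompressTo(new ByteArrayInputStream(gzip("hello\n")), target);
        System.out.println(written + " bytes written to " + target);
        Files.delete(target);
    }
}
```

To keep the object compressed on disk instead, copy the raw content stream with `Files.copy` and skip the `GZIPInputStream` wrapper.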

answered Sep 20 '22 11:09 by Ahmad Al-Kurdi


Try this:

    BasicAWSCredentials creds = new BasicAWSCredentials("accessKey", "secretKey");
    AmazonS3 s3 = AmazonS3ClientBuilder.standard()
            .withCredentials(new AWSStaticCredentialsProvider(creds))
            .withRegion(Regions) // placeholder: pass your bucket's region constant, e.g. Regions.US_EAST_1
            .build();
    String bucketName = "bucketName";
    String keyName = "keyName";
    S3Object fileObj = s3.getObject(new GetObjectRequest(bucketName, keyName));
    // Scanner reads the decompressed text; try-with-resources closes the underlying streams.
    try (Scanner fileIn = new Scanner(new GZIPInputStream(fileObj.getObjectContent()))) {
        while (fileIn.hasNextLine()) {
            System.out.println("Line: " + fileIn.nextLine());
        }
    }
answered Sep 19 '22 11:09 by Sargurunathan Balasubramanian