How to read a JSON file stored in S3 using Java

I have a JSON file in S3 that I need to parse to extract information from it. How do I do that in Java?

I have looked at some solutions, mainly in Python, but I have not been able to do the same in Java.

I can read the content using

S3Object object = s3Client.getObject(new GetObjectRequest(bucketName, key));
InputStream objectData = object.getObjectContent();

but I do not want to download the file and keep it. I just need to be able to parse this JSON file using Gson.

How do I achieve this?

roger_that asked Jan 12 '18 10:01


4 Answers

A bit late, but I'll leave this answer here in case someone else runs into this problem.

If you're not restricted to using Gson, then I'd recommend using Jackson's ObjectMapper instead.

Step 1: Add the Jackson dependency to your project.

// https://mvnrepository.com/artifact/com.fasterxml.jackson.core/jackson-databind
// "implementation" replaces the "compile" configuration, which Gradle 7 removed
implementation group: 'com.fasterxml.jackson.core', name: 'jackson-databind', version: '2.11.3'

Step 2: Create a Plain Old Java Object (POJO) that represents the JSON stream you want to parse. For example:

public class Item {

  // Jackson needs a no-argument constructor to instantiate the class
  public Item() { }

  private Integer id;
  private String name;
  ....
  // getters and setters
}

Step 3: Create an ObjectMapper instance and read the value from the JSON into an instance of your POJO class.

ObjectMapper objectMapper = new ObjectMapper();
// S3Object is Closeable; try-with-resources releases the underlying connection
try (S3Object s3Object = amazonS3.getObject(new GetObjectRequest(bucketName, key))) {
    Item item = objectMapper.readValue(s3Object.getObjectContent(), Item.class);
}

Naz answered Oct 23 '22 11:10


AmazonS3 client = AmazonS3ClientBuilder.standard()
        .withRegion(Regions.US_EAST_1.getName())
        .build();
Gson gson = new GsonBuilder().create();
S3Object data = client.getObject("bucket_name", "file_path");
try (S3ObjectInputStream s3is = data.getObjectContent()) {
    // Copy the object to a local file, then read it back as a string
    File temporaryFile = new File("temporary_file.json");
    FileUtils.copyInputStreamToFile(s3is, temporaryFile);
    // Assumes a static import of java.nio.charset.StandardCharsets.UTF_8
    String jsonAsString = FileUtils.readFileToString(temporaryFile, UTF_8);
    YourClass obj = gson.fromJson(jsonAsString, YourClass.class);
} catch (Exception e) {
    System.err.println(e.getMessage());
    System.exit(1);
}

build.gradle

implementation group: 'com.amazonaws', name: 'aws-java-sdk-s3', version: '1.11.705'
implementation group: 'com.google.code.gson', name: 'gson', version: '2.8.6'
implementation group: 'commons-io', name: 'commons-io', version: '2.6'
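
If a temporary file really is wanted (say, for a very large object), a variation that only uses the JDK might look like the sketch below. It creates a uniquely named temp file and deletes it once the content has been read, so nothing is left behind. The class and method names (TempFileJson, readViaTempFile) are made up for illustration, and the ByteArrayInputStream in main is only a stand-in for data.getObjectContent().

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of the AWS SDK or commons-io
final class TempFileJson {
    private TempFileJson() { }

    /** Copies the stream to a unique temp file, reads it back as UTF-8, then deletes the file. */
    static String readViaTempFile(InputStream in) throws IOException {
        // createTempFile already creates the (empty) file, hence REPLACE_EXISTING below
        Path tmp = Files.createTempFile("s3-object-", ".json");
        try (InputStream source = in) {
            Files.copy(source, tmp, StandardCopyOption.REPLACE_EXISTING);
            return new String(Files.readAllBytes(tmp), StandardCharsets.UTF_8);
        } finally {
            // Runs after the stream is closed, whether or not the read succeeded
            Files.deleteIfExists(tmp);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for data.getObjectContent(); any InputStream works
        InputStream fake = new ByteArrayInputStream(
                "{\"id\": 1}".getBytes(StandardCharsets.UTF_8));
        System.out.println(readViaTempFile(fake));
    }
}
```

The returned string can then be passed to gson.fromJson(jsonAsString, YourClass.class) exactly as in the code above.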

cedaniel200 answered Oct 23 '22 12:10


(Just expanding the comments given above.)

Following the approach in S3ObjectWrapper, we can have a method like this:

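// Reads the whole stream into a String and closes it.
// StringUtils here is com.amazonaws.util.StringUtils from the AWS SDK.
private static String getAsString(InputStream is) throws IOException {
    if (is == null) {
        return "";
    }
    StringBuilder sb = new StringBuilder();
    try {
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(is, StringUtils.UTF8));
        String line;
        // readLine() strips line terminators, so the result has no line
        // breaks; that is harmless for JSON, where they are only whitespace
        while ((line = reader.readLine()) != null) {
            sb.append(line);
        }
    } finally {
        is.close();
    }
    return sb.toString();
}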

Then call this method like:

S3Object o = s3.getObject(bucketName, key);
S3ObjectInputStream s3is = o.getObjectContent();
String str = getAsString(s3is);
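
On Java 9 or later, the same thing can probably be done without the line-by-line loop, because InputStream has a readAllBytes() method. Below is a minimal sketch under that assumption. The class and method names (StreamText, readFully) are hypothetical, and the ByteArrayInputStream in main just stands in for o.getObjectContent(). Unlike the readLine() version, this keeps line breaks as they are.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; only JDK classes are used
final class StreamText {
    private StreamText() { }

    /** Reads the whole stream as UTF-8 text and closes it; returns "" for null. */
    static String readFully(InputStream is) throws IOException {
        if (is == null) {
            return "";
        }
        try (InputStream in = is) {
            // readAllBytes (Java 9+) keeps every byte, including line breaks
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for o.getObjectContent(); any InputStream works here
        InputStream fake = new ByteArrayInputStream(
                "{\n  \"id\": 1\n}".getBytes(StandardCharsets.UTF_8));
        System.out.println(readFully(fake));
    }
}
```

Either way the whole object ends up in memory, so this is best kept for files of a modest size.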

arun answered Oct 23 '22 12:10


S3 is a blob store; it can't parse the file for you. If you want to parse the data on the AWS side, you might be better off storing it in DynamoDB, which understands JSON documents.

If that's not an option, you are on the right lines. Just read that input stream into a JSON string and parse it in memory. There is no requirement to write the file to disk at any point. Unless it's a huge file, you should be able to do it in memory with no problem.
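
To make the "no disk" point concrete, here is a rough sketch of the in-memory route. It wraps the stream in a UTF-8 Reader and hands that Reader to whatever parser you use, then closes everything. The names InMemoryParse, parse and drain are made up for this example. drain is only a stand-in parser so the sketch runs with the JDK alone. With Gson you would pass something like r -> gson.fromJson(r, YourClass.class) instead, since Gson has a fromJson(Reader, Class) overload.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

// Hypothetical helper; the parser is supplied by the caller
final class InMemoryParse {
    private InMemoryParse() { }

    /** Decodes the stream as UTF-8, lets the parser consume it, then closes the stream. */
    static <T> T parse(InputStream in, Function<Reader, T> parser) throws IOException {
        try (Reader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            return parser.apply(reader);
        }
    }

    /** Stand-in "parser" for the demo: just collects the characters it is given. */
    static String drain(Reader reader) {
        try {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for object.getObjectContent(); nothing is written to disk
        InputStream fake = new ByteArrayInputStream(
                "{\"id\": 7}".getBytes(StandardCharsets.UTF_8));
        System.out.println(parse(fake, InMemoryParse::drain));
    }
}
```

Passing the Reader straight to the parser also avoids building an intermediate String first, which helps a little with larger files.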

F_SO_K answered Oct 23 '22 12:10