Reading the data written to S3 by an Amazon Kinesis Firehose stream

I am writing records to a Kinesis Firehose stream, and Amazon Kinesis Firehose eventually writes them to a file in S3.

My record object looks like this:

    class ItemPurchase {
        String personId;
        String itemId;
    }

The data written to S3 looks like this:

{"personId":"p-111","itemId":"i-111"}{"personId":"p-222","itemId":"i-222"}{"personId":"p-333","itemId":"i-333"}

NO COMMA SEPARATION.

NO STARTING BRACKET as in a JSON array.

NO ENDING BRACKET as in a JSON array.

I want to read this data and get a list of ItemPurchase objects.

    List<ItemPurchase> purchases = getPurchasesFromS3(IOUtils.toString(s3ObjectContent));

What is the correct way to read this data?

asked Dec 26 '15 03:12

learner_21

1 Answer

It boggles my mind that Amazon Firehose dumps JSON messages to S3 in this manner, and doesn't allow you to set a delimiter or anything.

Ultimately, the trick I found to deal with the problem was to process the text file using the raw_decode method of Python's json.JSONDecoder.

This will allow you to read a bunch of concatenated JSON records without any delimiters between them.

Python code:

    import json
    from json import JSONDecodeError

    decoder = json.JSONDecoder()

    with open('giant_kinesis_s3_text_file_with_concatenated_json_blobs.txt', 'r') as content_file:
        content = content_file.read()

        content_length = len(content)
        decode_index = 0

        while decode_index < content_length:
            try:
                # raw_decode returns the parsed object and the index just past it
                obj, decode_index = decoder.raw_decode(content, decode_index)
                print("File index:", decode_index)
                print(obj)
            except JSONDecodeError as e:
                print("JSONDecodeError:", e)
                # Scan forward and keep trying to decode
                decode_index += 1
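If you want a list of typed objects back (as the question asks) rather than printed dicts, the same raw_decode loop can be wrapped in a function. This is only a minimal sketch: the ItemPurchase dataclass and the parse_purchases name are my own stand-ins for the question's record type and are not part of any library, and it assumes every record has exactly the personId and itemId keys. It skips whitespace between records instead of relying on the exception path, because raw_decode does not skip leading whitespace on its own.

```python
import json
from dataclasses import dataclass


@dataclass
class ItemPurchase:
    # Hypothetical Python mirror of the record shape from the question
    personId: str
    itemId: str


def parse_purchases(text):
    """Parse back-to-back JSON objects into a list of ItemPurchase."""
    decoder = json.JSONDecoder()
    purchases = []
    index = 0
    while index < len(text):
        # raw_decode does not skip leading whitespace, so step over it here
        if text[index].isspace():
            index += 1
            continue
        # Returns the decoded dict and the index just past the decoded object
        record, index = decoder.raw_decode(text, index)
        # Assumes the dict keys match the dataclass fields exactly
        purchases.append(ItemPurchase(**record))
    return purchases


sample = (
    '{"personId":"p-111","itemId":"i-111"}'
    '{"personId":"p-222","itemId":"i-222"}'
    '{"personId":"p-333","itemId":"i-333"}'
)
purchases = parse_purchases(sample)
print(purchases)
```

Unlike the loop above, this version lets a JSONDecodeError propagate on malformed input instead of scanning forward, which seemed like the safer default when you are building a list you intend to trust.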

answered Sep 27 '22 16:09

Tom Chapin



Tags: json, amazon-s3, amazon-kinesis, amazon-kinesis-firehose
