I am using Firehose and Glue to ingest data and convert JSON to Parquet files in S3.
This works with flat JSON (no nesting or arrays), but it fails for a nested JSON array. What I have done:
The JSON structure:
{
  "class_id": "test0001",
  "students": [{
    "student_id": "xxxx",
    "student_name": "AAAABBBCCC",
    "student_gpa": 123
  }]
}
The Glue schema for the students column:
ARRAY<STRUCT<student_id:STRING,student_name:STRING,student_gpa:INT>>
I receive this error:
The schema is invalid. Error parsing the schema: Error: type expected at the position 0 of 'ARRAY<STRUCT<student_id:STRING,student_name:STRING,student_gpa:INT>>' but 'ARRAY' is found.
Any suggestions are appreciated.
I ran into the same error because I created the schema manually in the AWS console. The problem is that the help text shown next to the form for entering nested data capitalizes everything, but the type definition has to be lowercase for the Parquet conversion to work.
So despite the example given by AWS, write the type in lowercase:
array<struct<student_id:string,student_name:string,student_gpa:int>>
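If you have several such definitions to fix, the lowercasing can be scripted. Below is a minimal sketch, not an official AWS utility: the helper name lowercase_glue_type and the keyword list are my own assumptions, and the list is not guaranteed to cover every type Glue accepts. It lowercases only the type keywords and leaves field names (anything followed by a colon) untouched.

```python
import re

# Assumed list of type keywords; extend it if your schema uses others.
_TYPE_KEYWORDS = (
    "array", "struct", "map", "string", "int", "bigint", "smallint",
    "tinyint", "double", "float", "boolean", "timestamp", "date",
    "decimal", "binary", "char", "varchar",
)

# Match a keyword as a whole word, in any case, unless it is followed by
# a colon (in which case it is a field name such as "date:string").
_TYPE_PATTERN = re.compile(
    r"\b(" + "|".join(_TYPE_KEYWORDS) + r")\b(?!\s*:)",
    re.IGNORECASE,
)


def lowercase_glue_type(type_string):
    """Return type_string with its type keywords lowercased (hypothetical helper)."""
    return _TYPE_PATTERN.sub(lambda m: m.group(0).lower(), type_string)


fixed = lowercase_glue_type(
    "ARRAY<STRUCT<student_id:STRING,student_name:STRING,student_gpa:INT>>"
)
print(fixed)

# Column definitions in the Name/Type shape used by Glue table columns,
# with the column names taken from the JSON in the question.
columns = [
    {"Name": "class_id", "Type": "string"},
    {"Name": "students", "Type": fixed},
]
```

The resulting string is the same lowercase definition as the one written by hand above, so it can be pasted into the console form or used wherever the column type is set.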