Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SerDe properties list for AWS Athena (JSON)

I'm testing the Athena product of AWS, so far is working very good. But I want to know the list of SerDe properties. I've searched far and wide and couldn't find it. I'm using this one for example "ignore.malformed.json" = "true", but I'm pretty sure there are a ton of other options to tune the queries.

I couldn't find info for example, on what the "path" property does, so having the full list will be amazing.

I have looked at Apache Hive docs but couldn't find this, and neither on AWS docs/forums.

Thanks!

like image 518
Laerion Avatar asked May 22 '17 17:05

Laerion


People also ask

How do you write custom SerDe in AWS Athena?

You specify a SerDe type by listing it explicitly in the ROW FORMAT part of your CREATE TABLE statement in Athena. In some cases, you can omit the SerDe name because Athena uses some SerDe types by default for certain types of data formats.

Can Athena read JSON files?

Amazon Athena lets you parse JSON-encoded values, extract data from JSON, search for values, and find length and size of JSON arrays.

What is a JSON SerDe?

The Hive JSON SerDe is commonly used to process JSON data like events. These events are represented as single-line strings of JSON-encoded text separated by a new line. The Hive JSON SerDe does not allow duplicate keys in map or struct key names.

What is SerDe property?

A SerDe (Serializer/Deserializer) is a way in which Athena interacts with data in various formats. It is the SerDe you specify, and not the DDL, that defines the table schema. In other words, the SerDe can override the DDL configuration that you specify in Athena when you create your table.


2 Answers

It seems you are using the Openx-JsonSerDe
http://docs.aws.amazon.com/athena/latest/ug/json.html

// properties used in configuration
public static final String PROP_IGNORE_MALFORMED_JSON = "ignore.malformed.json";
public static final String PROP_DOTS_IN_KEYS = "dots.in.keys";
public static final String PROP_CASE_INSENSITIVE ="case.insensitive" ;

https://github.com/rcongiu/Hive-JSON-Serde/blob/master/json-serde/src/main/java/org/openx/data/jsonserde/JsonSerDe.java

like image 116
David דודו Markovitz Avatar answered Sep 26 '22 07:09

David דודו Markovitz


As stated in release notes (see bullet #2 please), the JSON OpenX SerDe used in Athena has been improved. The improvements include, but are not limited to, the following:

  • Support for the ConvertDotsInJsonKeysToUnderscores property. When set to TRUE, it allows the SerDe to replace the dots in key names with underscores. For example, if the JSON dataset contains a key with the name "a.b", you can use this property to define the column name to be "a_b" in Athena. The default is FALSE. By default, Athena does not allow dots in column names.
  • Support for the case.insensitive property. By default, Athena requires that all keys in your JSON dataset use lowercase. Using WITH SERDE PROPERTIES ("case.insensitive"= FALSE;) allows you to use case-sensitive key names in your data. The default is TRUE. When set to TRUE, the SerDe converts all uppercase columns to lowercase.

For more information, see OpenX JSON SerDe in the Amazon Athena User Guide.

like image 36
user10659002 Avatar answered Sep 23 '22 07:09

user10659002