I have created a table in amazon athena. I have my data as CSV file(one large file having data from 2010-2015).It is not partitioned in s3. I am trying to create partitions in athena but partitions are not pulling any results.
step 1: creating table in athena
CREATE EXTERNAL TABLE IF NOT EXISTS DATABASE.table1 (
NULL string,
OrderID string,
CustomerID string,
EmailAddress string
)
PARTITIONED BY (OrderDate STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
'serialization.format' = ',',
'quoteChar' = '"',
'field.delim' = ','
) LOCATION 's3://XYZ/Orders/'
TBLPROPERTIES ('has_encrypted_data'='false');
After this STEP 2: Loading partitions manually (my orderdate format is mm/dd/yy hrs min)
ALTER TABLE table1 ADD PARTITION (orderdate='01/01/2010 00:00') location 's3://xyz/Orders/'
Its says partition has been created successfully but when i query against this i dont get any results.So am I creating partitions in a wrong way? why it is not returning results?Any help is appreciated.
After trying out the comments below i am still not able to retrieve any results when i query data using order date.I uploaded my files again to athena(two separate files for two years) where the path of files looks like this
s3://xyz/Orders/year/orders+2010.csv
s3://xyz/Orders/year/orders+2014.csv
After doing this I am still not able to the partitions and query data using date range.
Partitions are sub-directories. When correctly defined Athena can skip over sub-directories that do not contain relevant data.
For example, if a table is partitioned by date (YYYY-MM-DD), there would be a directory called date=2017-11-08
that contains only data for that day.
Given that all your data is contained in a single CSV, you cannot use partitions. If you were to divide the file into individual daily files within an appropriate directory structure, Athena would run faster for you and would be cheaper to operate.
Athena cannot generate the partitions for you -- you must provide Athena with data that is already partitioned. The ADD PARTITION
and MSCK REPAIR TABLE
commands merely scan in and use the existing partitions.
See AWS Big Data blog: Analyzing Data in S3 using Amazon Athena
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With