I have an insurance dataset that includes the number of enrollments for each county. However, the number of enrollments is stored as a string. How can I query the data for something like "Find the plans which have an enrollment of more than 50"? Since 50 is stored as a string in the dataset, I need to understand how to run this query using Athena. Can someone help, please?
Incorrect LOCATION path: If the input LOCATION path is incorrect, then Athena returns zero records.
Q: What data formats does Amazon Athena support? Amazon Athena supports a wide variety of data formats such as CSV, TSV, JSON, and text files, and also supports open source columnar formats such as Apache ORC and Apache Parquet. Athena also supports compressed data in Snappy, Zlib, LZO, and GZIP formats.
But unlike Apache Drill, Athena is limited to data only from Amazon's own S3 storage service. However, Athena is able to query a variety of file formats, including, but not limited to CSV, Parquet, JSON, etc.
Another way of storing Athena query results at a specific location in S3 is to use a CTAS query (CREATE TABLE AS SELECT). This has many advantages, because you can even specify the result format (gzipped JSON, Parquet, etc.). CREATE TABLE default.
Cast the string to a floating-point number, not an integer, and remove the commas before casting. Here is an example:
with x AS
(SELECT '1,800,850.20' AS "value")
SELECT cast(replace(value,',', '') AS REAL)
FROM x
Therefore, you should use:
SELECT
npi,
CAST(REPLACE(total_submitted_charge_amount,',', '') AS REAL) AS charge_amount
FROM cmsaggregatepayment2017
WHERE CAST(REPLACE(total_submitted_charge_amount,',', '') AS REAL) > 100000
ORDER BY CAST(REPLACE(total_submitted_charge_amount,',', '') AS REAL) ASC
LIMIT 1000
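For the enrollment question, the same idea might look like the query below. This is only a minimal sketch: the sample rows are inlined with VALUES so the query runs on its own, and the names plans, plan_name, county, and enrollment are assumptions, not columns from the real dataset. It uses TRY_CAST, which returns NULL instead of failing the query when a value cannot be converted, so rows with non-numeric enrollment values are simply filtered out.

```sql
-- Hypothetical sample data; replace the CTE with your own table.
WITH plans AS (
    SELECT *
    FROM (
        VALUES
            ('Plan A', 'Kent',   '120'),
            ('Plan B', 'Kent',   '45'),
            ('Plan C', 'Sussex', '1,200'),
            ('Plan D', 'Sussex', 'N/A')
    ) AS t (plan_name, county, enrollment)
)
SELECT
    plan_name,
    county,
    -- Strip thousands separators, then convert the string to an integer.
    TRY_CAST(REPLACE(enrollment, ',', '') AS INTEGER) AS enrollment_count
FROM plans
-- A NULL from TRY_CAST never satisfies "> 50", so invalid values drop out.
WHERE TRY_CAST(REPLACE(enrollment, ',', '') AS INTEGER) > 50
ORDER BY enrollment_count DESC
```

If the enrollment strings are known to be clean integers, a plain CAST(enrollment AS INTEGER) > 50 in the WHERE clause should be enough. Comparing the raw strings (enrollment > '50') would compare them character by character, which is why the cast is needed.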