
Amazon Athena: Querying columns with numbers stored as strings

I have an insurance dataset that includes the number of enrollments for each county. However, the number of enrollments is stored as a string. How can I query the data for something like "Find the plans which have an enrollment of more than 50"? Because the value is stored as a string in the dataset, I need to understand how to write this query in Athena. Can someone help, please?

Raj Parpani asked Nov 28 '19 00:11


1 Answer

Cast the string to a floating-point number, not an integer, and remove the commas before casting. Here is an example:

WITH x AS
    (SELECT '1,800,850.20' AS "value")
SELECT CAST(REPLACE(value, ',', '') AS REAL)
FROM x

Therefore, you should use:

SELECT
  npi,
  CAST(REPLACE(total_submitted_charge_amount,',', '') AS REAL) AS charge_amount
FROM cmsaggregatepayment2017
WHERE CAST(REPLACE(total_submitted_charge_amount,',', '') AS REAL) > 100000
ORDER BY CAST(REPLACE(total_submitted_charge_amount,',', '') AS REAL) ASC
LIMIT 1000
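
For the enrollment question itself, the same idea might look like the sketch below. The table name insurance_enrollment and the columns plan_name and enrollment are hypothetical placeholders, since the real names are not shown in the question, and it assumes enrollment holds whole numbers. TRY_CAST is used instead of CAST so that rows with non-numeric text yield NULL and drop out of the filter rather than failing the query.

```sql
-- Hypothetical table and column names; substitute your own.
-- Assumes enrollment holds whole numbers, possibly with thousands separators.
-- TRY_CAST returns NULL instead of raising an error when a value is not numeric.
SELECT
  plan_name,
  TRY_CAST(REPLACE(enrollment, ',', '') AS INTEGER) AS enrollment_count
FROM insurance_enrollment
WHERE TRY_CAST(REPLACE(enrollment, ',', '') AS INTEGER) > 50
ORDER BY enrollment_count DESC
LIMIT 1000
```

If the column can contain decimals, casting to DOUBLE instead of INTEGER should work the same way.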
shuvalov answered Sep 30 '22 21:09