Using Spark 2.1 (on Databricks), I have a table which has a column of type String as a result of an import from a .CSV file. In a SELECT query against that table, I am attempting to convert that column's value into an Integer before using the column value in a mathematical operation. I have been unable to find the right Spark SQL "function" to do this.
Below is an example of the SQL. "TO_NUMBER" isn't working on either of the two Strings; Sum_GN_POP or Count1:
SELECT name AS geohashPrefix3, TO_NUMBER(Sum_GN_POP) AS totalPopulation, TO_NUMBER(Count1) AS landMass
FROM wayne_geohash3
WHERE (LENGTH(name) = 3)
And it would be helpful if I could find the documentation for this. I will want to do other kinds of conversions (or casts) with other types, too. Any guidance on either or both of this is greatly appreciated.
We can convert String to an int in java using Integer.parseInt() method. To convert String into Integer, we can use Integer.valueOf() method which returns instance of Integer class.
You convert a string to a number by calling the Parse or TryParse method found on numeric types ( int , long , double , and so on), or by using methods in the System. Convert class. It's slightly more efficient and straightforward to call a TryParse method (for example, int.
One effective way to convert a string object into a numeral int is to use the stoi() function. This method is commonly used for newer versions of C++, with is being introduced with C++11. It takes as input a string value and returns as output the integer version of it.
You can get it as Integer
from the csv
file using the option inferSchema like this :
val df = spark.read.option("inferSchema", true).csv("file-location")
That being said : the inferSchema option do make mistakes sometimes and put the type as String
. if so you can use the cast
operator on Column
Dataframe/Dataset Implemetation :
val df2 = df.withColumn("Count1", $"Count1" cast "Int" as "landMass").withColumn("Count1", $"Sum_GN_POP" cast "Int" as "totalPopulation")
SQL Implemetation :
SELECT name AS geohashPrefix3, CAST(Sum_GN_POP as INT) AS totalPopulation, CAST(Count1 AS INT) AS landMass
FROM wayne_geohash3
WHERE (LENGTH(name) = 3)
Summary:
Apache Spark's SQL has partial compatibility with Apache Hive. So, most SQL that can be written in Hive can be written in Spark SQL.
Detail:
To convert a STRING to a specific numeric type like INT, a cast may be used. The cast consists of wrapping the target with parenthesis and preceding the parenthesis with the type to which it is to be changed. For example, the cast might look like this:
INT(someStringValue)
So, to make the SQL in the original posted question work, it needs to be changed to look like this (replacing the original function named "TO_NUMBER" with "INT"):
SELECT name AS geohashPrefix3, INT(Sum_GN_POP) AS totalPopulation, INT(Count1) AS landMass
FROM wayne_geohash3
WHERE (LENGTH(name) = 3)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With