
How do I convert (or cast) a String value to an Integer value?

Using Spark 2.1 (on Databricks), I have a table which has a column of type String as a result of an import from a .CSV file. In a SELECT query against that table, I am attempting to convert that column's value into an Integer before using the column value in a mathematical operation. I have been unable to find the right Spark SQL "function" to do this.

Below is an example of the SQL. TO_NUMBER does not work on either of the two String columns, Sum_GN_POP or Count1:

SELECT name AS geohashPrefix3, TO_NUMBER(Sum_GN_POP) AS totalPopulation, TO_NUMBER(Count1) AS landMass
    FROM wayne_geohash3
   WHERE (LENGTH(name) = 3)

It would also be helpful to find the documentation for this, since I will want to do other kinds of conversions (or casts) with other types, too. Any guidance on either or both of these is greatly appreciated.

chaotic3quilibrium asked Aug 26 '17 19:08



2 Answers

You can have Spark read the column as an Integer directly from the CSV file by using the inferSchema option, like this:

val df = spark.read.option("inferSchema", true).csv("file-location")

That being said, the inferSchema option does sometimes make mistakes and types a column as String. If so, you can use the cast operator on Column.

DataFrame/Dataset implementation:

// Requires import spark.implicits._ for the $"..." column syntax
val df2 = df.withColumn("landMass", $"Count1".cast("int")).withColumn("totalPopulation", $"Sum_GN_POP".cast("int"))

SQL implementation:

SELECT name AS geohashPrefix3, CAST(Sum_GN_POP AS INT) AS totalPopulation, CAST(Count1 AS INT) AS landMass
    FROM wayne_geohash3
   WHERE (LENGTH(name) = 3)
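
To see the cast end to end, here is a minimal sketch of the same query written against the DataFrame API. The rows are made up to stand in for the imported CSV, and the object name, helper name and local session are assumptions for illustration only (on Databricks, `spark` already exists).

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, length}

object CastStringColumns {
  // Hypothetical helper: mirrors the SELECT from the question using Column.cast.
  def castColumns(raw: DataFrame): DataFrame =
    raw
      .filter(length(col("name")) === 3)
      .select(
        col("name").as("geohashPrefix3"),
        col("Sum_GN_POP").cast("int").as("totalPopulation"),
        col("Count1").cast("int").as("landMass")
      )

  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("cast-example")
      .getOrCreate()
    import spark.implicits._

    // Made-up rows; every column is a String, as after a plain CSV import.
    val raw = Seq(
      ("9q8", "12345", "17"),
      ("9q9", "n/a", "4"),   // non-numeric value, to show how the cast treats it
      ("9q8y", "100", "2")   // name longer than 3 characters, filtered out
    ).toDF("name", "Sum_GN_POP", "Count1")

    val result = castColumns(raw)
    result.printSchema()     // the two cast columns should now be integer
    result.show()

    spark.stop()
  }
}
```

With the default settings in Spark 2.x and 3.x, a string that cannot be parsed as a number comes back from the cast as null rather than raising an error, so it is worth checking for unexpected nulls after converting. Newer versions with ANSI mode enabled may raise an error instead.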

Haroun Mohammedi answered Jan 04 '23 13:01


Summary:
Spark SQL is partially compatible with Apache Hive, so most SQL that can be written in Hive can also be written in Spark SQL.

Detail:
To convert a STRING to a specific numeric type such as INT, a cast may be used. The cast consists of wrapping the value in parentheses and preceding them with the name of the type to which it is to be changed. For example, the cast might look like this:

INT(someStringValue)

So, to make the SQL in the original posted question work, it needs to be changed to look like this (replacing the original function named "TO_NUMBER" with "INT"):

SELECT name AS geohashPrefix3, INT(Sum_GN_POP) AS totalPopulation, INT(Count1) AS landMass
  FROM wayne_geohash3
 WHERE (LENGTH(name) = 3)
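
For the other kinds of conversions mentioned in the question, the same pattern appears to carry over to other target types. The following is a small sketch, assuming a local SparkSession and made-up string literals; the object and method names are hypothetical. It uses the standard CAST(... AS type) form for most columns and the type-named function form for one of them.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object OtherCasts {
  // Hypothetical helper: casts made-up string literals to several types.
  def castExamples(spark: SparkSession): DataFrame =
    spark.sql(
      """SELECT CAST('42' AS INT)             AS asInt,
        |       CAST('42' AS BIGINT)          AS asLong,
        |       CAST('3.14' AS DOUBLE)        AS asDouble,
        |       CAST('3.14' AS DECIMAL(10,2)) AS asDecimal,
        |       CAST('2017-08-26' AS DATE)    AS asDate,
        |       CAST('true' AS BOOLEAN)       AS asBoolean,
        |       DOUBLE('3.14')                AS viaFunction
        |""".stripMargin)

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; on Databricks `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("other-casts")
      .getOrCreate()

    val df = castExamples(spark)
    df.printSchema()   // shows the resulting type of each column
    df.show()

    spark.stop()
  }
}
```

The CAST(... AS type) form is standard SQL and tends to be the more portable of the two spellings.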

chaotic3quilibrium answered Jan 04 '23 12:01