Spark SQL: change the format of a number

After a show command, Spark prints the following:

+-----------------------+---------------------------+
|NameColumn             |NumberColumn               |
+-----------------------+---------------------------+
|name                   |4.3E-5                     |
+-----------------------+---------------------------+

Is there a way to change NumberColumn format to something like 0.000043?
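For context, 4.3E-5 is just the default scientific rendering of the Double value 0.000043; plain JVM fixed-point formatting (no Spark needed) already recovers the desired form. A minimal sketch:

```scala
// 4.3E-5 and 0.000043 are the same Double; only the rendering differs
val n = 4.3e-5
// %.6f forces fixed-point notation with 6 decimal places
val fixed = "%.6f".formatLocal(java.util.Locale.US, n)
```

Here `fixed` is the string "0.000043"; the answers below show how to get the same effect on a DataFrame column.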

asked Jul 10 '17 by Cherry
People also ask

How do I change the data type in Spark?

To change a Spark SQL DataFrame column from one data type to another, use the cast() function of the Column class. It can be used with withColumn(), select(), selectExpr(), and in SQL expressions.
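The semantics of cast() follow ordinary JVM conversions; a plain-Scala sketch of the two most common conversions (string to double, double to int, which truncates toward zero just as Spark's cast does):

```scala
// string -> double, the JVM analogue of cast(col as double)
val d: Double = "42.9".toDouble
// double -> int truncates the fractional part, like cast(col as int)
val i: Int = d.toInt
```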

How do I change the date format in Spark SQL?

Spark's to_date() function converts a string column (StringType) to a date column (DateType). It takes a date held in a string and converts it to a date on the DataFrame.
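The pattern syntax to_date() accepts is the familiar JVM date-pattern syntax; a non-Spark sketch using java.time shows the same parse step (the input string and pattern here are made up for illustration):

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

// parse a dd-MM-yyyy string into a date, as to_date(col, "dd-MM-yyyy") would
val parsed = LocalDate.parse("10-07-2017", DateTimeFormatter.ofPattern("dd-MM-yyyy"))
```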

How do I replace a character in Spark SQL?

By using the regexp_replace() Spark function you can replace a column's string value with another string or substring. regexp_replace() uses Java regex for matching; if the regex does not match, the input string is returned unchanged.
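Since regexp_replace() is backed by Java regex, java.lang.String.replaceAll behaves the same way on a single value; a quick sketch:

```scala
// replaceAll uses the same Java regex engine that backs Spark's regexp_replace
val cleaned = "1,234,567.89".replaceAll(",", "")
```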

Is numeric function in Spark SQL?

Neither Spark SQL nor Apache Hive provides an is_numeric function. You have to write a user-defined function in your favorite programming language and register it in Spark, or use an alternative SQL approach to check for numeric values.
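One simple way to write such a check in Scala (which could then be registered as a UDF) is to attempt the numeric parse and catch failure; a minimal sketch:

```scala
import scala.util.Try

// true if the string parses as a Double (covers ints, decimals, scientific notation)
def isNumeric(s: String): Boolean = Try(s.trim.toDouble).isSuccess
```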


2 Answers

You can use the format_number function:

import org.apache.spark.sql.functions.format_number
df.withColumn("NumberColumn", format_number($"NumberColumn", 5))

Here 5 is the number of decimal places to show. Note that rounding 4.3E-5 to 5 decimal places gives 0.00004; to get 0.000043 you need 6 decimal places.

Note that the format_number function returns a string column:

format_number(Column x, int d)
Formats numeric column x to a format like '#,###,###.##', rounded to d decimal places, and returns the result as a string column.
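As a non-Spark sketch of that behavior, java.text.DecimalFormat with an equivalent pattern produces the same '#,###,###.##'-style output (DecimalFormat is a plain JVM class, not part of Spark):

```scala
import java.text.{DecimalFormat, DecimalFormatSymbols}
import java.util.Locale

// grouping commas plus exactly 5 decimal places, like format_number(col, 5)
val fmt = new DecimalFormat("#,##0.00000", new DecimalFormatSymbols(Locale.US))
val s = fmt.format(12345.6789)
```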

If you don't want the , grouping separator, you can use the regexp_replace function, which is defined as

regexp_replace(Column e, String pattern, String replacement)
Replace all substrings of the specified string value that match regexp with rep.

and use it as

import org.apache.spark.sql.functions.regexp_replace
df.withColumn("NumberColumn", regexp_replace(format_number($"NumberColumn", 5), ",", ""))

This strips the commas that format_number would otherwise insert into large numbers.
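A plain-Scala sketch of the same two-step pipeline (format with grouping, then strip the commas), using JVM classes only:

```scala
import java.text.{DecimalFormat, DecimalFormatSymbols}
import java.util.Locale

// step 1: format with grouping commas, like format_number(col, 5)
val grouped = new DecimalFormat("#,##0.00000", new DecimalFormatSymbols(Locale.US))
  .format(1234567.89)
// step 2: strip the commas, like regexp_replace(..., ",", "")
val plain = grouped.replaceAll(",", "")
```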

answered Sep 19 '22 by Ramesh Maharjan

You can use a cast operation as below:

import spark.implicits._  // needed for toDF
val df = sc.parallelize(Seq(0.000043)).toDF("num")

df.createOrReplaceTempView("data")
spark.sql("select CAST (num as DECIMAL(8,6)) from data")

Adjust the precision and scale accordingly: DECIMAL(8,6) allows 8 digits in total, 6 of them after the decimal point.
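The DECIMAL cast maps onto BigDecimal on the JVM, where the scale is set explicitly; a minimal non-Spark sketch of what DECIMAL(8,6) does to the question's value:

```scala
import scala.math.BigDecimal.RoundingMode

// fix the scale at 6 decimal digits, as CAST(num AS DECIMAL(8,6)) would
val dec = BigDecimal("4.3E-5").setScale(6, RoundingMode.HALF_UP)
```

Unlike format_number, this keeps the value numeric rather than turning it into a string.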

answered Sep 19 '22 by vdep