After the show command, Spark prints the following:
+----------+------------+
|NameColumn|NumberColumn|
+----------+------------+
|name      |4.3E-5      |
+----------+------------+
Is there a way to change the NumberColumn format to something like 0.000043?
You can use the format_number function:
import org.apache.spark.sql.functions.format_number

// Render NumberColumn with a fixed number of decimal places (returns a string column)
df.withColumn("NumberColumn", format_number($"NumberColumn", 5))
Here 5 is the number of decimal places you want to show.
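For example, here is a minimal spark-shell sketch with the question's sample value (the DataFrame name and session setup are assumptions; note that 6 decimal places are needed to keep the trailing 3 of 0.000043):

import org.apache.spark.sql.functions.format_number
import spark.implicits._

val df = Seq(("name", 0.000043)).toDF("NameColumn", "NumberColumn")
// Shows 0.000043 instead of 4.3E-5
df.withColumn("NumberColumn", format_number($"NumberColumn", 6)).show(false)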
As you can see in the API documentation, the format_number function returns a string column:
format_number(Column x, int d)
Formats numeric column x to a format like '#,###,###.##', rounded to d decimal places, and returns the result as a string column.
If you don't require the , (thousands separator), you can call the regexp_replace function, which is defined as
regexp_replace(Column e, String pattern, String replacement)
Replace all substrings of the specified string value that match regexp with rep.
and use it as
import org.apache.spark.sql.functions.regexp_replace

// Strip the grouping commas from the formatted string
df.withColumn("NumberColumn", regexp_replace(format_number($"NumberColumn", 5), ",", ""))
This way the comma (,) that format_number inserts into large numbers is removed.
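For instance, a quick sketch with a large value (a hypothetical spark-shell session) shows why the replacement matters:

import org.apache.spark.sql.functions.{format_number, regexp_replace}
import spark.implicits._

val big = Seq(1234567.891).toDF("NumberColumn")
// format_number alone yields "1,234,567.89"; stripping the commas gives "1234567.89"
big.withColumn("NumberColumn", regexp_replace(format_number($"NumberColumn", 2), ",", "")).show(false)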
You can use the cast operation as below:
val df = sc.parallelize(Seq(0.000043)).toDF("num")
df.createOrReplaceTempView("data")
// DECIMAL(8,6): 8 digits of precision, 6 of them after the decimal point
spark.sql("select CAST(num as DECIMAL(8,6)) from data").show(false)
Adjust the precision and scale accordingly.
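The same cast can also be written with the DataFrame API instead of SQL; a minimal sketch, reusing the df defined above:

import org.apache.spark.sql.types.DecimalType
import spark.implicits._

// Equivalent to the SQL CAST above
df.select($"num".cast(DecimalType(8, 6))).show(false)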