I have a DataFrame in Spark (Scala) with a column that I need to split.
    scala> test.show
    +-------------+
    |columnToSplit|
    +-------------+
    |        a.b.c|
    |        d.e.f|
    +-------------+
I need this column split out to look like this:
    +----+----+----+
    |col1|col2|col3|
    +----+----+----+
    |   a|   b|   c|
    |   d|   e|   f|
    +----+----+----+
I'm using Spark 2.0.0
Thanks
Spark SQL provides a split() function (in org.apache.spark.sql.functions) for exactly this: it converts a delimiter-separated string column into an array column (StringType to ArrayType). The string column can be split on any delimiter, such as a space, comma, pipe, or, as here, a dot.
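For instance, a minimal sketch against the test DataFrame from the question (the parts column name is just illustrative) shows the intermediate ArrayType column:

    import org.apache.spark.sql.functions.{col, split}

    // "\\." escapes the dot, since split() takes a regex
    val arrayDf = test.withColumn("parts", split(col("columnToSplit"), "\\."))
    arrayDf.printSchema()
    // root
    //  |-- columnToSplit: string (nullable = true)
    //  |-- parts: array (nullable = true)
    //  |    |-- element: string (containsNull = true)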
Once you have the array, there are several ways to turn it into multiple columns on the DataFrame. For a known, fixed set of columns, chaining withColumn() or projecting them in a single select() is easiest. If you need to add the columns after applying some transformations, or the number of columns isn't fixed up front, you can use map() or foldLeft() instead, as sketched below.
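As a hedged sketch of the foldLeft() variant (numParts and result are illustrative names, not from the original answer):

    import org.apache.spark.sql.functions.{col, split}

    // Split once into a temporary array column, then fold over the
    // indices, adding one output column per element. Out-of-range
    // indices simply produce null.
    val numParts = 3 // illustrative; derive from your data if unknown
    val result = (0 until numParts)
      .foldLeft(test.withColumn("_tmp", split(col("columnToSplit"), "\\."))) {
        (df, i) => df.withColumn(s"col${i + 1}", col("_tmp").getItem(i))
      }
      .drop("_tmp")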
Try:
    import sparkObject.implicits._
    import org.apache.spark.sql.functions.split

    // _tmp holds the array; selecting only the three named
    // columns drops it from the result
    df.withColumn("_tmp", split($"columnToSplit", "\\."))
      .select(
        $"_tmp".getItem(0).as("col1"),
        $"_tmp".getItem(1).as("col2"),
        $"_tmp".getItem(2).as("col3")
      )
The important point to note here is that sparkObject is the SparkSession you have already initialized, so the implicits import has to be placed inline in the code after that session exists, not at the top of the file before the class definition.
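For illustration, a hypothetical skeleton (SplitExample and the builder settings are assumptions, not from the original) showing where the import can legally sit:

    import org.apache.spark.sql.SparkSession

    object SplitExample {
      def main(args: Array[String]): Unit = {
        val sparkObject = SparkSession.builder()
          .appName("split-example")
          .master("local[*]")
          .getOrCreate()

        // Import from the live session instance; this line cannot
        // move above the point where sparkObject is created.
        import sparkObject.implicits._

        val df = Seq("a.b.c", "d.e.f").toDF("columnToSplit")
        // ... apply the split shown above ...
      }
    }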