I want to take a JSON file and map it so that one of the columns is a substring of another. For example, taking the left table and producing the right table:
+------------+       +------------+-----+
|           a|       |           a|    b|
+------------+  ->   +------------+-----+
|hello, world|       |hello, world|hello|
+------------+       +------------+-----+
I can do this using Spark SQL syntax, but how can it be done with the built-in functions?
You can use substring_index, which returns the portion of the string before the given number of occurrences of the delimiter:
import org.apache.spark.sql.functions._
// keep everything in "a" before the first "," as a new column "b"
dataFrame.select(col("a"), substring_index(col("a"), ",", 1).as("b"))
Suppose you have the following dataframe:
import spark.implicits._
import org.apache.spark.sql.functions._
var df = sc.parallelize(Seq(("foobar", "foo"))).toDF("a", "b")
+------+---+
| a| b|
+------+---+
|foobar|foo|
+------+---+
You can extract a new column from the first one with substring (the position argument is 1-based):
df = df.select(col("*"), substring(col("a"), 4, 6).as("c"))
+------+---+---+
| a| b| c|
+------+---+---+
|foobar|foo|bar|
+------+---+---+
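As a side note, the same column can also be written with SQL expression syntax inside the DataFrame API, which is presumably what the question means by the Spark SQL approach. A small sketch reusing the df and imports defined above (both lines produce the same result as the select with substring):

val viaSelectExpr = df.selectExpr("*", "substring(a, 4, 6) AS c")
val viaExpr       = df.select(col("*"), expr("substring(a, 4, 6)").as("c"))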