
create substring column in spark dataframe

I want to take a JSON file and map it so that one of the columns is a substring of another. For example, to take the left table and produce the right table:

 ------------              ------------------------
|     a      |             |      a     |    b    |
|------------|       ->    |------------|---------|
|hello, world|             |hello, world|  hello  |

I can do this using Spark SQL syntax, but how can it be done using the built-in functions?
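(For reference, one way the Spark SQL route mentioned above could look is sketched below; the view name t and the df/spark handles are assumptions for illustration.)

// Rough sketch of the Spark SQL approach, assuming df holds the JSON data
// and spark is the active SparkSession
df.createOrReplaceTempView("t")
spark.sql("SELECT a, substring_index(a, ',', 1) AS b FROM t").show()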

asked Mar 15 '17 by J Smith


2 Answers

The following statement can be used:

import org.apache.spark.sql.functions._

// substring_index keeps everything before the first occurrence of the delimiter
dataFrame.select(col("a"), substring_index(col("a"), ",", 1).as("b"))
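Applied to data like that in the question, this keeps everything before the first comma. A quick check (the sample DataFrame below is just for illustration, and import spark.implicits._ is assumed):

val df = Seq("hello, world").toDF("a")
df.select(col("a"), substring_index(col("a"), ",", 1).as("b")).show()

+------------+-----+
|           a|    b|
+------------+-----+
|hello, world|hello|
+------------+-----+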

answered Sep 20 '22 by pasha701


Suppose you have the following DataFrame:

import spark.implicits._
import org.apache.spark.sql.functions._

var df = sc.parallelize(Seq(("foobar", "foo"))).toDF("a", "b")

+------+---+
|     a|  b|
+------+---+
|foobar|foo|
+------+---+

You can derive a new column from the first column as follows:

df = df.select(col("*"), substring(col("a"), 4, 6).as("c"))

+------+---+---+
|     a|  b|  c|
+------+---+---+
|foobar|foo|bar|
+------+---+---+
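Note that substring uses 1-based positions, and a length that runs past the end of the string is simply truncated, which is why (4, 6) on "foobar" still yields "bar". Under the same assumptions, an equivalent way to add the derived column is withColumn:

df = df.withColumn("c", substring(col("a"), 4, 3))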
answered Sep 21 '22 by Balázs Fehér