 

How to concatenate a string to a column in Spark?

I have the following data and would like to get the result with a text prefix:

Input dataframe:

sk            id       
2306220722    117738

Current code:

df.withColumn("Remarks", concat_ws("MCA", col("ID")))

Expected output:

sk           id      Remarks  
2306220722   117738  MCA 117738

I would like to prefix the id column with "MCA" and add the resulting string to the Remarks column.

Asked Feb 07 '18 at 02:02 by Rjj



People also ask

How do you add strings to a column in PySpark?

Use the concat() function from PySpark SQL, typically inside select() (a transformation that returns a new DataFrame with the selected columns) or withColumn(). For example, concat() can combine three string columns (firstname, middlename, lastname) into a single string column, FullName.
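A minimal sketch of that FullName example in Spark's Scala API, the language used by the answer below. The sample rows, the object name and a local SparkSession are assumptions for illustration. It also contrasts concat with concat_ws: concat returns null when any input is null, while concat_ws skips null inputs.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, concat, concat_ws, lit}

object FullNameExample {
  // Returns (FullName via concat, FullName via concat_ws) for each row.
  def fullNames(spark: SparkSession): Seq[(String, String)] = {
    import spark.implicits._

    // Hypothetical sample data; the second person has no middle name
    val people = Seq[(String, String, String)](
      ("James", "A", "Smith"),
      ("Anna", null, "Rose")
    ).toDF("firstname", "middlename", "lastname")

    people
      .select(
        // concat: any null input makes the whole result null
        concat(col("firstname"), lit(" "), col("middlename"), lit(" "), col("lastname")).as("FullName"),
        // concat_ws: null inputs are skipped, " " goes between the remaining values
        concat_ws(" ", col("firstname"), col("middlename"), col("lastname")).as("FullNameWs")
      )
      .as[(String, String)] // tuple fields are mapped to columns by position
      .collect()
      .toSeq
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("full-name").getOrCreate()
    try fullNames(spark).foreach(println)
    finally spark.stop()
  }
}
```

For the row without a middle name, concat yields null while concat_ws yields "Anna Rose".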

How do I split a string into multiple columns in spark?

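PySpark SQL provides the split() function, which converts a delimiter-separated string column into an array column (StringType to ArrayType) by splitting on a delimiter such as a space, comma, or pipe. The delimiter argument is a regular expression, so characters such as | must be escaped. Individual array elements can then be pulled out into separate columns with getItem().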
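A minimal sketch of the same idea in the Scala API. The pipe-delimited sample value, the column names and the object name are hypothetical. Note the escaped "\\|" pattern, since split() treats its second argument as a regular expression.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, split}

object SplitExample {
  // Splits one pipe-delimited string column into three separate columns.
  def splitParts(spark: SparkSession): Seq[(String, String, String)] = {
    import spark.implicits._

    // Hypothetical sample data
    val df = Seq("2306220722|117738|MCA").toDF("raw")

    // "|" is a regex metacharacter, so it is escaped here
    val parts = split(col("raw"), "\\|")

    df.select(
        parts.getItem(0).as("sk"),
        parts.getItem(1).as("id"),
        parts.getItem(2).as("prefix")
      )
      .as[(String, String, String)]
      .collect()
      .toSeq
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("split").getOrCreate()
    try splitParts(spark).foreach(println)
    finally spark.stop()
  }
}
```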

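What is concat_ws in Scala?

concat_ws(sep, cols*) is a Spark SQL function (org.apache.spark.sql.functions.concat_ws in the Scala API) that concatenates several string columns into a single string column, inserting the separator sep between the values. Unlike concat, it skips null inputs instead of returning null. It is not the same as the concat method on a plain Scala String, which simply joins two strings into a new one (Scala strings are immutable).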
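This also explains the question's attempt, concat_ws("MCA", col("ID")): "MCA" is treated as the separator, and with only one input column there is nothing to separate, so the result is just the id. A minimal sketch comparing that call with one that passes the prefix as a literal column; the Long column types and the explicit cast to string are assumptions added so the sketch does not depend on implicit type coercion.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, concat_ws, lit}

object ConcatWsExample {
  // Returns (question's attempt, prefix passed as a literal column).
  def remarks(spark: SparkSession): Seq[(String, String)] = {
    import spark.implicits._

    // Sample row from the question; the Long types are an assumption
    val df = Seq((2306220722L, 117738L)).toDF("sk", "id")
    val id = col("id").cast("string")

    df.select(
        // "MCA" is only the separator; a single input means it never appears
        concat_ws("MCA", id).as("asSeparator"),
        // "MCA" is a value, " " is the separator placed between the values
        concat_ws(" ", lit("MCA"), id).as("asPrefix")
      )
      .as[(String, String)]
      .collect()
      .toSeq
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("concat-ws").getOrCreate()
    try remarks(spark).foreach(println)
    finally spark.stop()
  }
}
```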

How do I merge two DataFrames with different columns in spark?

In PySpark, to merge two DataFrames with different columns, first add each missing column, filled with null values, to the DataFrame that lacks it (for example, add state and salary to df1 and age to df2), then combine them with the unionByName() transformation, which matches columns by name rather than by position. In Spark 3.1 and later, unionByName(other, allowMissingColumns=True) adds the missing columns with null values automatically.
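A minimal Scala sketch of the automatic variant, assuming Spark 3.1 or later (where unionByName gained the allowMissingColumns parameter). The sample rows and the object name are hypothetical; df1 lacks age and df2 lacks state and salary, as in the description.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object UnionByNameExample {
  // Unions two DataFrames whose columns only partly overlap.
  def merge(spark: SparkSession): DataFrame = {
    import spark.implicits._

    // Hypothetical data
    val df1 = Seq(("James", "NY", 3000L)).toDF("name", "state", "salary")
    val df2 = Seq(("Anna", 30L)).toDF("name", "age")

    // Columns are matched by name; columns missing on either side are
    // added and filled with nulls (Spark 3.1+)
    df1.unionByName(df2, allowMissingColumns = true)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("union-by-name").getOrCreate()
    try merge(spark).show()
    finally spark.stop()
  }
}
```

The columns missing from df1 are appended after its own columns, so the result has name, state, salary and age.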


1 Answer

Use the concat function in combination with lit. lit takes a literal value (a string, double, etc.) and produces a column containing only that value.

import org.apache.spark.sql.functions.{col, concat, lit}
val df2 = df.withColumn("Remarks", concat(lit("MCA "), col("id")))

Using the example DataFrame from the question, running df2.show() gives:

+----------+------+----------+
|        sk|    id|   Remarks|
+----------+------+----------+
|2306220722|117738|MCA 117738|
+----------+------+----------+
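A self-contained sketch of the answer as a small program, useful for trying it locally. The object name, the local SparkSession and the Long column types are assumptions; the explicit .cast("string") is also an addition that keeps the sketch independent of Spark's implicit type-coercion settings.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, concat, lit}

object PrefixRemarks {
  // Adds a Remarks column containing "MCA " followed by the id value.
  def withRemarks(df: DataFrame): DataFrame =
    df.withColumn("Remarks", concat(lit("MCA "), col("id").cast("string")))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("prefix-remarks").getOrCreate()
    import spark.implicits._
    try {
      // Sample row from the question
      val df = Seq((2306220722L, 117738L)).toDF("sk", "id")
      withRemarks(df).show()
    } finally spark.stop()
  }
}
```

One difference to keep in mind: if id can be null, concat returns null for that row, whereas concat_ws(" ", lit("MCA"), col("id")) would return just "MCA" because concat_ws skips null inputs.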
Answered Oct 05 '22 at 15:10 by Shaido
