I have the following data and would like to get the result with a text prefix:
Input dataframe:
+----------+------+
|        sk|    id|
+----------+------+
|2306220722|117738|
+----------+------+
Current code:
df.withColumn("Remarks", concat_ws("MCA", col("ID")))
Expected output:
+----------+------+----------+
|        sk|    id|   Remarks|
+----------+------+----------+
|2306220722|117738|MCA 117738|
+----------+------+----------+
I would like to prefix the id column with "MCA" and add the resulting string to the Remarks column.
Simply use the concat function in combination with lit. lit takes a value (a string, a double, etc.) and produces a column containing only that value:

import org.apache.spark.sql.functions.{concat, lit, col}

val df2 = df.withColumn("Remarks", concat(lit("MCA "), col("id")))

Using the example dataframe in the question, running df2.show() gives
+----------+------+----------+
| sk| id| Remarks|
+----------+------+----------+
|2306220722|117738|MCA 117738|
+----------+------+----------+