Consider the following data frame,
cols = ['Id', 'col1', 'col2', 'col3']
vals = [('1A','Not his side|:|This side', 'This side', 'Not this either|:|but this'),
('1B','Keep this', 'This one|:|keep this', 'remove|:|keep that')]
dd1 = sqlContext.createDataFrame(vals, cols)
#+---+------------------------+--------------------+--------------------------+
#|Id |col1 |col2 |col3 |
#+---+------------------------+--------------------+--------------------------+
#|1A |Not his side|:|This side|This side |Not this either|:|but this|
#|1B |Keep this |This one|:|keep this|remove|:|keep that |
#+---+------------------------+--------------------+--------------------------+
What I need to do is to split the strings at |:| and keep the second word. However, If the string does not contain the delimiter (|:|), then I get null,
i.e.:
users1 = [F.split(F.col(x), "\\|:\\|").alias(x) for x in cols]
dd1.select(*users1).show()
#+---------+---------+---------+
#| col1| col2| col3|
#+---------+---------+---------+
#|This side| null| but this|
#| null|keep this|keep that|
#+---------+---------+---------+
The result I'm looking for is:
+---+---------+---------+---------+
|Id |col1 |col2 |col3 |
+---+---------+---------+---------+
|1A |This side|This side|but this |
|1B |Keep this|keep this|keep that|
+---+---------+---------+---------+
Use when and otherwise and check if the string contains "|:|". It can be done as follows:
cols = ['col1', 'col2', 'col3']
users1 = [F.when(F.col(x).contains("|:|"), F.split(F.col(x), "\\|:\\|")[1]).otherwise(F.col(x)).alias(x) for x in cols]
dd1.select(F.col('Id'), *users1)
Here the cols only include the columns that you want to split. The final select will include the Id column as well.
You can use when and size inbuilt functions as
users1 = [F.when(F.size(F.split(F.col(x), "\\|:\\|")) > 1, F.split(F.col(x), "\\|:\\|")[1]).otherwise(F.col(x)).alias(x) for x in cols]
dd1.select(*users1).show()
which should give you
+---+---------+---------+---------+
| Id| col1| col2| col3|
+---+---------+---------+---------+
| 1A|This side|This side| but this|
| 1B|Keep this|keep this|keep that|
+---+---------+---------+---------+
You can modify the answer so that you can use split function only once.
I hope the answer is helpful and should be a good hint on how to proceed.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With