Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Split string IF delimiter is found

Consider the following data frame,

cols = ['Id', 'col1', 'col2', 'col3']
vals = [('1A','Not his side|:|This side', 'This side', 'Not this either|:|but this'),  
        ('1B','Keep this', 'This one|:|keep this', 'remove|:|keep that')]

dd1 = sqlContext.createDataFrame(vals, cols)

#+---+------------------------+--------------------+--------------------------+
#|Id |col1                    |col2                |col3                      |
#+---+------------------------+--------------------+--------------------------+
#|1A |Not his side|:|This side|This side           |Not this either|:|but this|
#|1B |Keep this               |This one|:|keep this|remove|:|keep that        |
#+---+------------------------+--------------------+--------------------------+

What I need to do is to split the strings at |:| and keep the second word. However, If the string does not contain the delimiter (|:|), then I get null, i.e.:

users1 = [F.split(F.col(x), "\\|:\\|").alias(x) for x in cols]
dd1.select(*users1).show()

#+---------+---------+---------+
#|     col1|     col2|     col3|
#+---------+---------+---------+
#|This side|     null| but this|
#|     null|keep this|keep that|
#+---------+---------+---------+

The result I'm looking for is:

+---+---------+---------+---------+
|Id |col1     |col2     |col3     |
+---+---------+---------+---------+
|1A |This side|This side|but this |
|1B |Keep this|keep this|keep that|
+---+---------+---------+---------+
like image 757
Sotos Avatar asked Apr 19 '26 22:04

Sotos


2 Answers

Use when and otherwise and check if the string contains "|:|". It can be done as follows:

cols = ['col1', 'col2', 'col3']

users1 = [F.when(F.col(x).contains("|:|"), F.split(F.col(x), "\\|:\\|")[1]).otherwise(F.col(x)).alias(x) for x in cols]
dd1.select(F.col('Id'), *users1)

Here the cols only include the columns that you want to split. The final select will include the Id column as well.

like image 54
Shaido Avatar answered Apr 21 '26 12:04

Shaido


You can use when and size inbuilt functions as

users1 = [F.when(F.size(F.split(F.col(x), "\\|:\\|")) > 1, F.split(F.col(x), "\\|:\\|")[1]).otherwise(F.col(x)).alias(x) for x in cols]
dd1.select(*users1).show()

which should give you

+---+---------+---------+---------+
| Id|     col1|     col2|     col3|
+---+---------+---------+---------+
| 1A|This side|This side| but this|
| 1B|Keep this|keep this|keep that|
+---+---------+---------+---------+

You can modify the answer so that you can use split function only once.

I hope the answer is helpful and should be a good hint on how to proceed.

like image 43
Ramesh Maharjan Avatar answered Apr 21 '26 11:04

Ramesh Maharjan