Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to split pipe-separated column into multiple rows?

I have a dataframe that contains the following:

movieId / movieName / genre
1         example1    action|thriller|romance
2         example2    fantastic|action

I would like to obtain a second dataframe (from the first one), that contains the following:

movieId / movieName / genre
1         example1    action
1         example1    thriller
1         example1    romance
2         example2    fantastic
2         example2    action

How I can do that?

like image 813
Lechucico Avatar asked May 14 '17 13:05

Lechucico


People also ask

How do I split a column into multiple rows?

Click in a cell, or select multiple cells that you want to split. Under Table Tools, on the Layout tab, in the Merge group, click Split Cells. Enter the number of columns or rows that you want to split the selected cells into.

How do I split a column into multiple rows in pandas?

To split text in a column into multiple rows with Python Pandas, we can use the str. split method. to create the df data frame.

How split comma separated values into rows pandas?

Series and DataFrame methods define a . explode() method that explodes lists into separate rows. See the docs section on Exploding a list-like column. Since you have a list of comma separated strings, split the string on comma to get a list of elements, then call explode on that column.

How do I split a row into multiple columns in R?

To split a column into multiple columns in the R Language, we use the separator() function of the dplyr package library. The separate() function separates a character column into multiple columns with a regular expression or numeric locations.


1 Answers

I'd use split standard function.

scala> movies.show(truncate = false)
+-------+---------+-----------------------+
|movieId|movieName|genre                  |
+-------+---------+-----------------------+
|1      |example1 |action|thriller|romance|
|2      |example2 |fantastic|action       |
+-------+---------+-----------------------+

scala> movies.withColumn("genre", explode(split($"genre", "[|]"))).show
+-------+---------+---------+
|movieId|movieName|    genre|
+-------+---------+---------+
|      1| example1|   action|
|      1| example1| thriller|
|      1| example1|  romance|
|      2| example2|fantastic|
|      2| example2|   action|
+-------+---------+---------+

// You can use \\| for split instead
scala> movies.withColumn("genre", explode(split($"genre", "\\|"))).show
+-------+---------+---------+
|movieId|movieName|    genre|
+-------+---------+---------+
|      1| example1|   action|
|      1| example1| thriller|
|      1| example1|  romance|
|      2| example2|fantastic|
|      2| example2|   action|
+-------+---------+---------+

p.s. You could use Dataset.flatMap to achieve the same result which is something Scala devs would enjoy more I'm sure.

like image 136
Jacek Laskowski Avatar answered Oct 18 '22 16:10

Jacek Laskowski