Splitting multiple values in one column into multiple rows in R [duplicate]

Tags: r, dplyr, tidyr

I have a data frame that, for the most part, has one observation per row. However, some rows hold multiple values:

# A tibble: 3 x 2
          `number`   abilities
             <dbl>       <chr>
1               51       b1261
2               57        d710
3               57 b1301; d550

df <- structure(list(`number` = c(51, 57, 57), abilities = c("b1261", 
"d710", "b1301; d550")), .Names = c("number", "abilities"
), row.names = c(NA, -3L), class = c("tbl_df", "tbl", "data.frame"
))

I'd like to get the following:

# A tibble: 4 x 2
          `number`   abilities
             <dbl>       <chr>
1               51       b1261
2               57        d710
3               57        d550
4               57       b1301

It's straightforward enough to split on the ";", but I'm not sure how to easily add the new rows, especially as abilities might contain more than two values.

This is very similar to "R semicolon delimited a column into rows", but it doesn't need to remove duplicates.

asked Jun 06 '17 23:06 by 16 revs, 12 users 31%




2 Answers

There's a function separate_rows in tidyr to do just that:

library(tidyr)
## The ";\\s+" means that the separator is a ";" followed by one or more spaces
separate_rows(df, abilities, sep = ";\\s+")
  number abilities
   <dbl>     <chr>
1     51     b1261
2     57      d710
3     57     b1301
4     57      d550
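
Since the question mentions that abilities might contain more than two values, here is a minimal sketch of the same call on such a row. The data frame `df3` and the value `e110` are made up for illustration, and the separator is loosened to ";\\s*" (zero or more whitespace characters) on the assumption that the space after ";" may sometimes be missing:

```r
library(tidyr)

# Hypothetical data: the third row holds three values,
# and the last ";" is not followed by a space
df3 <- data.frame(
  number = c(51, 57, 57),
  abilities = c("b1261", "d710", "b1301; d550;e110"),
  stringsAsFactors = FALSE
)

# ";\\s*" matches a ";" followed by zero or more whitespace characters
out <- separate_rows(df3, abilities, sep = ";\\s*")
out
```

Each value ends up on its own row, with number repeated as often as needed.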
answered Sep 25 '22 18:09 by Lamia



The tidyverse is good for this, as tidyr has unnest (used here together with dplyr's mutate and stringr's str_split):

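library(tidyverse)  # attaches dplyr, tidyr and stringr
df %>%
    # split each cell on ";" into a list-column of character vectors
    mutate(abilities = str_split(abilities, ";")) %>%
    # one row per list element; recent tidyr versions expect the column to be named
    unnest(abilities) %>%
    # drop the whitespace left over after each ";"
    mutate(abilities = str_trim(abilities))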
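
If you would rather avoid extra packages, the same split-then-expand idea can be sketched in base R. This is an alternative to the pipeline above rather than a rewrite of it, and the names `parts` and `long` are just illustrative:

```r
# Same data as in the question, built as a plain data frame
df <- data.frame(
  number = c(51, 57, 57),
  abilities = c("b1261", "d710", "b1301; d550"),
  stringsAsFactors = FALSE
)

# Split each cell on ";" plus any surrounding whitespace.
# The result is a list with one character vector per original row.
parts <- strsplit(df$abilities, "\\s*;\\s*")

# Repeat each number once per piece, then flatten the pieces
long <- data.frame(
  number = rep(df$number, lengths(parts)),
  abilities = unlist(parts),
  stringsAsFactors = FALSE
)
long
```

The key step is `rep(df$number, lengths(parts))`, which is what "adds the new rows": a row with n values has its number repeated n times, so this works for any number of values per cell.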
answered Sep 21 '22 18:09 by Marius
