
Splitting rows with uneven string length into columns in R using tidyr [duplicate]

Edit: This was marked as a duplicate, but it is not one. The question is not only about splitting a single column into multiple columns, since my separate code would have worked for that. The main point is splitting the column when the strings in different rows yield different numbers of output columns.

I'm trying to turn this:

data <- c("Place1-Place2-Place2-Place4-Place2-Place3-Place5",
          "Place7-Place7-Place7-Place7-Place7-Place7-Place7-Place7",
          "Place1-Place1-Place1-Place1-Place3-Place5",
          "Place1-Place4-Place2-Place3-Place3-Place5-Place5",
          "Place6-Place6",
          "Place1-Place2-Place3-Place4")

Into this:

      X1     X2     X3     X4     X5     X6     X7     X8
1 Place1 Place2 Place2 Place4 Place2 Place3 Place5 
2 Place7 Place7 Place7 Place7 Place7 Place7 Place7 Place7
3 Place1 Place1 Place1 Place1 Place3 Place5 
4 Place1 Place4 Place2 Place3 Place3 Place5 Place5 
5 Place6 Place6 
6 Place1 Place2 Place3 Place4

I tried to use tidyr's separate function with this code:

library(data.table)
data <- as.data.table(data)
data_table <- tidyr::separate(data,
                            data,
                            sep="-",
                            into = strsplit(data$data, "-"),
                            fill = "right")

Sadly I'm getting this warning:

Warning message:
Too many values at 3 locations: 1, 2, 4 

What do I need to change to make it work?

Asked by JnrfL, Mar 03 '16 12:03



1 Answer

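You need to specify the target columns correctly. The into argument expects a character vector of column names, not the list that strsplit returns: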

library(tidyr)
separate(DF, V1, paste0("X",1:8), sep="-")

which gives:

      X1     X2     X3     X4     X5     X6     X7     X8
1 Place1 Place2 Place2 Place4 Place2 Place3 Place5   <NA>
2 Place7 Place7 Place7 Place7 Place7 Place7 Place7 Place7
3 Place1 Place1 Place1 Place1 Place3 Place5   <NA>   <NA>
4 Place1 Place4 Place2 Place3 Place3 Place5 Place5   <NA>
5 Place6 Place6   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>
6 Place1 Place2 Place3 Place4   <NA>   <NA>   <NA>   <NA>

If you don't know how many target columns you need beforehand, you can use:

> max(sapply(strsplit(as.character(DF$V1),'-'),length))
[1] 8

to extract the maximum number of parts (which is thus the number of columns you need).
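Putting the two steps together, here is a minimal sketch of how the column count could feed straight into separate. The shortened three-row DF and the names n_cols and res are only for illustration, and fill = "right" is carried over from the question's attempt so that short rows are padded with NA without a warning:

```r
library(tidyr)

# Shortened example data (assumption: same shape as the DF used in this answer)
DF <- data.frame(V1 = c("Place1-Place2-Place2-Place4-Place2-Place3-Place5",
                        "Place6-Place6",
                        "Place1-Place2-Place3-Place4"),
                 stringsAsFactors = FALSE)

# Count the pieces in each row and keep the largest count
n_cols <- max(lengths(strsplit(DF$V1, "-", fixed = TRUE)))

# Build the column names from that count; fill = "right" pads short rows with NA
res <- separate(DF, V1, into = paste0("X", seq_len(n_cols)),
                sep = "-", fill = "right")
res
```

This way the number of columns follows the data instead of being hard-coded as 8.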


Several other methods:

splitstackshape :

library(splitstackshape)
cSplit(DF, "V1", sep="-", direction = "wide")

stringi :

library(stringi)
as.data.frame(stri_list2matrix(stri_split_fixed(DF$V1, "-"), byrow = TRUE))

data.table :

library(data.table)
setDT(DF)[, paste0("v", 1:8) := tstrsplit(V1, "-")][, V1 := NULL][]

stringr :

library(stringr)
as.data.frame(str_split_fixed(DF$V1, "-",8))

which all give a similar result.
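If none of these packages is available, the same idea can be sketched in base R. This is not one of the methods above but an illustrative alternative: each split vector is extended to the maximum length with `length<-`, which pads with NA. The names parts, n_cols, padded and wide are made up for this example:

```r
# Shortened example data (assumption: same format as DF$V1)
V1 <- c("Place1-Place2-Place2-Place4-Place2-Place3-Place5",
        "Place6-Place6",
        "Place1-Place2-Place3-Place4")

# One character vector of pieces per row
parts <- strsplit(V1, "-", fixed = TRUE)
n_cols <- max(lengths(parts))

# `length<-` extends every vector to n_cols and pads with NA;
# sapply returns one column per row, so transpose the result
padded <- t(sapply(parts, `length<-`, n_cols))

wide <- as.data.frame(padded, stringsAsFactors = FALSE)
names(wide) <- paste0("X", seq_len(n_cols))
wide
```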


Used data:

DF <- data.frame(V1=c("Place1-Place2-Place2-Place4-Place2-Place3-Place5",
                      "Place7-Place7-Place7-Place7-Place7-Place7-Place7-Place7",
                      "Place1-Place1-Place1-Place1-Place3-Place5",
                      "Place1-Place4-Place2-Place3-Place3-Place5-Place5",
                      "Place6-Place6",
                      "Place1-Place2-Place3-Place4"))

Answered by Jaap, Sep 21 '22 10:09