Splitting rows with uneven string length into columns in R using tidyr [duplicate]

Tags: string, regex, r, tidyr

Edit: This was marked as a duplicate. It is not. The question here is not only about splitting a single column into multiple ones, as my separate code would have worked for that. The main point of my question is splitting the column when the row strings yield varying numbers of columns.

I'm trying to turn this:

data <- c("Place1-Place2-Place2-Place4-Place2-Place3-Place5",
          "Place7-Place7-Place7-Place7-Place7-Place7-Place7-Place7",
          "Place1-Place1-Place1-Place1-Place3-Place5",
          "Place1-Place4-Place2-Place3-Place3-Place5-Place5",
          "Place6-Place6",
          "Place1-Place2-Place3-Place4")

Into this:

      X1     X2     X3     X4     X5     X6     X7     X8
1 Place1 Place2 Place2 Place4 Place2 Place3 Place5 
2 Place7 Place7 Place7 Place7 Place7 Place7 Place7 Place7
3 Place1 Place1 Place1 Place1 Place3 Place5 
4 Place1 Place4 Place2 Place3 Place3 Place5 Place5 
5 Place6 Place6 
6 Place1 Place2 Place3 Place4

I tried to use tidyr's separate function with this code:

library(data.table)
data <- as.data.table(data)
data_table <- tidyr::separate(data,
                            data,
                            sep="-",
                            into = strsplit(data$data, "-"),
                            fill = "right")

Sadly, I'm getting this warning:

Warning message:
Too many values at 3 locations: 1, 2, 4

What do I need to change to make it work?

235

asked Mar 03 '16 12:03

JnrfL

1 Answer

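You need to specify the target columns correctly. The into argument expects a character vector of column names, not the list that strsplit returns: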

library(tidyr)
# fill = "right" pads shorter rows with NA on the right without a warning
separate(DF, V1, paste0("X", 1:8), sep = "-", fill = "right")

which gives:

      X1     X2     X3     X4     X5     X6     X7     X8
1 Place1 Place2 Place2 Place4 Place2 Place3 Place5   <NA>
2 Place7 Place7 Place7 Place7 Place7 Place7 Place7 Place7
3 Place1 Place1 Place1 Place1 Place3 Place5   <NA>   <NA>
4 Place1 Place4 Place2 Place3 Place3 Place5 Place5   <NA>
5 Place6 Place6   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>
6 Place1 Place2 Place3 Place4   <NA>   <NA>   <NA>   <NA>

If you don't know how many target columns you need beforehand, you can use:

> max(sapply(strsplit(as.character(DF$V1),'-'),length))
[1] 8

to extract the maximum number of parts (which is thus the number of columns you need).
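
Putting the two steps together, a minimal sketch could compute the column count first and then pass it to separate. It assumes the same DF as in the "Used data" section; n_cols and res are illustrative names only, not part of any package:

```r
library(tidyr)

DF <- data.frame(V1 = c("Place1-Place2-Place2-Place4-Place2-Place3-Place5",
                        "Place7-Place7-Place7-Place7-Place7-Place7-Place7-Place7",
                        "Place1-Place1-Place1-Place1-Place3-Place5",
                        "Place1-Place4-Place2-Place3-Place3-Place5-Place5",
                        "Place6-Place6",
                        "Place1-Place2-Place3-Place4"))

# Number of pieces in the longest string = number of columns needed
n_cols <- max(lengths(strsplit(as.character(DF$V1), "-", fixed = TRUE)))

# Build the column names from that count; fill = "right" pads short rows with NA
res <- separate(DF, V1, into = paste0("X", seq_len(n_cols)),
                sep = "-", fill = "right")
```

This way the hard-coded 8 disappears, so the same call should keep working if a longer string is added to the data.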

Several other methods:

splitstackshape :

library(splitstackshape)
cSplit(DF, "V1", sep="-", direction = "wide")

stringi :

library(stringi)
as.data.frame(stri_list2matrix(stri_split_fixed(DF$V1, "-"), byrow = TRUE))

data.table :

library(data.table)
setDT(DF)[, paste0("v", 1:8) := tstrsplit(V1, "-")][, V1 := NULL][]

stringr :

library(stringr)
as.data.frame(str_split_fixed(DF$V1, "-",8))

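which all give a similar result. Note that str_split_fixed pads the missing pieces with empty strings rather than NA, and that the data.table version names the columns v1 to v8.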
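
For comparison, the same idea can be sketched in base R without any packages. This is only one possible approach, not taken from the methods above; parts, padded and res are illustrative names:

```r
DF <- data.frame(V1 = c("Place1-Place2-Place2-Place4-Place2-Place3-Place5",
                        "Place7-Place7-Place7-Place7-Place7-Place7-Place7-Place7",
                        "Place1-Place1-Place1-Place1-Place3-Place5",
                        "Place1-Place4-Place2-Place3-Place3-Place5-Place5",
                        "Place6-Place6",
                        "Place1-Place2-Place3-Place4"),
                 stringsAsFactors = FALSE)

# Split every string on the literal "-"
parts <- strsplit(DF$V1, "-", fixed = TRUE)

# Length of the longest split = number of columns needed
n_cols <- max(lengths(parts))

# `length<-` extends each vector to n_cols, padding with NA
padded <- lapply(parts, `length<-`, n_cols)

# Stack the padded vectors row-wise and name the columns X1..Xn
res <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
names(res) <- paste0("X", seq_len(n_cols))
```

Padding each vector to a common length before rbind avoids the recycling that rbind would otherwise apply to the shorter rows.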

Used data:

DF <- data.frame(V1=c("Place1-Place2-Place2-Place4-Place2-Place3-Place5",
                      "Place7-Place7-Place7-Place7-Place7-Place7-Place7-Place7",
                      "Place1-Place1-Place1-Place1-Place3-Place5",
                      "Place1-Place4-Place2-Place3-Place3-Place5-Place5",
                      "Place6-Place6",
                      "Place1-Place2-Place3-Place4"))

170

answered Sep 21 '22 10:09

Jaap
