I'm trying to split a column in an R data frame whose values contain more than one space, but I want to split on only the first space. An example data frame:
df <- data.frame(game = c(1, 2, 3, 4, 5, 6), date = c("Monday Apr 3", "Tuesday Apr 4", "Wednesday Apr 5", "Thursday Apr 6", "Friday Apr 7", "Saturday Apr 8"))
I'm trying to use tidyr to split the df 'date' column on just the first space so that the day is in its own column:
  game       day  date
1    1    Monday Apr 3
2    2   Tuesday Apr 4
3    3 Wednesday Apr 5
4    4  Thursday Apr 6
5    5    Friday Apr 7
6    6  Saturday Apr 8
The above is the problem. The below is what I've tried and what is going wrong.
By the tidyr documentation, the default value of 'sep' is 'a regular expression that matches any sequence of non-alphanumeric values.' So if I just do:
df %>% separate(date, c("day", "date"))
That splits on the space, but it splits on both spaces (i.e. the space after 'Monday' and the space after 'Apr' in 'Monday Apr 3'), and the third piece is dropped. The result is:
  game       day date
1    1    Monday  Apr
2    2   Tuesday  Apr
3    3 Wednesday  Apr
4    4  Thursday  Apr
5    5    Friday  Apr
6    6  Saturday  Apr
Warning message:
Too many values at 6 locations: 1, 2, 3, 4, 5, 6
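To see where the "too many values" come from, it can help to apply the default separator to a single value. The sketch below uses base R's strsplit() with the pattern "[^[:alnum:]]+", which I believe is tidyr's default for sep (treat that as an assumption and check ?separate for your version). The variable names are only for illustration.

```r
# One value from the 'date' column in the question
x <- "Monday Apr 3"

# Assumed default separator: any run of non-alphanumeric characters
pieces <- strsplit(x, "[^[:alnum:]]+")[[1]]

print(pieces)          # the individual pieces produced by the split
print(length(pieces))  # more pieces than the two column names supplied
```

With only two names given for the new columns, the extra piece has nowhere to go, which matches the warning above.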
I can add the regex to select just the first space (and I checked that this regex worked in Sublime Text):
df %>% separate(date, c("day", "date"), sep='^[^\\s]*\\K\\s')
But that gives me:
  game             day date
1    1    Monday Apr 3 <NA>
2    2   Tuesday Apr 4 <NA>
3    3 Wednesday Apr 5 <NA>
4    4  Thursday Apr 6 <NA>
5    5    Friday Apr 7 <NA>
6    6  Saturday Apr 8 <NA>
Warning message:
Too few values at 6 locations: 1, 2, 3, 4, 5, 6
So what is going wrong? Or how do I make this work? Or what obvious thing am I not understanding?
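One possible reading of the failure, offered as an assumption rather than a confirmed diagnosis: \K is a Perl-style construct, so the pattern only behaves as intended in a regex engine that supports it, which is presumably why it worked in Sublime Text. The sketch below does not go through separate() at all; it only shows the same pattern locating the first space in base R once perl = TRUE is set. The vector x is made up for illustration.

```r
x <- c("Monday Apr 3", "Wednesday Apr 5")

# perl = TRUE switches base R to PCRE, where \K resets the start of the match,
# so the reported match is just the first whitespace character
pos <- as.integer(regexpr("^[^\\s]*\\K\\s", x, perl = TRUE))

day  <- substr(x, 1, pos - 1)          # text before the first space
rest <- substr(x, pos + 1, nchar(x))   # text after the first space

print(data.frame(day = day, rest = rest))
```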
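You need to set the extra argument to "merge", so that separate() splits at most as many times as there are names in into and leaves any remaining text in the last column: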
library(tidyr)
df %>% separate(date, c("day", "date"), extra = "merge")
#  game       day  date
#1    1    Monday Apr 3
#2    2   Tuesday Apr 4
#3    3 Wednesday Apr 5
#4    4  Thursday Apr 6
#5    5    Friday Apr 7
#6    6  Saturday Apr 8
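The idea of "split once and keep the rest together" can also be sketched in base R. This is not how tidyr implements extra = "merge"; it is only an illustration of the same behaviour, using regexpr() (which finds just the first match) together with regmatches(..., invert = TRUE). The names dates, parts and result are made up for this example.

```r
dates <- c("Monday Apr 3", "Tuesday Apr 4", "Wednesday Apr 5")

# regexpr() locates only the first space; invert = TRUE returns the text
# on either side of that match instead of the match itself
parts <- regmatches(dates, regexpr(" ", dates), invert = TRUE)

# Each list element holds two pieces: before and after the first space
result <- do.call(rbind, parts)
colnames(result) <- c("day", "date")
print(result)
```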
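This can also be done in base R. sub() replaces only the first run of whitespace with a comma, so read.csv() then sees exactly two fields per row: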
cbind(df[1], read.csv(text=sub("\\s+", ",", df$date),
header=FALSE, col.names = c("day", "date")))
#  game       day  date
#1    1    Monday Apr 3
#2    2   Tuesday Apr 4
#3    3 Wednesday Apr 5
#4    4  Thursday Apr 6
#5    5    Friday Apr 7
#6    6  Saturday Apr 8
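If the CSV round trip is a concern (for instance, a value that already contains a comma could be parsed into extra fields), a variation is to call sub() twice and build the columns directly. This is a sketch of an alternative, not the method from the answer above; the shortened df and the name out are only for illustration.

```r
df <- data.frame(game = 1:3,
                 date = c("Monday Apr 3", "Tuesday Apr 4", "Wednesday Apr 5"),
                 stringsAsFactors = FALSE)

out <- data.frame(
  game = df$game,
  # keep the first word: drop everything from the first whitespace onward
  day  = sub("[[:space:]].*$", "", df$date),
  # keep the rest: drop the first word and the whitespace that follows it
  date = sub("^[^[:space:]]+[[:space:]]+", "", df$date),
  stringsAsFactors = FALSE
)
print(out)
```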
Another option is extract() from tidyr, which fills the new columns from the capture groups of a regular expression:
library(tidyr)
extract(df, date, into = c("day", "date"), "(\\S+)\\s+(.*)")
#  game       day  date
#1    1    Monday Apr 3
#2    2   Tuesday Apr 4
#3    3 Wednesday Apr 5
#4    4  Thursday Apr 6
#5    5    Friday Apr 7
#6    6  Saturday Apr 8
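The same capture-group pattern can be tried without tidyr via utils::strcapture(), which should be available in reasonably recent versions of R. This is a sketch under that assumption; perl = TRUE is set here only so that \S and \s are interpreted by PCRE, and the names dates, proto and out are made up for the example.

```r
dates <- c("Monday Apr 3", "Tuesday Apr 4", "Wednesday Apr 5")

# Prototype data frame: supplies the column names and types for the groups
proto <- data.frame(day = character(), date = character(),
                    stringsAsFactors = FALSE)

# Group 1: the first run of non-space characters
# Group 2: everything after the whitespace that follows it
out <- utils::strcapture("(\\S+)\\s+(.*)", dates, proto, perl = TRUE)
print(out)
```

Because the first group stops at the first whitespace and the second group is greedy, later spaces stay inside the second column.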