For whatever reason, the data is being provided in the following format:
0001 This is text for 0001
0002 This has spaces in between
0003 Yet this is only supposed to be two columns
0009 Why didn't they just comma delimit you may ask?
0010 Or even use quotations?
001 Who knows
0012 But now I'm here with his file
0013 And hoping someone has an elegant solution?
The above is supposed to be two columns. What I would like is one column for the first entries, i.e. 0001, 0002, 0003, 0009, 0010, 001, 0012, 0013, and another column for everything else.
You can use the separate function from the tidyr package for that (promoting my comment to an answer). You specify two column names, and the extra = "merge" parameter makes sure that everything after the first space is put into the second column:
library(tidyr)
separate(mydf, V1, c("nr","text"), sep = " ", extra = "merge")
# or:
mydf %>% separate(V1, c("nr","text"), sep = " ", extra = "merge")
This gives:
    nr text
1 0001 This is text for 0001
2 0002 This has spaces in between
3 0003 Yet this is only supposed to be two columns
4 0009 Why didnt they just comma delimit you may ask?
5 0010 Or even use quotations?
6  001 Who knows
7 0012 But now Im here with his file
8 0013 And hoping someone has an elegant solution?
Used data:
mydf <- structure(list(V1 = structure(c(1L, 2L, 3L, 4L, 6L, 5L, 7L, 8L),
.Label = c("0001 This is text for 0001", "0002 This has spaces in between",
"0003 Yet this is only supposed to be two columns", "0009 Why didnt they just comma delimit you may ask?",
"001 Who knows", "0010 Or even use quotations?", "0012 But now Im here with his file", "0013 And hoping someone has an elegant solution?"), class = "factor")),
.Names = "V1", class = "data.frame", row.names = c(NA,-8L))
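If you would rather not depend on tidyr, the same split on the first space can be sketched in base R with sub(). This is only a minimal sketch, not part of the original answer: the three-row mydf below is a shortened stand-in for the data, and the names x and res are made up for illustration.

```r
# Shortened stand-in for the data (assumption): a single column named V1
mydf <- data.frame(
  V1 = c("0001 This is text for 0001",
         "001 Who knows",
         "0010 Or even use quotations?"),
  stringsAsFactors = FALSE
)

# as.character() guards against V1 being a factor, as in the dput() output
x <- as.character(mydf$V1)

res <- data.frame(
  # drop the first space and everything after it
  nr   = sub(" .*$", "", x),
  # drop everything up to and including the first space
  text = sub("^[^ ]* ", "", x),
  stringsAsFactors = FALSE
)
res
```

Both columns stay character, so the leading zeros in nr are kept. Note that a line without any space would end up unchanged in both columns, so check for that case if your file might contain such lines.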