 

Splitting a space-delimited file with a variable number of fields into 2 columns

Tags:

r

For whatever reason, data is being provided in the following format:

0001 This is text for 0001
0002 This has spaces in between
0003 Yet this is only supposed to be two columns
0009 Why didn't they just comma delimit you may ask?
0010 Or even use quotations?
001  Who knows
0012 But now I'm here with his file
0013 And hoping someone has an elegant solution?

So the above is supposed to be two columns: one column for the first entries, i.e. 0001, 0002, 0003, 0009, 0010, 001, 0012, 0013, and another column for everything else on the line.

Cenoc asked Feb 08 '23 00:02



1 Answer

You can use the separate function from the tidyr package for this (promoting my comment to an answer). Specify two column names, and the extra = "merge" parameter makes sure that everything after the first space is put into the second column instead of being split further or dropped:

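library(tidyr)
# sep = " +" splits at the first run of spaces, so "001  Who knows" (two spaces)
# does not get a leading space in text; extra = "merge" keeps the rest intact
separate(mydf, V1, c("nr","text"), sep = " +", extra = "merge")
# or, with the pipe (re-exported by tidyr):
mydf %>% separate(V1, c("nr","text"), sep = " +", extra = "merge")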

You get:

    nr                                           text
1 0001                          This is text for 0001
2 0002                     This has spaces in between
3 0003    Yet this is only supposed to be two columns
4 0009 Why didnt they just comma delimit you may ask?
5 0010                        Or even use quotations?
6  001                                      Who knows
7 0012                  But now Im here with his file
8 0013    And hoping someone has an elegant solution?
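If you would rather not depend on tidyr, the same split can be sketched in base R with two sub() calls. This is an alternative technique, not part of the original answer, and it assumes every line contains at least one space: the first regex keeps everything before the first run of spaces, the second drops it.

```r
# Sample lines taken from the question (note the double space in "001  Who knows")
lines <- c("0001 This is text for 0001",
           "0002 This has spaces in between",
           "001  Who knows")

res <- data.frame(
  # delete the first run of spaces and everything after it -> the ID
  nr   = sub(" +.*$", "", lines),
  # delete the ID and the run of spaces that follows it -> the remaining text
  text = sub("^[^ ]+ +", "", lines),
  stringsAsFactors = FALSE
)
print(res)
```

A line without any space would end up whole in both columns, so check for that case if your real file might contain such lines.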

Data used:

mydf <- structure(list(V1 = structure(c(1L, 2L, 3L, 4L, 6L, 5L, 7L, 8L), 
                                      .Label = c("0001 This is text for 0001", "0002 This has spaces in between",
                                                 "0003 Yet this is only supposed to be two columns", "0009 Why didnt they just comma delimit you may ask?", 
                                                 "001  Who knows", "0010 Or even use quotations?", "0012 But now Im here with his file", "0013 And hoping someone has an elegant solution?"), class = "factor")), 
              .Names = "V1", class = "data.frame", row.names = c(NA,-8L))
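If you start from the raw file rather than the structure() above, one way to get a one-column data frame like mydf is readLines(), which keeps each line as a single string; read.table() with its default settings would instead split on every run of whitespace. This is a minimal sketch: the file name data.txt is hypothetical, and a textConnection stands in for the file so the example runs on its own.

```r
raw <- "0001 This is text for 0001
0002 This has spaces in between
0009 Why didn't they just comma delimit you may ask?
001  Who knows"

# In real use this would be con <- file("data.txt") (hypothetical file name)
con <- textConnection(raw)
# One row per line; spaces and apostrophes are kept exactly as in the file
mydf <- data.frame(V1 = readLines(con), stringsAsFactors = FALSE)
close(con)
```

The resulting mydf can then be passed to separate() exactly as in the answer.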
Jaap answered Feb 16 '23 02:02
