Text-to-column equivalent in R, splitting dataframe on character

Tags:

r

I'd like to know how to split columns in a way similar to Excel's "text-to-column" feature. There are many tutorials on Stack Exchange about how to split columns by a character, but they don't address 3 things I need:

1. work with a column where only some of the rows have the character
2. work with a dataframe that has many columns
3. treat columns as characters/factors

For instance, I have a dataframe

    df <- data.frame(V1 = c("01, 02", "04", "05, 06", "07, 08", "09", "10"),
         V2 = c("11, 12", "14", "13, 14", "11, 14", "13", "15"))

If I were to use text-to-columns on V1 in Excel, I would end up with 3 columns, with V1 split on the comma. A second column would be created, filled only for those cells that had a comma in them. There would be blank cells for rows that had no comma. I would also have the option of treating the new column as a number or text. In this case, I need the leading zero, so it should be treated as text.

It would look something like this

           V1    V2   V3      
    Row 1   01   02   11,12
    Row 2   04   NA   14

How would I do something similar in R, keeping in mind that the dataset I have has many columns, so it's not practical to name every single column in the code?

I hope this was clear. Thank you for the help!

asked Dec 12 '14 03:12 by tom


2 Answers

Maybe this helps:

    library(splitstackshape)
    cSplit(df, 'V1', sep=", ", type.convert=FALSE)
    #       V2 V1_1 V1_2
    #1: 11, 12   01   02
    #2:     14   04   NA
    #3: 13, 14   05   06
    #4: 11, 14   07   08
    #5:     13   09   NA
    #6:     15   10   NA

If you want both columns to be split:

    cSplit(df, 1:ncol(df), sep=",", stripWhite=TRUE, type.convert=FALSE)
    #    V1_1 V1_2 V2_1 V2_2
    #1:   01   02   11   12
    #2:   04   NA   14   NA
    #3:   05   06   13   14
    #4:   07   08   11   14
    #5:   09   NA   13   NA
    #6:   10   NA   15   NA

The default is type.convert=TRUE, which would convert the split values to numeric and so drop the leading zeros.
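To see why the conversion matters for values like "01", here is a small sketch using base R's `type.convert()` directly. It assumes the conversion applied by cSplit behaves like the base function; the vector `codes` is made up for illustration.

```r
# Illustrative values with leading zeros (not taken from the question's data frame)
codes <- c("01", "02", "10")

# Converting lets R guess the type; the strings become integers
converted <- type.convert(codes, as.is = TRUE)
class(converted)   # "integer"
converted          # 1 2 10, the leading zeros are gone

# Left as character, the zeros survive
class(codes)       # "character"
```

This is why the calls above pass type.convert=FALSE when the leading zero has to be kept.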

data

 df <- data.frame(V1 = c("01, 02", "04", "05, 06", "07, 08", "09", "10"),
      V2 = c("11, 12", "14", "13, 14", "11, 14", "13", "15") )
answered Oct 02 '22 02:10 by akrun


Splitting with strsplit and then extracting the pieces with "[" seems to work. Note that those columns were factors to begin with, hence the as.character() call.

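    # Split on ", " (comma plus space) so the second piece has no leading blank
    spl <- strsplit(as.character(df$V1), ", ")
    # "[" with index 2 returns NA for rows that had no comma
    data.frame(V1 = sapply(spl, "[", 1), V2 = sapply(spl, "[", 2), df$V2)
    #  V1   V2  df.V2
    #1 01   02 11, 12
    #2 04 <NA>     14
    #3 05   06 13, 14
    #4 07   08 11, 14
    #5 09 <NA>     13
    #6 10 <NA>     15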
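
Since the question mentions many columns, the same strsplit idea can be looped over every column without naming any of them. The following is only a base R sketch: `split_all_columns()` is a hypothetical helper written for this example, not a function from any package, and it assumes every column uses ", " as the separator.

```r
df <- data.frame(
  V1 = c("01, 02", "04", "05, 06", "07, 08", "09", "10"),
  V2 = c("11, 12", "14", "13, 14", "11, 14", "13", "15"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split every column of `data` on `sep`
split_all_columns <- function(data, sep = ", ") {
  pieces <- lapply(names(data), function(col) {
    # as.character() guards against factor columns
    parts <- strsplit(as.character(data[[col]]), sep, fixed = TRUE)
    width <- max(lengths(parts))
    # Indexing past the end of a vector yields NA, which pads short rows
    mat <- do.call(rbind, lapply(parts, function(p) p[seq_len(width)]))
    out <- as.data.frame(mat, stringsAsFactors = FALSE)
    names(out) <- paste(col, seq_len(width), sep = "_")
    out
  })
  do.call(cbind, pieces)
}

res <- split_all_columns(df)
names(res)   # "V1_1" "V1_2" "V2_1" "V2_2"
```

The pieces stay as character, so "01" keeps its leading zero, and rows without a comma get NA in the second column.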
answered Oct 02 '22 02:10 by IRTFM