
splitting a column by factor within a data frame

Suppose I have a data frame like this:

v1   v2   v3
a    1    a
a    2    b
a    6    c
b    3    a
b    4    b
b    5    c

Here v1 is a factor and v3 is a character vector. I want to apply some function to the data frame such that v2 is split by the levels of v1, with each resulting piece added to the data frame as a new column:

v1   v2   v3   v4   v5
a    1    a    1    NA
a    2    b    2    NA
a    6    c    6    NA
b    3    a    NA   3
b    4    b    NA   4
b    5    c    NA   5

The solutions I have been able to work out are very convoluted. Is there an elegant way of doing this?

(Note: v3 exists because any solution needs to be able to deal with the existence of other non-numeric vectors in the data frame that should be ignored.)

Logister asked Apr 15 '26 04:04


1 Answer

1) transform / ifelse. If v1 has a small, known number of levels, a simple approach is to generate each new column manually:

transform(DF, a = ifelse(v1 == "a", v2, NA), 
              b = ifelse(v1 == "b", v2, NA))
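
To try this on the question's data, the data frame can be rebuilt roughly as follows. The construction of DF is an assumption, since the question shows only the printed table; the transform() call is the one given above.

```r
# Assumed reconstruction of the question's data frame; the original post
# shows only the printed table, not the code that built it.
DF <- data.frame(
  v1 = factor(c("a", "a", "a", "b", "b", "b")),
  v2 = c(1, 2, 6, 3, 4, 5),
  v3 = c("a", "b", "c", "a", "b", "c"),
  stringsAsFactors = FALSE
)

# One new column per known level of v1; rows belonging to other levels get NA.
res <- transform(DF, a = ifelse(v1 == "a", v2, NA),
                     b = ifelse(v1 == "b", v2, NA))
print(res)
```

Note that the new columns are named after the levels (a, b) rather than v4 and v5 as in the question's desired output.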

2) tapply. A more general approach, which does not require knowing the levels of v1 in advance, is to cross the row index with v1:

cbind(DF, tapply(DF$v2, list(1:nrow(DF), DF$v1), identity))

The solutions above do not require any add-on packages.
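
If the new columns should be called v4 and v5, as in the question's desired output, approach 2 can be followed by a renaming step. This is only a sketch: the reconstruction of DF and the naming rule (continuing the v-numbering after the existing columns) are assumptions, not part of the original answer.

```r
# Assumed reconstruction of the question's data frame.
DF <- data.frame(
  v1 = factor(c("a", "a", "a", "b", "b", "b")),
  v2 = c(1, 2, 6, 3, 4, 5),
  v3 = c("a", "b", "c", "a", "b", "c"),
  stringsAsFactors = FALSE
)

# Cross the row index with v1: each (row, level) cell holds at most one
# value of v2, and empty cells are filled with NA.
wide <- tapply(DF$v2, list(seq_len(nrow(DF)), DF$v1), identity)

# Hypothetical naming rule: continue numbering after the existing columns,
# so with three columns in DF the new ones become v4, v5, ...
colnames(wide) <- paste0("v", ncol(DF) + seq_len(ncol(wide)))

res <- cbind(DF, wide)
print(res)
```

Non-numeric columns such as v3 are carried through untouched, because only v2 and v1 are passed to tapply.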

3) data.table. This solution assumes that v1 is a factor and that the rows of DF are unique (as is the case in the question):

# devtools::install_github("Rdatatable/datatable")  # 1.9.3

library(data.table)
DT <- data.table(DF)

DT[, split(v2, v1), by = DT]

If the rows of DT might not be unique then (based on discussion with Arun) this would work:

DT[, c(.SD, split(v2, v1)), by = 1:nrow(DT)][, -1, with = FALSE]

Update: Some improvements.

G. Grothendieck answered Apr 21 '26 03:04