
How to split data.frame -> apply merge to subsets -> combine into data.frame

Tags:

r

I don't really know how to achieve this without using a for loop:

x <- c('a', 'b', 'c', 'd')

> x
[1] "a" "b" "c" "d"

data <- data.frame(
   x=c('a', 'b', 'a', 'b', 'c', 'a', 'a', 'b', 'c', 'd'),
   name=c('one','one', 'two','two','two', 'three', 'four','four','four','four'),
   other=c(1, 4, 5, 3, 2, 4, 5, 6, 3, 2)
)

> data
   x  name other
1  a   one     1
2  b   one     4
3  a   two     5
4  b   two     3
5  c   two     2
6  a three     4
7  a  four     5
8  b  four     6
9  c  four     3
10 d  four     2

I would like to split data by the value of name and merge every subgroup with x to fill the "missing rows", getting something like this:

> data
   x  name other
1  a   one     1
2  b   one     4
   c   one     0 <- missing row added
   d   one     0 <- missing row added
3  a   two     5
4  b   two     3
5  c   two     2
   d   two     0 <- missing row added
6  a three     4
   b three     0 <- missing row added
   c three     0 <- missing row added
   d three     0 <- missing row added
7  a  four     5
8  b  four     6
9  c  four     3
10 d  four     2

And finally, reformatting the data.frame like this:

> data
   x  one  two  three  four
1  a    1    5      4     5
2  b    4    3      0     6
3  c    0    2      0     3
4  d    0    0      0     2

I can achieve it using a for loop, but I am sure there has to be a better solution with *apply, by, split or something like that. Any suggestions?

** UPDATE **

I ended up using a slight modification of the accepted answer (thanks again!), since I don't really like working with levels and I don't care about the order of the columns:

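library(reshape2)  # provides dcast

# every combination of x and name, including the ones missing from data
grid <- expand.grid(x, unique(data$name))
colnames(grid) <- c("x", "name")
data <- merge(grid, data, all.x = TRUE)  # missing combinations get NA
data[is.na(data)] <- 0
dcast(data, x ~ name, value.var = 'other')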
thelawnmowerman asked Jan 23 '26 16:01



2 Answers

Try xtabs. No packages are needed.

First put the levels of name in order so the columns come out sorted:

data$name <- factor(data$name, levels = c("one", "two", "three", "four"))
tab <- xtabs(other ~ ., data)

giving this c("xtabs", "table") class output:

> tab
   name
x   one two three four
  a   1   5     4    5
  b   4   3     0    6
  c   0   2     0    3
  d   0   0     0    2

Use as.data.frame.matrix(tab) instead if you want the output to have class "data.frame".
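One thing to be aware of with that conversion: the x values end up as row names rather than as a column. A minimal sketch of moving them back into a regular column, assuming the data from the question (the name wide is just a placeholder for illustration):

```r
# Rebuild the question's data so the sketch is self-contained
data <- data.frame(
  x = c('a', 'b', 'a', 'b', 'c', 'a', 'a', 'b', 'c', 'd'),
  name = c('one', 'one', 'two', 'two', 'two', 'three', 'four', 'four', 'four', 'four'),
  other = c(1, 4, 5, 3, 2, 4, 5, 6, 3, 2)
)
data$name <- factor(data$name, levels = c("one", "two", "three", "four"))

tab <- xtabs(other ~ ., data)

# as.data.frame.matrix keeps the x values as row names ...
wide <- as.data.frame.matrix(tab)

# ... so copy them into a regular column and reset the row names
wide <- cbind(x = rownames(wide), wide)
rownames(wide) <- NULL
wide
```

This should give the same shape as the table requested in the question, with x as the first column.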

G. Grothendieck answered Jan 25 '26 08:01



More direct:

All you really need is reshape2::dcast:

# clean up factor levels for prettier results
data$name <- factor(data$name, levels = c('one', 'two', 'three', 'four'))

library(reshape2)
dcast(data, x ~ name, value.var = 'other', fill = 0)

#   x one two three four
# 1 a   1   5     4    5
# 2 b   4   3     0    6
# 3 c   0   2     0    3
# 4 d   0   0     0    2
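
If installing reshape2 is not an option, base R's reshape() can probably get to the same place. This is only a sketch of an alternative, not part of the answer above; it assumes the question's data, and wide is a placeholder name:

```r
# Rebuild the question's data so the sketch is self-contained
data <- data.frame(
  x = c('a', 'b', 'a', 'b', 'c', 'a', 'a', 'b', 'c', 'd'),
  name = c('one', 'one', 'two', 'two', 'two', 'three', 'four', 'four', 'four', 'four'),
  other = c(1, 4, 5, 3, 2, 4, 5, 6, 3, 2)
)

# one row per x, one "other.<name>" column per value of name
wide <- reshape(data, idvar = "x", timevar = "name", direction = "wide")

wide[is.na(wide)] <- 0                             # missing combinations become 0
names(wide) <- sub("^other\\.", "", names(wide))   # other.one -> one, etc.
rownames(wide) <- NULL
wide
```

Unlike dcast with fill = 0, reshape() leaves NA for the missing combinations, so the replacement step is needed.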

As asked:

To follow the steps you lay out, first use expand.grid to get the combinations, then merge with all = TRUE, then use reshape2::dcast to rearrange:

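# assumes data$name is already a factor (as set above); levels() returns NULL for a character column
df <- merge(data, expand.grid(x, levels(data$name)),
            by.x = c('x', 'name'), by.y = c('Var1', 'Var2'), all = TRUE)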

df[is.na(df)] <- 0         # replace `NA`s with 0
df$name <- factor(df$name, levels = c('one', 'two', 'three', 'four')) # fix order of levels

library(reshape2)
dcast(df, x ~ name, value.var = 'other')

#   x one two three four
# 1 a   1   5     4    5
# 2 b   4   3     0    6
# 3 c   0   2     0    3
# 4 d   0   0     0    2
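
For completeness, the literal split -> merge -> combine pipeline from the title can also be written in base R with split, lapply and do.call(rbind, ...). This is a rough sketch assuming the question's x and data; pieces, filled and long are placeholder names:

```r
# Rebuild the question's inputs so the sketch is self-contained
x <- c('a', 'b', 'c', 'd')
data <- data.frame(
  x = c('a', 'b', 'a', 'b', 'c', 'a', 'a', 'b', 'c', 'd'),
  name = c('one', 'one', 'two', 'two', 'two', 'three', 'four', 'four', 'four', 'four'),
  other = c(1, 4, 5, 3, 2, 4, 5, 6, 3, 2)
)

# split by name, merge each subset with the full set of x values,
# then stack the subsets back into one data.frame
pieces <- lapply(split(data, data$name), function(sub) {
  filled <- merge(data.frame(x = x), sub, by = "x", all.x = TRUE)
  filled$name <- sub$name[1]               # restore name on the added rows
  filled$other[is.na(filled$other)] <- 0   # added rows get 0
  filled
})
long <- do.call(rbind, pieces)
rownames(long) <- NULL
long
```

The result is the long table with the missing rows filled in; any of the approaches above (xtabs, dcast) can then reshape it to wide format. Note that split orders the groups alphabetically by name, so the rows come out as four, one, three, two unless name is made a factor first.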
alistaire answered Jan 25 '26 09:01



