I don't really know how to achieve this without using a for loop:
x <- c('a', 'b', 'c', 'd')
> x
[1] "a" "b" "c" "d"
data <- data.frame(
  x = c('a', 'b', 'a', 'b', 'c', 'a', 'a', 'b', 'c', 'd'),
  name = c('one', 'one', 'two', 'two', 'two', 'three', 'four', 'four', 'four', 'four'),
  other = c(1, 4, 5, 3, 2, 4, 5, 6, 3, 2)
)
> data
   x  name other
1  a   one     1
2  b   one     4
3  a   two     5
4  b   two     3
5  c   two     2
6  a three     4
7  a  four     5
8  b  four     6
9  c  four     3
10 d  four     2
I would like to split data by the value of name and merge every subgroup with x to fill the "missing rows", getting something like this:
> data
   x  name other
1  a   one     1
2  b   one     4
   c   one     0 <- missing row added
   d   one     0 <- missing row added
3  a   two     5
4  b   two     3
5  c   two     2
   d   two     0 <- missing row added
6  a three     4
   b three     0 <- missing row added
   c three     0 <- missing row added
   d three     0 <- missing row added
7  a  four     5
8  b  four     6
9  c  four     3
10 d  four     2
Finally, I would like to reshape the data.frame like this:
> data
  x one two three four
1 a   1   5     4    5
2 b   4   3     0    6
3 c   0   2     0    3
4 d   0   0     0    2
I can achieve it using a for loop, but I am sure there has to be a better solution with *apply, by, split or something like that. Any suggestions?
** UPDATE **
I ended up using a slight modification of the accepted answer (thanks again, dude!), since I don't really like working with levels and I don't care about the order of the columns:
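library(reshape2)  # provides dcast()
# every combination of x and name
grid <- expand.grid(x, unique(data$name))
colnames(grid) <- c("x", "name")
# left join keeps all combinations; unmatched rows get NA in `other`
data <- merge(grid, data, all.x = TRUE)
data[is.na(data)] <- 0
dcast(data, x ~ name, value.var = 'other')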
Try xtabs. No packages are needed.
First put the levels of name in order so the columns come out sorted:
data$name <- factor(data$name, levels = c("one", "two", "three", "four"))
tab <- xtabs(other ~., data)
giving this c("xtabs", "table") class output:
> tab
   name
x   one two three four
  a   1   5     4    5
  b   4   3     0    6
  c   0   2     0    3
  d   0   0     0    2
or use as.data.frame.matrix(tab) if you want the output to have class "data.frame".
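If x should end up as an ordinary column rather than as row names, something along these lines might work. This is only a minimal sketch: it repeats the question's data so it runs on its own, and the name wide is purely illustrative.

```r
# Question's data, repeated so the snippet is self-contained
data <- data.frame(
  x = c('a', 'b', 'a', 'b', 'c', 'a', 'a', 'b', 'c', 'd'),
  name = c('one', 'one', 'two', 'two', 'two', 'three', 'four', 'four', 'four', 'four'),
  other = c(1, 4, 5, 3, 2, 4, 5, 6, 3, 2)
)
data$name <- factor(data$name, levels = c("one", "two", "three", "four"))

# Sum `other` for every x/name pair; combinations that never occur become 0
tab <- xtabs(other ~ x + name, data)

# Convert the table to a data.frame, then move the row names into a column
wide <- as.data.frame.matrix(tab)
wide <- cbind(x = rownames(wide), wide)
rownames(wide) <- NULL  # `wide` is an illustrative name, not from the original answer
wide
```

Everything here is base R, so no extra packages are needed.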
All you really need is reshape2::dcast:
# clean up factor levels for prettier results
data$name <- factor(data$name, levels = c('one', 'two', 'three', 'four'))
library(reshape2)
dcast(data, x ~ name, value.var = 'other', fill = 0)
#   x one two three four
# 1 a   1   5     4    5
# 2 b   4   3     0    6
# 3 c   0   2     0    3
# 4 d   0   0     0    2
To follow the steps you lay out, first use expand.grid to get the combinations, then merge with all = TRUE, then use reshape2::dcast to rearrange:
# levels() relies on data$name already being a factor, as set up above
df <- merge(data, expand.grid(x, levels(data$name)),
            by.x = c('x', 'name'), by.y = c('Var1', 'Var2'), all = TRUE)
df[is.na(df)] <- 0 # replace `NA`s with 0
df$name <- factor(df$name, levels = c('one', 'two', 'three', 'four')) # fix order of levels
library(reshape2)
dcast(df, x ~ name, value.var = 'other')
#   x one two three four
# 1 a   1   5     4    5
# 2 b   4   3     0    6
# 3 c   0   2     0    3
# 4 d   0   0     0    2
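For what it's worth, the literal "split by name, merge each subgroup with x" idea from the question can also be written in base R without an explicit for loop. The following is just a sketch under that reading of the question; pieces and filled are illustrative names, and the data is repeated so the snippet runs on its own.

```r
x <- c('a', 'b', 'c', 'd')
data <- data.frame(
  x = c('a', 'b', 'a', 'b', 'c', 'a', 'a', 'b', 'c', 'd'),
  name = c('one', 'one', 'two', 'two', 'two', 'three', 'four', 'four', 'four', 'four'),
  other = c(1, 4, 5, 3, 2, 4, 5, 6, 3, 2),
  stringsAsFactors = FALSE
)

# Split by name, then left-join each piece onto the full set of x values
pieces <- lapply(split(data, data$name), function(piece) {
  out <- merge(data.frame(x = x), piece, by = "x", all.x = TRUE)
  out$name <- piece$name[1]            # restore the group label on the added rows
  out$other[is.na(out$other)] <- 0     # missing rows get 0
  out
})

# Stack the pieces back into one long data.frame (groups come out alphabetically)
filled <- do.call(rbind, pieces)
rownames(filled) <- NULL
filled
```

The result is the long, filled-in table; it can then be reshaped to wide form with xtabs or dcast as shown above.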