I have a data frame with, say, 5 rows for 2 observables. I need to insert "dummy" or "zero" rows into the data frame so that the number of rows per observable is the same (and the target can be larger than the row count of the longest observable). E.g.:
# This is what I have:
x = c("a","a","b","b","b")
y = c(2,4,5,2,6)
dft = data.frame(x,y)
print(dft)
#   x y
# 1 a 2
# 2 a 4
# 3 b 5
# 4 b 2
# 5 b 6
Here's what I'd like to get, i.e. each observable padded to 4 rows (mock-up data frame):
x1 = c("a","a","a","a","b","b","b","b")
y1 = c(2,4,0,0,5,2,6,0)
dft1 = data.frame(x1,y1)
print(dft1)
#   x1 y1
# 1  a  2
# 2  a  4
# 3  a  0
# 4  a  0
# 5  b  5
# 6  b  2
# 7  b  6
# 8  b  0
I started by counting the rows per observable in the original data frame with ddply, so that I know how many rows I need to add for each observable.
library(plyr)
nr = ddply(dft,.(x),summarise,val=length(x))
print(nr)
#   x val
# 1 a   2
# 2 b   3
# N extras will be 2 and 1 to reach 4 per obs.
repl = 4 - nr$val
repl_name = nr$x
repl_x = rep(repl_name,repl)
print(repl_x)
# [1] a a b
# Levels: a b
dfa = matrix("-",nrow=sum(repl),ncol=1)
dff = data.frame(repl_x,as.data.frame(dfa))
names(dff) <- names(dft)
dft = rbind(dft,dff)
dft = dft[order(as.character(dft$x)),]
print(dft)
#   x y
# 1 a 2
# 2 a 4
# 6 a -
# 7 a -
# 3 b 5
# 4 b 2
# 5 b 6
# 8 b -
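The same steps can be trimmed down in base R. This is an editorial sketch, not the asker's code: it assumes a target of 4 rows, swaps table() in for ddply to get the counts, and uses 0 as the placeholder so that y stays numeric (the "-" placeholder above turns y into a non-numeric column).

```r
# Example data from the question
x <- c("a", "a", "b", "b", "b")
y <- c(2, 4, 5, 2, 6)
dft <- data.frame(x, y, stringsAsFactors = FALSE)

target <- 4                                    # rows wanted per observable (assumed)
counts <- table(dft$x)                         # rows per observable: a = 2, b = 3
extras <- pmax(0, target - as.integer(counts)) # rows to add: a = 2, b = 1 (never negative)
repl_x <- rep(names(counts), extras)           # "a" "a" "b"

# Dummy rows use 0 so that y stays numeric
dff <- data.frame(x = repl_x, y = 0, stringsAsFactors = FALSE)
padded <- rbind(dft, dff)

# order() leaves ties in their original order, so real rows stay ahead of dummies
padded <- padded[order(padded$x), ]
rownames(padded) <- NULL
padded
```

It is still the same rbind-then-sort idea, just with fewer intermediate objects.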
This achieves my goal, but it takes quite a few operations and transformations.
So my question is: is there a simpler and faster way to insert an arbitrary number of empty/dummy rows at several places in a data frame? The number of columns and rows can be arbitrary.
Note: the code above works, so I believe this is not a "review my code" question but a genuine "how to do it better" question. Thank you!
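For comparison, here is a different base R route, offered as an editorial sketch rather than anything from the thread: number the rows within each observable with ave(), build every (observable, position) pair with expand.grid(), and left-join the data onto that grid with merge(). The target of 4 rows and the fill value 0 are assumptions taken from the mock-up above. Caveats: an NA already present in y would also be replaced by the fill value, and rows beyond position n are dropped.

```r
x <- c("a", "a", "b", "b", "b")
y <- c(2, 4, 5, 2, 6)
dft <- data.frame(x, y, stringsAsFactors = FALSE)

n <- 4     # rows wanted per observable (assumed)
fill <- 0  # value for the dummy rows (assumed)

# Position of each row within its observable: 1 2 1 2 3
dft$pos <- ave(seq_along(dft$x), dft$x, FUN = seq_along)

# Every (observable, position) pair the result should contain
grid <- expand.grid(x = unique(dft$x), pos = seq_len(n),
                    stringsAsFactors = FALSE)

# Left join: pairs with no original row get NA in y
out <- merge(grid, dft, by = c("x", "pos"), all.x = TRUE)
out$y[is.na(out$y)] <- fill

# Sort by observable, then position, and drop the helper column
out <- out[order(out$x, out$pos), c("x", "y")]
rownames(out) <- NULL
out
```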
You can try using the "data.table" package, which lets you apply "length<-" to each column within each group to expand out your rows; the new positions are filled with NA.
Demo (using the sample data at the end of this answer, which adds a z column):
library(data.table)
as.data.table(dft)[, lapply(.SD, `length<-`, 4), by = x]
## x y z
## 1: a 2 2
## 2: a 4 3
## 3: a NA NA
## 4: a NA NA
## 5: b 5 4
## 6: b 2 5
## 7: b 6 6
## 8: b NA NA
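The padding comes from base R's `length<-` replacement function, which data.table here simply calls on each column of .SD per group. A small base R illustration of its behaviour (no data.table needed): it pads with NA when the new length is larger, keeps factor levels, and silently truncates when the new length is smaller, so a group that already has more than 4 rows would lose rows.

```r
# Growing a vector pads it with NA
v <- c(2, 4)
length(v) <- 4
v                                        # 2 4 NA NA

# Functional form, as called by lapply(.SD, `length<-`, 4)
padded_vec <- `length<-`(c(5, 2, 6), 4)  # 5 2 6 NA

# Factors keep their levels; new entries are NA
f <- factor(c("a", "a"))
length(f) <- 4                           # a a <NA> <NA>, levels: a

# Caveat: shrinking truncates without warning
w <- c(5, 2, 6, 1, 9)
length(w) <- 4                           # 5 2 6 1
```

If zeros are wanted instead of NA, the NAs can be replaced afterwards.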
Upon provocation by Thela-the-taunter™, if you want to stick with base R, perhaps you can create a function like the following:
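naRowsByGroup <- function(indf, group, rowsneeded) {
  out <- do.call(rbind, lapply(split(indf, indf[[group]]), function(x) {
    # Pad every column of this group to `rowsneeded` entries (NA-filled)
    x <- data.frame(lapply(x, `length<-`, rowsneeded))
    # Replace the NAs in the grouping column with the group's value
    x[group] <- x[[group]][1]
    x
  }))
  # rbind on a named list creates row names like "a.1"; reset them
  rownames(out) <- NULL
  out
}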
Usage would then be:
naRowsByGroup(dft, 1, 4)
# x y z
# 1 a 2 2
# 2 a 4 3
# 3 a NA NA
# 4 a NA NA
# 5 b 5 4
# 6 b 2 5
# 7 b 6 6
# 8 b NA NA
Sample data:
x = c("a","a","b","b","b")
y = c(2,4,5,2,6)
z = c(2,3,4,5,6)
dft = data.frame(x,y,z)
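If the "zero" rows from the question are wanted rather than NA, one option is a fill-value variant of the function above. padRowsByGroup is a hypothetical name, not part of any package. This sketch assumes group is given as a column name (not an index, unlike the usage above) and that every non-grouping column is numeric, since assigning 0 into a factor column would give NA with a warning. Unlike the `length<-` version, it leaves groups that already have rowsneeded rows or more untouched instead of truncating them.

```r
# Hypothetical helper: like naRowsByGroup, but fills new rows with `fill`
# and never drops rows from groups that are already long enough.
padRowsByGroup <- function(indf, group, rowsneeded, fill = 0) {
  others <- setdiff(names(indf), group)       # non-grouping columns
  pieces <- lapply(split(indf, indf[[group]]), function(d) {
    extra <- rowsneeded - nrow(d)
    if (extra <= 0) return(d)                 # already long enough
    pad <- d[rep(1L, extra), , drop = FALSE]  # copies the group value
    pad[others] <- fill                       # overwrite all other columns
    rbind(d, pad)
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

# Sample data from above
x <- c("a", "a", "b", "b", "b")
y <- c(2, 4, 5, 2, 6)
z <- c(2, 3, 4, 5, 6)
dft <- data.frame(x, y, z, stringsAsFactors = FALSE)

res <- padRowsByGroup(dft, "x", 4)
res
```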