Insert a row of NAs after each group of data using data.table

Tags: r, data.table

I am trying to add a row of NAs after each group of data in R.

A similar question has been asked before: "Insert a blank row after each group of data".

The accepted answer there also works in this case:

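group <- c("a","b","b","c","c","c","d","d","d","d")
xvalue <- c(16:25)
yvalue <- c(1:10)
# cbind() coerces all three vectors into one character matrix
df <- data.frame(cbind(group,xvalue,yvalue))
# make sure every column is plain character (not factor)
df_new <- as.data.frame(lapply(df, as.character), stringsAsFactors = FALSE)
# by() calls rbind(piece, NA) on each group; head(..., -1) drops the trailing NA row
head(do.call(rbind, by(df_new, df$group, rbind, NA)), -1 )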
     group xvalue yvalue
a.1      a     16      1
a.2   <NA>   <NA>   <NA>
b.2      b     17      2
b.3      b     18      3
b.31  <NA>   <NA>   <NA>
c.4      c     19      4
c.5      c     20      5
c.6      c     21      6
c.41  <NA>   <NA>   <NA>
d.7      d     22      7
d.8      d     23      8
d.9      d     24      9
d.10     d     25     10
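For reference, here is a minimal sketch of what the base-R version does, step by step. It assumes the data are built directly as a data frame with integer `xvalue`/`yvalue` columns instead of going through `cbind()`, so the numeric columns stay numeric; `pieces` and `res` are illustrative names only.

```r
# Toy data from the question, built with typed columns (assumption: no cbind())
df_new <- data.frame(group  = c("a","b","b","c","c","c","d","d","d","d"),
                     xvalue = 16:25,
                     yvalue = 1:10,
                     stringsAsFactors = FALSE)

# by() splits df_new by group and calls rbind(piece, NA) on each piece,
# so every piece gains one all-NA row at its end
pieces <- by(df_new, df_new$group, rbind, NA)

# Stack the pieces back together; head(..., -1) removes the final NA row
res <- head(do.call(rbind, pieces), -1)
res
```

Because `by()` creates one data frame per group and `rbind()` copies them all again, this approach tends to get slow as the number of groups grows, which is the motivation for the data.table question.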

How can I speed this up using data.table for a large data.frame?

Crops asked Jan 01 '15 11:01



1 Answer

You could try

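library(data.table)
df$group <- as.character(df$group)  # work with group as plain character, not factor
# .SD[1:(.N+1)] appends an all-NA row per group; blank its group label; [!.N] drops the last row
setDT(df)[, .SD[1:(.N+1)], by=group][is.na(xvalue), group:=NA][!.N]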
#     group xvalue yvalue
#1:     a     16      1
#2:    NA     NA     NA
#3:     b     17      2
#4:     b     18      3
#5:    NA     NA     NA
#6:     c     19      4
#7:     c     20      5
#8:     c     21      6
#9:    NA     NA     NA
#10:    d     22      7
#11:    d     23      8
#12:    d     24      9
#13:    d     25     10
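A minimal sketch of the same idea, split into named steps so each part of the chain can be inspected. It assumes a data.table built directly with integer columns (the name `dt` and the intermediate `padded`/`res` are hypothetical); the key facts are that indexing `.SD` one row past its end returns an all-NA row, and that `DT[!.N]` excludes the last row.

```r
library(data.table)

# Toy data from the question as a data.table (assumption: integer value columns)
dt <- data.table(group  = c("a","b","b","c","c","c","d","d","d","d"),
                 xvalue = 16:25,
                 yvalue = 1:10)

# Row .N + 1 does not exist in .SD, so it comes back as an all-NA row
padded <- dt[, .SD[1:(.N + 1)], by = group]

# The grouping column is re-attached to every row, including the padding
# row, so set it to NA there (by reference)
padded[is.na(xvalue), group := NA]

# !.N in i is a "not" selection of the last row: drop the trailing NA row
res <- padded[!.N]
res
```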

Or, as suggested by @David Arenburg:

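 setDT(df)[, indx := group][, .SD[1:(.N+1)], indx][,indx := NULL][!.N]
 # note: indx := group adds indx to df itself by reference; the later indx := NULL only removes it from the result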

Or

 setDT(df)[df[,.I[1:(.N+1)], group]$V1][!.N]
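A minimal sketch of why this works, on a smaller hypothetical table `dt` (assumption: four rows, three groups). `.I` holds each group's row numbers in the original table, so `.I[1:(.N+1)]` is those row numbers followed by one `NA`; subsetting the table by an `NA` row number then yields an all-NA row, including in `group`.

```r
library(data.table)

# Hypothetical small table: groups a (row 1), b (rows 2-3), c (row 4)
dt <- data.table(group = c("a","b","b","c"), xvalue = 16:19, yvalue = 1:4)

# Row numbers per group, each followed by one NA index
idx <- dt[, .I[1:(.N + 1)], by = group]$V1
idx

# NA indices return all-NA rows; !.N drops the final one
res <- dt[idx][!.N]
res
```

Since the padding is done on an integer index vector and the table is subset only once, this avoids materialising `.SD` for every group.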

Or, simplified further based on @eddi's comments:

 setDT(df)[df[, c(.I, NA), group]$V1][!.N]
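A minimal sketch checking that `c(.I, NA)` builds the same index vector as `.I[1:(.N+1)]`, assuming the question's data rebuilt as a data.table with integer columns (`dt`, `idx_a`, `idx_b` and `res` are hypothetical names). Appending `NA` to an integer vector keeps it integer, so both forms should be identical.

```r
library(data.table)

# Toy data from the question (assumption: integer value columns)
dt <- data.table(group  = c("a","b","b","c","c","c","d","d","d","d"),
                 xvalue = 16:25,
                 yvalue = 1:10)

# Two ways to append one NA row index per group
idx_a <- dt[, .I[1:(.N + 1)], by = group]$V1
idx_b <- dt[, c(.I, NA), by = group]$V1

# Subset once by the padded index, then drop the trailing NA row
res <- dt[idx_b][!.N]
res
```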
akrun answered Oct 21 '22 05:10
