Insert missing time rows into a dataframe

Tags:

r

missing-data

time-series

Let's say I have a dataframe:

df <- data.frame(group = c('A','A','A','B','B','B'), 
                 time = c(1,2,4,1,2,3),
                 data = c(5,6,7,8,9,10))

What I want to do is insert rows into the data frame wherever a time value is missing from a group's sequence. In the above example, I'm missing time = 3 for group A and time = 4 for group B. In those new rows I would like the data column to be 0.

How would I go about adding these additional rows?

The goal would be:

df <- data.frame(group = c('A','A','A','A','B','B','B','B'), 
                 time = c(1,2,3,4,1,2,3,4),
                 data = c(5,6,0,7,8,9,10,0))

My real data has a couple of thousand data points, so doing this manually isn't feasible.

305

asked Jun 30 '15 23:06

puginablanket

1 Answer

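You can try merge with expand.grid:

 # build every group/time combination, then join the observed rows onto it
 res <- merge(
   expand.grid(group = unique(df$group), time = unique(df$time)),
   df, all = TRUE)
 # combinations with no observed row come back as NA; set them to 0
 res$data[is.na(res$data)] <- 0
 res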
 #  group time data
 #1     A    1    5
 #2     A    2    6
 #3     A    3    0
 #4     A    4    7
 #5     B    1    8
 #6     B    2    9
 #7     B    3   10
 #8     B    4    0

Or using data.table:

 library(data.table)
 # setDT() converts df to a data.table by reference; CJ() builds the cross join
 setkey(setDT(df), group, time)[CJ(group=unique(group), time=unique(time))
                     ][is.na(data), data:=0L]
 #    group time data
 #1:     A    1    5
 #2:     A    2    6
 #3:     A    3    0
 #4:     A    4    7
 #5:     B    1    8
 #6:     B    2    9
 #7:     B    3   10
 #8:     B    4    0

Update

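As @thelatemail mentioned in the comments, the above methods fail if a particular time value is missing from every group, because unique(df$time) then never contains it. Using the full range of time values is more general (this assumes time advances in whole-number steps):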

 res <- merge(
   expand.grid(group = unique(df$group),
               time = min(df$time):max(df$time)),
   df, all = TRUE)
 res$data[is.na(res$data)] <- 0

Similarly, replace time=unique(time) with time=min(time):max(time) in the data.table solution.
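For the data.table route, that swap might look like the sketch below. Several things here are assumptions and not part of the original answer: `dt2` is a made-up variant of the example data in which time 3 is missing from both groups, `full` and `res` are placeholder names, and an `on=` join is used in place of the `setkey()` chain so the input table is not keyed.

```r
library(data.table)

# Hypothetical variant of the example: time 3 is absent from BOTH groups,
# so unique(time) would never produce it.
dt2 <- data.table(group = c('A','A','A','B','B','B'),
                  time  = c(1, 2, 4, 1, 2, 4),
                  data  = c(5, 6, 7, 8, 9, 10))

# Cross join of every group with every time step in the observed range.
# Assumes time advances in steps of 1.
full <- CJ(group = unique(dt2$group),
           time  = seq(min(dt2$time), max(dt2$time), by = 1))

# Join the observed rows onto the full grid; (group, time) pairs with
# no observed row get NA in the data column.
res <- dt2[full, on = .(group, time)]

# Replace the NAs introduced by the join with 0.
res[is.na(data), data := 0]

res$data
# 5 6 0 7 8 9 0 10
```

With `time = unique(dt2$time)` in the `CJ()` call, the result would have only six rows and no time = 3 entries, which is the failure case described in the update.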

124

answered Oct 29 '22 12:10

akrun
