Add new row to dataframe, at specific row-index, not appended?

People also ask

How do I add rows to an existing DataFrame?

append() function is used to append rows of other dataframe to the end of the given dataframe, returning a new dataframe object. Columns not in the original dataframes are added as new columns and the new cells are populated with NaN value. ignore_index : If True, do not use the index labels.

How do I append multiple rows to a DataFrame in Python?

Add multiple rows to pandas dataframe We can pass a list of series too in the dataframe. append() for appending multiple rows in dataframe. For example, we can create a list of series with same column names as dataframe i.e. Now pass this list of series to the append() function i.e.

Here's a solution that avoids the (often slow) rbind call:

existingDF <- as.data.frame(matrix(seq(20),nrow=5,ncol=4))
r <- 3
newrow <- seq(4)
insertRow <- function(existingDF, newrow, r) {
  existingDF[seq(r+1,nrow(existingDF)+1),] <- existingDF[seq(r,nrow(existingDF)),]
  existingDF[r,] <- newrow
  existingDF
}

> insertRow(existingDF, newrow, r)
  V1 V2 V3 V4
1  1  6 11 16
2  2  7 12 17
3  1  2  3  4
4  3  8 13 18
5  4  9 14 19
6  5 10 15 20

If speed is less important than clarity, then @Simon's solution works well:

existingDF <- rbind(existingDF[1:r,],newrow,existingDF[-(1:r),])
> existingDF
   V1 V2 V3 V4
1   1  6 11 16
2   2  7 12 17
3   3  8 13 18
4   1  2  3  4
41  4  9 14 19
5   5 10 15 20

(Note we index r differently).

And finally, benchmarks:

library(microbenchmark)
microbenchmark(
  rbind(existingDF[1:r,],newrow,existingDF[-(1:r),]),
  insertRow(existingDF,newrow,r)
)

Unit: microseconds
                                                    expr     min       lq   median       uq       max
1                       insertRow(existingDF, newrow, r) 660.131 678.3675 695.5515 725.2775   928.299
2 rbind(existingDF[1:r, ], newrow, existingDF[-(1:r), ]) 801.161 831.7730 854.6320 881.6560 10641.417

Benchmarks

As @MatthewDowle always points out to me, benchmarks need to be examined for the scaling as the size of the problem increases. Here we go then:

benchmarkInsertionSolutions <- function(nrow=5,ncol=4) {
  existingDF <- as.data.frame(matrix(seq(nrow*ncol),nrow=nrow,ncol=ncol))
  r <- 3 # Row to insert into
  newrow <- seq(ncol)
  m <- microbenchmark(
   rbind(existingDF[1:r,],newrow,existingDF[-(1:r),]),
   insertRow(existingDF,newrow,r),
   insertRow2(existingDF,newrow,r)
  )
  # Now return the median times
  mediansBy <- by(m$time,m$expr, FUN=median)
  res <- as.numeric(mediansBy)
  names(res) <- names(mediansBy)
  res
}
nrows <- 5*10^(0:5)
benchmarks <- sapply(nrows,benchmarkInsertionSolutions)
colnames(benchmarks) <- as.character(nrows)
ggplot( melt(benchmarks), aes(x=Var2,y=value,colour=Var1) ) + geom_line() + scale_x_log10() + scale_y_log10()

@Roland's solution scales quite well, even with the call to rbind:

                                                              5       50     500    5000    50000     5e+05
insertRow2(existingDF, newrow, r)                      549861.5 579579.0  789452 2512926 46994560 414790214
insertRow(existingDF, newrow, r)                       895401.0 905318.5 1168201 2603926 39765358 392904851
rbind(existingDF[1:r, ], newrow, existingDF[-(1:r), ]) 787218.0 814979.0 1263886 5591880 63351247 829650894

Plotted on a linear scale:

linear

And a log-log scale:

log-log

insertRow2 <- function(existingDF, newrow, r) {
  existingDF <- rbind(existingDF,newrow)
  existingDF <- existingDF[order(c(1:(nrow(existingDF)-1),r-0.5)),]
  row.names(existingDF) <- 1:nrow(existingDF)
  return(existingDF)  
}

insertRow2(existingDF,newrow,r)

  V1 V2 V3 V4
1  1  6 11 16
2  2  7 12 17
3  1  2  3  4
4  3  8 13 18
5  4  9 14 19
6  5 10 15 20

microbenchmark(
+   rbind(existingDF[1:r,],newrow,existingDF[-(1:r),]),
+   insertRow(existingDF,newrow,r),
+   insertRow2(existingDF,newrow,r)
+ )
Unit: microseconds
                                                    expr     min       lq   median       uq      max
1                       insertRow(existingDF, newrow, r) 513.157 525.6730 531.8715 544.4575 1409.553
2                      insertRow2(existingDF, newrow, r) 430.664 443.9010 450.0570 461.3415  499.988
3 rbind(existingDF[1:r, ], newrow, existingDF[-(1:r), ]) 606.822 625.2485 633.3710 653.1500 1489.216

You should try dplyr package

library(dplyr)
a <- data.frame(A = c(1, 2, 3, 4),
               B = c(11, 12, 13, 14))


system.time({
for (i in 50:1000) {
    b <- data.frame(A = i, B = i * i)
    a <- bind_rows(a, b)
}

})

Output

   user  system elapsed 
   0.25    0.00    0.25

In contrast with using rbind function

a <- data.frame(A = c(1, 2, 3, 4),
                B = c(11, 12, 13, 14))


system.time({
    for (i in 50:1000) {
        b <- data.frame(A = i, B = i * i)
        a <- rbind(a, b)
    }

})

Output

   user  system elapsed 
   0.49    0.00    0.49

There is some performance gain.

The .before argument in dplyr::add_row can be used to specify the row.

dplyr::add_row(
  cars,
  speed = 0,
  dist = 0,
  .before = 3
)
#>    speed dist
#> 1      4    2
#> 2      4   10
#> 3      0    0
#> 4      7    4
#> 5      7   22
#> 6      8   16
#> ...

Related questions
                            
                                Add a common Legend for combined ggplots
                            
                                ggplot2 plot without axes, legends, etc
                            
                                Better explanation of when to use Imports/Depends
                            
                                Fastest way to replace NAs in a large data.table
                            
                                Global variables in R
                            
                                Insert picture/table in R Markdown [closed]
                            
                                How to generate a number of most distinctive colors in R?
                            
                                Check for installed packages before running install.packages() [duplicate]
                            
                                Is there a way to make R beep/play a sound at the end of a script?
                            
                                promise already under evaluation: recursive default argument reference or earlier problems?
                            
                                Count number of occurences for each unique value
                            
                                Determining memory usage of objects?
                            
                                What does %>% function mean in R?
                            
                                Returning multiple objects in an R function [duplicate]
                            
                                Summarizing multiple columns with dplyr? [duplicate]
                            
                                How to interpret dplyr message `summarise()` regrouping output by 'x' (override with `.groups` argument)?
                            
                                Select rows of a matrix that meet a condition
                            
                                Group by multiple columns in dplyr, using string vector input
                            
                                Remove plot axis values
                            
                                How to set size for local image using knitr for markdown?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Add new row to dataframe, at specific row-index, not appended?

Tags:

dataframe

r

insert

People also ask

Recent Activity

Donate For Us