
Add a row by reference at the end of a data.table object

Tags: r, data.table

In this question the data.table package creator explains why rows cannot yet be inserted (or removed) by reference in the middle of a data.table. He also points out that such operations could be possible at the end of the table. Could you show code to perform this action? It would be the "by reference" version of

    > a <- data.table(id=letters[1:2], var=1:2)
    > a
       id var
    1:  a   1
    2:  b   2
    > rbind(a, data.table(id="c", var=3))
       id var
    1:  a   1
    2:  b   2
    3:  c   3

thanks.

EDIT:

Since a proper solution is not possible yet, which of the following is better in terms of speed and memory usage (assuming they differ internally, which I'm not sure about)?

    rbind(a, data.table(id="c", var=3))
    rbindlist(list(a, data.table(id="c", var=3)))

Are there any other (better) methods?

Asked by Michele, May 28 '13 at 12:05



1 Answer

To answer your edit, just run a benchmark:

    library(data.table)
    library(microbenchmark)

    a = data.table(id=letters[1:2], var=1:2)
    b = copy(a)
    c = copy(b)

    # let's also just try modifying the same value in place
    # to see how well changing existing values does
    microbenchmark(a <- rbind(a, data.table(id="c", var=3)),
                   b <- rbindlist(list(b, data.table(id="c", var=3))),
                   c[1, var := 3L],
                   set(c, 1L, 2L, 3L))
    #Unit: microseconds
    #                                                  expr     min        lq    median        uq      max neval
    #          a <- rbind(a, data.table(id = "c", var = 3)) 865.460 1141.2585 1357.1230 1539.4300 6814.492   100
    #b <- rbindlist(list(b, data.table(id = "c", var = 3))) 260.440  325.3835  445.4190  522.8825 1143.930   100
    #                                   c[1, `:=`(var, 3L)] 482.147  626.5570  778.3135  904.3595 1109.539   100
    #                                    set(c, 1L, 2L, 3L)   2.339    5.677    7.5140    9.5170   19.033   100

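rbindlist is clearly better than rbind: roughly three times faster at the median in this benchmark. Thanks to Matthew Dowle for pointing out the problems with using [ in a loop; I added another benchmark with set, which is by far the fastest way to change existing values.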

Based on the above, your best options are to use rbindlist, or to size the data.table up front and then just populate the values. If you don't know the size of the data in advance, you can use a strategy similar to std::vector in C++: double the size every time you run out of space, and once you're done filling it in, delete the extra rows.
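The doubling strategy described above could look something like the following. This is only a minimal sketch, not a benchmarked implementation: the starting capacity of 2, the `used` counter, and the `new_ids` input are made up for illustration, and the padding is done with rbindlist, which copies the table each time it grows.

```r
library(data.table)

# Start with a small preallocated table filled with NA (capacity 2 is arbitrary).
dt <- data.table(id = rep(NA_character_, 2L), var = rep(NA_integer_, 2L))
used <- 0L                 # number of rows actually filled so far
new_ids <- letters[1:5]    # illustrative data to append one row at a time

for (k in seq_along(new_ids)) {
  if (used == nrow(dt)) {
    # Out of space: double the capacity by binding an equally sized NA block.
    pad <- data.table(id = rep(NA_character_, nrow(dt)),
                      var = rep(NA_integer_, nrow(dt)))
    dt <- rbindlist(list(dt, pad))
  }
  used <- used + 1L
  # set() fills the next free row by reference, without copying the table.
  set(dt, used, "id", new_ids[k])
  set(dt, used, "var", k)
}

# Drop the unused NA rows once filling is done.
dt <- dt[seq_len(used)]
```

With doubling, the table is only copied a logarithmic number of times relative to the final row count, and every other append is a cheap set call.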

Answered by eddi, Sep 29 '22 at 10:09