In this question the data.table package creator explains why rows cannot yet be inserted (or removed) by reference in the middle of a data.table. He also points out that such operations could be possible at the end of the table. Could you show code to perform this action? It would be the "by reference" version of:
    > a <- data.table(id=letters[1:2], var=1:2)
    > a
       id var
    1:  a   1
    2:  b   2
    > rbind(a, data.table(id="c", var=3))
       id var
    1:  a   1
    2:  b   2
    3:  c   3
thanks.
EDIT:
Since a proper solution is not possible yet, which of the following is better in terms of speed and memory usage (assuming they differ internally, which I am not sure about)?
    rbind(a, data.table(id="c", var=3))
    rbindlist(list(a, data.table(id="c", var=3)))
Are there any other (better) methods?
To answer your edit, just run a benchmark:
    library(data.table)
    library(microbenchmark)

    a = data.table(id=letters[1:2], var=1:2)
    b = copy(a)
    c = copy(b)
    # let's also just try modifying same value in place
    # to see how well changing existing values does

    microbenchmark(a <- rbind(a, data.table(id="c", var=3)),
                   b <- rbindlist(list(b, data.table(id="c", var=3))),
                   c[1, var := 3L],
                   set(c, 1L, 2L, 3L))
    #Unit: microseconds
    #                                                   expr     min        lq    median        uq      max neval
    #           a <- rbind(a, data.table(id = "c", var = 3)) 865.460 1141.2585 1357.1230 1539.4300 6814.492   100
    # b <- rbindlist(list(b, data.table(id = "c", var = 3))) 260.440  325.3835  445.4190  522.8825 1143.930   100
    #                                    c[1, `:=`(var, 3L)] 482.147  626.5570  778.3135  904.3595 1109.539   100
    #                                     set(c, 1L, 2L, 3L)   2.339     5.677    7.5140    9.5170   19.033   100
rbindlist is clearly faster than rbind (a median of about 445 versus 1357 microseconds in this run). Thanks to Matthew Dowle for pointing out the problems with using [ in a loop; I added another benchmark with set, which is far cheaper than either.
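As a rough illustration of why set matters inside a loop, the following minimal sketch fills a table that was sized in advance. The row count n and the values written are made up for the example; only set itself comes from the benchmark above.

```r
library(data.table)

n <- 1000L  # assumed size, chosen only for illustration

# Preallocate all rows up front with typed NA placeholders.
dt <- data.table(id = rep(NA_character_, n), var = rep(NA_integer_, n))

# set() assigns by reference and avoids the overhead of calling
# `[.data.table` on every iteration.
for (i in seq_len(n)) {
  set(dt, i = i, j = "id",  value = paste0("row", i))
  set(dt, i = i, j = "var", value = i)
}
```

Each call overwrites existing cells in place, so the table itself is never reallocated during the loop.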
From the above, your best options are using rbindlist, or sizing the data.table to begin with and then just populating the values. If you don't know the size of the data in advance, you can use a strategy similar to std::vector in C++: double the size every time you run out of space, and once you're done filling it in, delete the extra rows.
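The doubling strategy could be sketched as follows. This is only one possible way to do it under stated assumptions: the names capacity, n_used and new_ids are hypothetical, and the starting capacity of 2 is deliberately small so that the growth step is triggered.

```r
library(data.table)

capacity <- 2L  # assumed initial capacity (must be at least 1)
dt <- data.table(id  = rep(NA_character_, capacity),
                 var = rep(NA_integer_, capacity))
n_used <- 0L    # number of rows actually filled so far

new_ids <- letters[1:5]  # example data arriving one item at a time

for (k in seq_along(new_ids)) {
  if (n_used == nrow(dt)) {
    # Out of space: double the capacity. This copies the table once;
    # the duplicated rows are placeholders that are later overwritten
    # or trimmed away.
    dt <- rbindlist(list(dt, dt))
  }
  n_used <- n_used + 1L
  set(dt, i = n_used, j = "id",  value = new_ids[k])
  set(dt, i = n_used, j = "var", value = k)
}

# Drop the unused tail (this subsetting also makes a copy).
dt <- dt[seq_len(n_used)]
```

Because the capacity doubles, the expensive copying step happens only a logarithmic number of times relative to the number of rows appended, while every other append is a cheap set call.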