When a new column is added to a data.table
that has been loaded from disk, the table gets copied.
library('data.table')
dt <- data.table(a=1,b=2)
save.image("test.RData")
load("test.RData")
dt
$ a b
$1: 1 2
class(dt)
$[1] "data.table" "data.frame"
address(dt)
$[1] "00000000046F1F38"
dt[, b := NULL]
address(dt)
$[1] "00000000046F1F38"
dt[, c := 2]
address(dt)
$[1] "000000000D815618"
Is this a bug, or am I doing something wrong? I am using version 1.9.6
of the data.table package.
data.table avoids copies when adding columns by over-allocating pointer slots for the list of column vectors when the data.table is created. A data.table loaded from disk like this has not been over-allocated yet; the over-allocation is done the first time you add a column, and that is what makes the copy necessary.
library('data.table')
dt <- data.table(a=1,b=2)
save.image("test.RData")
load("test.RData")
truelength(dt)
#[1] 0
dt[, b := NULL]
truelength(dt)
#[1] 0
dt[, c := 2]
truelength(dt)
#[1] 101
To quote help("truelength"):
For tables loaded from disk however, truelength is 0 in R 2.14.0 and random in R <= 2.13.2; i.e., in both cases perhaps unexpected. data.table detects this state and over-allocates the loaded data.table when the next column addition or deletion occurs. All other operations on data.table (such as fast grouping and joins) do not need truelength.
It seems that the documentation is slightly out of date since the copy doesn't happen during deletion of a column.
Note that a copy also happens if you add more columns than have been over-allocated during "normal" creation of a data.table.
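If the copy matters (for example, because other code relies on the object's address), one possible workaround is to over-allocate explicitly right after loading. The sketch below assumes alloc.col(), which data.table 1.9.6 exports (newer releases also offer it under the name setalloccol()); the temporary file and the column name c are only illustrative, and the behaviour may differ between versions.

```r
library(data.table)

# Round-trip a small table through disk, as in the question.
tmp <- tempfile(fileext = ".RData")
dt <- data.table(a = 1, b = 2)
save(dt, file = tmp)
rm(dt)
load(tmp)

# Over-allocate column pointer slots now, instead of on the first `:=`.
# alloc.col() reassigns `dt` in the calling frame; this step is itself
# the (shallow) copy, so the address is expected to change here.
invisible(alloc.col(dt))
truelength(dt) > length(dt)   # checks that spare slots are available

# With spare slots in place, adding a column should not need another copy.
before <- address(dt)
dt[, c := 2]
identical(before, address(dt))
```

The point of doing this by hand is only to control when the reallocation happens; the column data itself is not duplicated, since only the list of column pointers is reallocated.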