First, I want to create an empty data.table with column names, but it fails:
data <- data.table(va, vb, vc)
> Error in data.table(va, vb, vc) : object 'va' not found
Second, I want to append a data.table to it, but that fails too:
data2 <- data.table(va=c(-1,0,1), vb=c(-1,0,1), vc=c(-1,0,1))
data2
va vb vc
1: -1 -1 -1
2: 0 0 0
3: 1 1 1
merge(data2,data2)
> Error in merge.data.table(data2, data2) :
Can not match keys in x and y to automatically determine appropriate `by` parameter. Please set `by` value explicitly.
Apparently the function can't determine the by parameter for two identical data.tables. Any ideas?
To create an empty data.table, use (assuming all columns are numeric):
library(data.table)
data <- data.table(va=numeric(), vb=numeric(), vc=numeric())
data
which results in:
> data
Empty data.table (0 rows) of 3 cols: va,vb,vc
To do a self join over all columns, use the following (even though the result is the same as data2 ;-):
data2 <- data.table(va=c(-1,0,1), vb=c(-1,0,1), vc=c(-1,0,1))
data2
merge(data2, data2, by = names(data2))
The reason you have to specify the by parameter is the documented semantics of merge:
by:
A vector of shared column names in x and y to merge on. This defaults to the shared key columns between the two tables. If y has no key columns, this defaults to the key of x.
Since you haven't set any keys, the "join" columns used to merge the data tables are unclear.
There is no implicit "use all columns" semantics if you omit the by parameter (as cited above, the shared key columns are used).
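As a minimal sketch of the quoted default, assuming keys are set on both tables: by can then be omitted, because it falls back to the shared key columns. The names x, y and res are only illustrative.

```r
library(data.table)

# Two identical tables, as in the question.
x <- data.table(va = c(-1, 0, 1), vb = c(-1, 0, 1), vc = c(-1, 0, 1))
y <- copy(x)

# Set the same key on both tables so that merge() has shared key columns.
setkey(x, va, vb, vc)
setkey(y, va, vb, vc)

# With keys on both sides, by defaults to the shared key columns.
res <- merge(x, y)
res
```

Because every column is part of the key here, the result has the same three rows and columns as the inputs.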
To append all rows of a data.table to another one, use rbind ("row bind") instead of merge:
data3 <- rbind(data2, data2)
data3
Which results in:
> data3
va vb vc
1: -1 -1 -1
2: 0 0 0
3: 1 1 1
4: -1 -1 -1
5: 0 0 0
6: 1 1 1
To create an empty data.table, you can start from an empty matrix:
library(data.table)
data <- setNames(data.table(matrix(nrow = 0, ncol = 3)), c("va", "vb", "vc"))
data
Empty data.table (0 rows) of 3 cols: va,vb,vc
Then you can use rbindlist to append a new data.table to it:
data2=data.table(va=c(-1,0,1), vb=c(-1,0,1), vc=c(-1,0,1))
data2
va vb vc
1: -1 -1 -1
2: 0 0 0
3: 1 1 1
rbindlist(list(data, data2))
va vb vc
1: -1 -1 -1
2: 0 0 0
3: 1 1 1
Or even simpler, the following also works:
data <- data.table()
data <- rbindlist(list(data, data2))
data
va vb vc
1: -1 -1 -1
2: 0 0 0
3: 1 1 1
Another way to create an empty data.table with defined column names but without having to define data types:
data <- data.table(1)[,`:=`(c("va", "vb", "vc"),NA)][,V1:=NULL][.0]
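This does the following:
data.table(1): Create a non-NULL data.table to which you can add columns. It has a single column V1 with one row (value 1). Any value other than NULL could be used in place of 1.
[,`:=`(c("va", "vb", "vc"),NA)]: Add columns va, vb, vc (alongside V1), still with one row (values 1, NA, NA, NA). Any non-NULL value can be substituted for NA.
[,V1:=NULL]: Remove the V1 column.
[.0]: Select row index 0 (.0 is just the numeric literal 0), which returns zero rows while keeping the columns.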
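The chain can also be unpacked into separate steps to see what each one contributes. This is only a sketch: it uses the equivalent c(...) := NA form and a plain [0] instead of [.0], and the intermediate variable name tmp is hypothetical.

```r
library(data.table)

tmp <- data.table(1)                 # one row, one column named V1
tmp[, c("va", "vb", "vc") := NA]     # add three logical NA columns by reference
tmp[, V1 := NULL]                    # drop the placeholder column
data <- tmp[0]                       # row index 0 selects no rows, columns are kept

names(data)          # column names survive
sapply(data, class)  # all columns start out as logical
```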
If you don't like the black magic of [.0], you can also use:
data <- data.table(1)[,`:=`(c("va", "vb", "vc"),NA)][,V1:=NULL][!is.na(va)]
Edit several years later:
Note that these columns are initially classed as logical (for the NA example above). The column classes are normally coerced into the classes of the columns of any appended data, but this appears to fail with Date data.
> alldata[,lapply(.SD,class)] # 0-row data seeded with NA in each column as above
va vb vc vd
1: logical logical logical logical
> filedata[,lapply(.SD,class)] # lines of real data that you are trying to merge
va vb vc vd
1: character character integer Date
> rbindlist(list(alldata,filedata))
Error in rbindlist(list(alldata, filedata), use.names = FALSE) :
Class attribute on column 4 of item 2 does not match with column 4 of item 1.
To work around this error, one solution is to use @R Yoda's answer (the empty data.table with typed columns) with that column declared as e.g. vd=as.Date(character(0), origin = "1970-01-01")
Note that this error was reported to the data.table GitHub repo for this specific use case. It had been reported more generally before that.
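A minimal sketch of that workaround, assuming the column classes shown above (character, character, integer, Date). The single row in filedata is made-up illustration data, not taken from the original post.

```r
library(data.table)

# Empty table with every column typed up front, including the Date column.
alldata <- data.table(
  va = character(),
  vb = character(),
  vc = integer(),
  vd = as.Date(character(0))
)

# Hypothetical stand-in for the real data read from a file.
filedata <- data.table(va = "a", vb = "b", vc = 1L, vd = as.Date("2020-01-31"))

# Classes now match column by column, so the append goes through.
combined <- rbindlist(list(alldata, filedata))
sapply(combined, class)
```

Declaring the classes explicitly avoids relying on coercion from logical placeholder columns.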