
R: Warning when creating a (long) list of dummies

Tags:

r

data.table

A dummy column for a column c and a given value x equals 1 if c==x and 0 otherwise. Usually, when creating dummies for a column c, one excludes one value x of one's choice, because the last dummy column adds no information beyond the dummy columns that already exist.

Here's how I'm trying to create a long list of dummies for a column firm, in a data.table:

values <- unique(myDataTable$firm)
cols <- paste('d', as.character(values[-1]), sep='_') # gives us nice d_value names for columns
# the [-1]: I arbitrarily do not create a dummy for the first unique value
myDataTable[, (cols) := lapply(values[-1], function(x) firm == x)]

This code worked reliably for previous columns, which had fewer unique values. firm, however, has many more:

str(values)
 num [1:3082] 51560090 51570615 51603870 51604677 51606085 ...

I get a warning when trying to add the columns:

Warning message:
  truelength (6198) is greater than 1000 items over-allocated (length = 36). See ?truelength. If you didn't set the datatable.alloccol option very large, please report this to datatable-help including the result of sessionInfo().

As far as I can tell, all the columns I need are still there. Can I just ignore this warning? Will it slow down future computations? I'm not sure what to make of it, or of the relevance of truelength.

FooBar asked Sep 28 '22 07:09



1 Answer

Taking Arun's comment as an answer.
You should use the alloc.col function to pre-allocate enough column slots in your data.table before adding the columns, choosing a number larger than the expected ncol.

alloc.col(myDataTable, 3200)
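
To make the pre-allocation step concrete, here is a minimal sketch on a toy table. The table `dt`, its firm ids, and the headroom of 100 are made up for illustration; as far as I read ?truelength, the second argument counts spare slots on top of the existing columns, so treat the exact number as an assumption and check truelength() on your own data.

```r
library(data.table)

# Toy stand-in for myDataTable (the firm ids are invented)
dt <- data.table(firm = c(101, 102, 103, 101, 102), y = 1:5)

values <- unique(dt$firm)
cols <- paste("d", values[-1], sep = "_")  # "d_102", "d_103"

# Reserve slots for the new dummy columns plus some headroom (100 is arbitrary)
alloc.col(dt, length(cols) + 100L)
slots <- truelength(dt)  # allocated column slots; should exceed ncol(dt)

# Add the dummies by reference; as.integer() gives 1/0 instead of TRUE/FALSE
dt[, (cols) := lapply(values[-1], function(x) as.integer(firm == x))]
```

With the slots reserved up front, `:=` should not need to grow the table while the dummy columns are added.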

Additionally, depending on how you consume the data, I would recommend reshaping your wide table into a long table; see EAV (entity-attribute-value). Then you need only one column per data type.
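
As a rough illustration of the long layout, the sketch below melts a small wide table of dummies into entity/attribute/value rows. The table `wide` and the column names `id`, `attribute` and `value` are hypothetical, chosen only for this example.

```r
library(data.table)

# Hypothetical wide table: one row per entity, one dummy column per firm
wide <- data.table(id = 1:3, d_102 = c(0L, 1L, 0L), d_103 = c(0L, 0L, 1L))

# Long (EAV-style) layout: one row per entity/attribute pair
long <- melt(wide, id.vars = "id",
             variable.name = "attribute", value.name = "value")

# Only the rows where a dummy is set carry information
hits <- long[value == 1L]
```

In this layout the number of columns stays fixed no matter how many distinct firms there are, so the over-allocation question does not arise.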

jangorecki answered Oct 07 '22 19:10
