Creating dummy variables in R data.table

Tags: r, data.table, dummy-variable

I am working with an extremely large dataset in R. I have been using data frames and have decided to switch to data.table to speed up operations. I am having trouble understanding the j argument; in particular, I'm trying to generate dummy variables, but I can't figure out how to code conditional operations inside the data.table [ ] call.

MWE:

library(data.table)
test <- data.table("index"=rep(letters[1:10],100),"var1"=rnorm(1000,0,1))

What I would like to do is add columns a through j as dummy variables, such that column a has the value 1 when index == "a" and 0 otherwise. With a data.frame it would look something like:

test$a <- 0

test$a[test$index=='a'] <- 1

asked Sep 18 '13 19:09

user2792957

1 Answer

This seems to do what you're looking for:

inds <- unique(test$index)
test[, (inds) := lapply(inds, function(x) index == x)]

which gives

      index        var1     a     b     c     d     e     f     g     h     i     j
   1:     a  0.25331851  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
   2:     b -0.02854676 FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
   3:     c -0.04287046 FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
   4:     d  1.36860228 FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
   5:     e -0.22577099 FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE
  ---                                                                              
 996:     f -1.02040059 FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
 997:     g -1.31345092 FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE
 998:     h -0.49448088 FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE
 999:     i  1.75175715 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE
1000:     j  0.05576477 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
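
The question asked for 1/0 rather than TRUE/FALSE. A minimal sketch of one way to get that, assuming the same `test` table as in the question: wrap the comparison in `as.integer()`. The seed and the column name `a2` are illustrative assumptions, not part of the original post.

```r
library(data.table)

set.seed(1)  # assumption: seed added only to make the sketch reproducible
test <- data.table(index = rep(letters[1:10], 100), var1 = rnorm(1000, 0, 1))

inds <- unique(test$index)

# Same idea as above, but coerce each logical vector to integer 0/1
test[, (inds) := lapply(inds, function(x) as.integer(index == x))]

# Single-column counterpart of the data.frame idiom from the question:
# initialise to 0, then update only the rows selected in i
# (a2 is a hypothetical column name used for comparison)
test[, a2 := 0L][index == "a", a2 := 1L]
```

The second form shows the general pattern for conditional updates: the condition goes in `i` and the assignment in `j`.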

Here's another way:

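dcast(test, index + var1 ~ index, fun.aggregate = length)
# or, if you want to preserve row order
# (note: test[, r := .I] adds r to test itself by reference; the trailing r := NULL only drops it from the result)
dcast(test[, r := .I], r + index + var1 ~ index, fun.aggregate = length)[, r := NULL]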
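
To see what the dcast approach produces, here is a small sketch on a hypothetical four-row table (`small` and `wide` are illustrative names). It works on a `copy()` so the temporary row id is not added to the original table, and it passes `value.var` explicitly rather than relying on the guessed default; both are my choices, not part of the answer above.

```r
library(data.table)

# Hypothetical four-row table, small enough to check by eye
small <- data.table(index = c("a", "b", "a", "c"),
                    var1  = c(0.5, -1.2, 2.0, 0.3))

# Add a row id on a copy so `small` is not modified by reference;
# length() counts the rows falling in each (row, level) cell
wide <- dcast(copy(small)[, r := .I], r + index + var1 ~ index,
              fun.aggregate = length, value.var = "var1")
wide[, r := NULL]

print(wide)
```

Each dummy column holds a count, so it is 1 where the row's index matches the column name and 0 elsewhere.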

And another:

rs = split(seq_len(nrow(test)), test$index)
test[, names(rs) := FALSE]
for (n in names(rs)) set(test, i = rs[[n]], j = n, value = TRUE)
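
As a sanity check, the set() loop can be compared with the lapply() version on a toy table. This is only a sketch; `small`, `by_set` and `by_lapply` are hypothetical names.

```r
library(data.table)

# Hypothetical toy table
small <- data.table(index = c("a", "b", "a", "c"),
                    var1  = c(0.5, -1.2, 2.0, 0.3))

# Row numbers for each level of index: list(a = c(1, 3), b = 2, c = 4)
rs <- split(seq_len(nrow(small)), small$index)

# set() approach: start with all FALSE, then flip the matching rows
by_set <- copy(small)
by_set[, names(rs) := FALSE]
for (n in names(rs)) set(by_set, i = rs[[n]], j = n, value = TRUE)

# lapply() approach from the first snippet
by_lapply <- copy(small)
inds <- unique(by_lapply$index)
by_lapply[, (inds) := lapply(inds, function(x) index == x)]

# Check that both approaches produce the same table
stopifnot(isTRUE(all.equal(by_set, by_lapply)))
```

The set() version only touches the rows that need changing, which may matter on very large tables.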

answered Oct 06 '22 02:10

Frank
