
Dummy Variables in Julia

R has nice functionality for running a regression with dummy variables for each level of a categorical variable (e.g., "Automatically expanding an R factor into a collection of 1/0 indicator variables for every factor level").

Is there an equivalent way to do this in Julia?

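using DataFrames, GLM

# Simulated data: 25 groups of 40 observations, each group with its own mean
x = randn(1000)
group = repmat(1:25, 40)
groupMeans = randn(25)
y = 3*x + groupMeans[group]

data = DataFrame(x=x, y=y, g=group)
# Manually add one 0/1 indicator column (I1, ..., I25) per group
for i in levels(group)
    data[parse("I$i")] = data[:g] .== i
end
# I25 is left out of the formula, presumably as the reference level
lm(y~x+I1+I2+I3+I4+I5+I6+I7+I8+I9+I10+
    I11+I12+I13+I14+I15+I16+I17+I18+I19+I20+
    I21+I22+I23+I24, data)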
asked Mar 20 '15 02:03 by Rob Donnelly


1 Answer

If you are using the DataFrames package, after you pool the data, the package will take care of the rest:

"Pooling columns is important for working with the GLM package. When fitting regression models, PooledDataArray columns in the input are translated into 0/1 indicator columns in the ModelMatrix, with one column for each of the levels of the PooledDataArray."

The rest of the documentation on pooled data is in the DataFrames package docs.
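
For a concrete picture, here is a minimal sketch of the same idea. It is not the API described above: PooledDataArray has since been superseded, so this assumes a current Julia with the DataFrames, GLM, and CategoricalArrays packages installed, where `categorical` plays the role of pooling and `repeat` replaces `repmat`. Treat it as an illustration under those assumptions rather than the exact code the answer had in mind.

```julia
using DataFrames, GLM, CategoricalArrays

# Same simulated data as in the question: 25 groups of 40 observations
x = randn(1000)
group = repeat(1:25, 40)            # repeat replaces the older repmat
groupMeans = randn(25)
y = 3 .* x .+ groupMeans[group]

data = DataFrame(x = x, y = y, g = group)

# Assumption: with current packages, marking g as categorical is the
# counterpart of pooling; integer columns are otherwise treated as numeric
data.g = categorical(data.g)

# The model matrix now gets 0/1 indicator columns for g automatically
model = lm(@formula(y ~ x + g), data)
```

With the default coding, `coef(model)` should hold an intercept, the slope on `x`, and 24 group indicators, with the first level of `g` acting as the baseline. No hand-written `I1 + I2 + ...` formula is needed.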

answered Oct 18 '22 16:10 by ntdef