In R there is nice functionality for running a regression with dummy variables for each level of a categorical variable. e.g. Automatically expanding an R factor into a collection of 1/0 indicator variables for every factor level
Is there an equivalent way to do this in Julia.
x = randn(1000)
group = repmat(1:25 , 40)
groupMeans = randn(25)
y = 3*x + groupMeans[group]
data = DataFrame(x=x, y=y, g=group)
for i in levels(group)
data[parse("I$i")] = data[:g] .== i
end
lm(y~x+I1+I2+I3+I4+I5+I6+I7+I8+I9+I10+
I11+I12+I13+I14+I15+I16+I17+I18+I19+I20+
I21+I22+I23+I24, data)
If you are using the DataFrames package, after you pool
the data, the package will take care of the rest:
Pooling columns is important for working with the GLM package When fitting regression models, PooledDataArray columns in the input are translated into 0/1 indicator columns in the ModelMatrix - with one column for each of the levels of the PooledDataArray.
You can see the rest of documentation on pooled data here
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With