Mixed Effects Models in Spark or other technology

Is it possible to fit a mixed-effects regression model in Spark, as we can with lme4 in R, MixedModels in Julia, or statsmodels MixedLM in Python?
Any example would be great.

I've read that there is a GLMix function, but I don't know whether users can call it directly to fit a model and get the coefficients and p-values, or whether it can only be used internally by machine learning libraries.

I would like to move to Spark because my datasets are much bigger than memory.

Is there any other common database or framework that can do something like this while streaming data from disk?
I've only seen some that can do simple linear regression.

Regards

skan asked Sep 30 '16 11:09


1 Answer

Yes, this is possible with Spark.

The first thing I would look into is MLlib, Spark's built-in machine learning library. I am not sure whether it supports exactly the kind of model you need, but it definitely offers more than 'simple linear regression'.

Another library, 'linkedin/photon-ml', which I am not familiar with, explicitly mentions mixed-effects models.

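Here is an example of using its Generalized Additive Mixed Effects (GAME) training driver. Note that this particular command configures only one coordinate, named global, so it appears to train a plain regularized logistic regression; random-effect coordinates would presumably have to be added to the configuration for an actual mixed model: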

spark-submit \
  --class com.linkedin.photon.ml.cli.game.GameTrainingDriver \
  --master local[*] \
  --num-executors 4 \
  --driver-memory 1G \
  --executor-memory 1G \
  "./build/photon-all_2.10/libs/photon-all_2.10-1.0.0.jar" \
  --input-data-directories "./a1a/train/" \
  --validation-data-directories "./a1a/test/" \
  --root-output-directory "out" \
  --feature-shard-configurations "name=globalShard,feature.bags=features" \
  --coordinate-configurations "name=global,feature.shard=globalShard,min.partitions=4,optimizer=LBFGS,tolerance=1.0E-6,max.iter=50,regularization=L2,reg.weights=0.1|1|10|100" \
  --coordinate-update-sequence "global" \
  --coordinate-descent-iterations 1 \
  --training-task "LOGISTIC_REGRESSION"
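
For intuition about why a mixed model does not have to hold the whole dataset in memory, here is a minimal sketch in plain Python. It is not what Spark or Photon ML does internally. It assumes the simplest possible mixed model, a random intercept per group with no covariates, and estimates the variance components with the classical ANOVA (method of moments) estimator rather than REML. The function names, the column names group and y, and the sample data are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def random_intercept_anova(rows):
    """Fit y_ij = mu + b_i + e_ij from one pass over (group, y) pairs.

    Only per-group running totals are kept, so memory grows with the
    number of groups, not with the number of rows.
    """
    stats = defaultdict(lambda: [0, 0.0, 0.0])  # group -> [count, sum, sum of squares]
    for group, y in rows:
        s = stats[group]
        s[0] += 1
        s[1] += y
        s[2] += y * y

    k = len(stats)
    total_n = sum(s[0] for s in stats.values())
    grand_mean = sum(s[1] for s in stats.values()) / total_n

    # Within-group and between-group sums of squares from the running totals.
    ss_within = sum(s[2] - s[1] ** 2 / s[0] for s in stats.values())
    ss_between = sum(s[0] * (s[1] / s[0] - grand_mean) ** 2 for s in stats.values())
    ms_within = ss_within / (total_n - k)
    ms_between = ss_between / (k - 1)

    # Effective group size; equals the common group size in a balanced design.
    n0 = (total_n - sum(s[0] ** 2 for s in stats.values()) / total_n) / (k - 1)
    var_group = max(0.0, (ms_between - ms_within) / n0)
    return {"intercept": grand_mean, "var_group": var_group, "var_residual": ms_within}


def read_rows(handle):
    """Stream (group, y) pairs from a CSV file handle, one row at a time."""
    for record in csv.DictReader(handle):
        yield record["group"], float(record["y"])


# Hypothetical sample data; in practice pass open("data.csv") instead.
SAMPLE = "group,y\na,1.0\na,3.0\nb,5.0\nb,7.0\nc,9.0\nc,11.0\n"
result = random_intercept_anova(read_rows(io.StringIO(SAMPLE)))
print(result)
```

This only covers the intercept-only case and gives no standard errors or p-values. Models with covariates or several random effects need iterative fitting, which is where a distributed library comes in.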

Dennis Jaheruddin answered Sep 30 '22 21:09