I'm trying to build a recommendation system using Spark MLlib's ALS.
Currently, we pre-build recommendations for all users on a daily basis, using simple implicit feedback and ALS.
The problem is that we have 20M users and 30M products, and to call the main predict() method we need the Cartesian join of users and products, which is huge; generating the join alone may take days. Is there a way to avoid the Cartesian join and make the process faster?
Currently we have 8 nodes with 64GB of RAM each; I think that should be enough for the data.
val users: RDD[Int] = ???       // RDD with 20M userIds
val products: RDD[Int] = ???    // RDD with 30M productIds
val ratings: RDD[Rating] = ???  // RDD with all user->product feedback

val model = new ALS()
  .setRank(10)
  .setIterations(10)
  .setLambda(0.0001)
  .setImplicitPrefs(true)
  .setAlpha(40)
  .run(ratings)

// This Cartesian join of 20M x 30M pairs is the bottleneck.
val usersProducts = users.cartesian(products)
val recommendations = model.predict(usersProducts)
Not sure you really need the whole 20M x 30M matrix. If you just want to pre-build the top recommendations per user, try recommendProducts(user: Int, num: Int) for each user, limiting yourself to the num strongest recommendations. There is also recommendUsers(product: Int, num: Int) for the reverse direction.
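If you are on Spark 1.4 or later, MatrixFactorizationModel also offers a distributed batch variant, recommendProductsForUsers(num: Int), which computes the top-N products for every user in one job without materializing the full Cartesian join. A minimal sketch, assuming `model` is the trained model from your question and that top 20 per user is enough:

```scala
import org.apache.spark.mllib.recommendation.Rating
import org.apache.spark.rdd.RDD

val topN = 20  // assumption: 20 recommendations per user suffice

// One distributed call: multiplies the user and product factor blocks
// and keeps only the topN highest-scoring products per user.
val perUser: RDD[(Int, Array[Rating])] = model.recommendProductsForUsers(topN)

// Flatten to plain Ratings if downstream code expects that shape.
val recommendations: RDD[Rating] = perUser.flatMap { case (_, recs) => recs }
```

This keeps everything on the cluster (unlike calling recommendProducts per user on the driver) and bounds the output to 20M x topN rows instead of 20M x 30M.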