Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Apache Spark -- MlLib -- Collaborative filtering

I'm trying to use MlLib for my colloborative filtering.

I encounter the following error in my Scala program when I run it in Apache Spark 1.0.0.

   14/07/15 16:16:31 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    14/07/15 16:16:31 WARN LoadSnappy: Snappy native library not loaded
    14/07/15 16:16:31 INFO FileInputFormat: Total input paths to process : 1
    14/07/15 16:16:38 WARN TaskSetManager: Lost TID 10 (task 80.0:0)
    14/07/15 16:16:38 WARN TaskSetManager: Loss was due to java.lang.UnsatisfiedLinkError
    java.lang.UnsatisfiedLinkError: org.jblas.NativeBlas.dposv(CII[DII[DII)I
        at org.jblas.NativeBlas.dposv(Native Method)
        at org.jblas.SimpleBlas.posv(SimpleBlas.java:369)
        at org.jblas.Solve.solvePositive(Solve.java:68)
        at org.apache.spark.mllib.recommendation.ALS$$anonfun$org$apache$spark$mllib$recommendation$ALS$$updateBlock$2.apply(ALS.scala:522)
        at org.apache.spark.mllib.recommendation.ALS$$anonfun$org$apache$spark$mllib$recommendation$ALS$$updateBlock$2.apply(ALS.scala:509)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.ArrayOps$ofInt.foreach(ArrayOps.scala:156)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
        at scala.collection.mutable.ArrayOps$ofInt.map(ArrayOps.scala:156)
        at org.apache.spark.mllib.recommendation.ALS.org$apache$spark$mllib$recommendation$ALS$$updateBlock(ALS.scala:509)
        at org.apache.spark.mllib.recommendation.ALS$$anonfun$org$apache$spark$mllib$recommendation$ALS$$updateFeatures$2.apply(ALS.scala:445)
        at org.apache.spark.mllib.recommendation.ALS$$anonfun$org$apache$spark$mllib$recommendation$ALS$$updateFeatures$2.apply(ALS.scala:444)
        at org.apache.spark.rdd.MappedValuesRDD$$anonfun$compute$1.apply(MappedValuesRDD.scala:31)
        at org.apache.spark.rdd.MappedValuesRDD$$anonfun$compute$1.apply(MappedValuesRDD.scala:31)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$4.apply(CoGroupedRDD.scala:156)
        at org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$4.apply(CoGroupedRDD.scala:154)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.rdd.CoGroupedRDD.compute(CoGroupedRDD.scala:154)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        at org.apache.spark.rdd.MappedValuesRDD.compute(MappedValuesRDD.scala:31)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        at org.apache.spark.rdd.FlatMappedValuesRDD.compute(FlatMappedValuesRDD.scala:31)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        at org.apache.spark.rdd.FlatMappedRDD.compute(FlatMappedRDD.scala:33)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:158)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
        at org.apache.spark.scheduler.Task.run(Task.scala:51)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
    14/07/15 16:16:38 ERROR TaskSchedulerImpl: Lost executor 0 on maroki.office.mkechinov.ru: Uncaught exception
    14/07/15 16:16:38 WARN TaskSetManager: Lost TID 12 (task 80.0:0)
    14/07/15 16:16:42 WARN TaskSetManager: Lost TID 18 (task 80.0:1)
    14/07/15 16:16:42 WARN TaskSetManager: Loss was due to fetch failure from null
    14/07/15 16:16:42 WARN TaskSetManager: Loss was due to fetch failure from null
    14/07/15 16:16:43 WARN TaskSetManager: Lost TID 25 (task 80.1:0)
    14/07/15 16:16:43 WARN TaskSetManager: Loss was due to java.lang.UnsatisfiedLinkError

How can I solve this error?

like image 734
Goko Gorgiovski Avatar asked Jul 15 '14 12:07

Goko Gorgiovski


People also ask

Is ALS collaborative filtering?

ALS is implemented in Apache Spark ML and built for a larges-scale collaborative filtering problems. ALS is doing a pretty good job at solving scalability and sparseness of the Ratings data, and it's simple and scales well to very large datasets.

What is regParam in ALS?

regParam specifies the regularization parameter in ALS (defaults to 1.0). implicitPrefs specifies whether to use the explicit feedback ALS variant or one adapted for implicit feedback data (defaults to false which means using explicit feedback).

What is rank in ALS?

rank is the number of features to use (also referred to as the number of latent factors). iterations is the number of iterations of ALS to run. ALS typically converges to a reasonable solution in 20 iterations or less. lambda specifies the regularization parameter in ALS.

What is meant by collaborative filtering in recommender system?

To address some of the limitations of content-based filtering, collaborative filtering uses similarities between users and items simultaneously to provide recommendations.


1 Answers

Spark documentation clearly mentions that MLLib uses native libraries, which need to be present on the nodes. (that is it does not come with spark installation)

MLlib uses the jblas linear algebra library, which itself depends on native Fortran routines. You may need to install the gfortran runtime library if it is not already present on your nodes. MLlib will throw a linking error if it cannot detect these libraries automatically.

You have to make sure that libgfortran library exists on all nodes.

for debian/ubuntu use: sudo apt-get install libgfortran3

for centos use: sudo yum install gcc-gfortran

like image 151
Shyamendra Solanki Avatar answered Oct 18 '22 09:10

Shyamendra Solanki