I'm trying to use MlLib for my colloborative filtering.
I encounter the following error in my Scala program when I run it in Apache Spark 1.0.0.
14/07/15 16:16:31 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/07/15 16:16:31 WARN LoadSnappy: Snappy native library not loaded
14/07/15 16:16:31 INFO FileInputFormat: Total input paths to process : 1
14/07/15 16:16:38 WARN TaskSetManager: Lost TID 10 (task 80.0:0)
14/07/15 16:16:38 WARN TaskSetManager: Loss was due to java.lang.UnsatisfiedLinkError
java.lang.UnsatisfiedLinkError: org.jblas.NativeBlas.dposv(CII[DII[DII)I
at org.jblas.NativeBlas.dposv(Native Method)
at org.jblas.SimpleBlas.posv(SimpleBlas.java:369)
at org.jblas.Solve.solvePositive(Solve.java:68)
at org.apache.spark.mllib.recommendation.ALS$$anonfun$org$apache$spark$mllib$recommendation$ALS$$updateBlock$2.apply(ALS.scala:522)
at org.apache.spark.mllib.recommendation.ALS$$anonfun$org$apache$spark$mllib$recommendation$ALS$$updateBlock$2.apply(ALS.scala:509)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofInt.foreach(ArrayOps.scala:156)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.mutable.ArrayOps$ofInt.map(ArrayOps.scala:156)
at org.apache.spark.mllib.recommendation.ALS.org$apache$spark$mllib$recommendation$ALS$$updateBlock(ALS.scala:509)
at org.apache.spark.mllib.recommendation.ALS$$anonfun$org$apache$spark$mllib$recommendation$ALS$$updateFeatures$2.apply(ALS.scala:445)
at org.apache.spark.mllib.recommendation.ALS$$anonfun$org$apache$spark$mllib$recommendation$ALS$$updateFeatures$2.apply(ALS.scala:444)
at org.apache.spark.rdd.MappedValuesRDD$$anonfun$compute$1.apply(MappedValuesRDD.scala:31)
at org.apache.spark.rdd.MappedValuesRDD$$anonfun$compute$1.apply(MappedValuesRDD.scala:31)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$4.apply(CoGroupedRDD.scala:156)
at org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$4.apply(CoGroupedRDD.scala:154)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.rdd.CoGroupedRDD.compute(CoGroupedRDD.scala:154)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
at org.apache.spark.rdd.MappedValuesRDD.compute(MappedValuesRDD.scala:31)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
at org.apache.spark.rdd.FlatMappedValuesRDD.compute(FlatMappedValuesRDD.scala:31)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
at org.apache.spark.rdd.FlatMappedRDD.compute(FlatMappedRDD.scala:33)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:158)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.Task.run(Task.scala:51)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
14/07/15 16:16:38 ERROR TaskSchedulerImpl: Lost executor 0 on maroki.office.mkechinov.ru: Uncaught exception
14/07/15 16:16:38 WARN TaskSetManager: Lost TID 12 (task 80.0:0)
14/07/15 16:16:42 WARN TaskSetManager: Lost TID 18 (task 80.0:1)
14/07/15 16:16:42 WARN TaskSetManager: Loss was due to fetch failure from null
14/07/15 16:16:42 WARN TaskSetManager: Loss was due to fetch failure from null
14/07/15 16:16:43 WARN TaskSetManager: Lost TID 25 (task 80.1:0)
14/07/15 16:16:43 WARN TaskSetManager: Loss was due to java.lang.UnsatisfiedLinkError
How can I solve this error?
ALS is implemented in Apache Spark ML and built for a larges-scale collaborative filtering problems. ALS is doing a pretty good job at solving scalability and sparseness of the Ratings data, and it's simple and scales well to very large datasets.
regParam specifies the regularization parameter in ALS (defaults to 1.0). implicitPrefs specifies whether to use the explicit feedback ALS variant or one adapted for implicit feedback data (defaults to false which means using explicit feedback).
rank is the number of features to use (also referred to as the number of latent factors). iterations is the number of iterations of ALS to run. ALS typically converges to a reasonable solution in 20 iterations or less. lambda specifies the regularization parameter in ALS.
To address some of the limitations of content-based filtering, collaborative filtering uses similarities between users and items simultaneously to provide recommendations.
Spark documentation clearly mentions that MLLib uses native libraries, which need to be present on the nodes. (that is it does not come with spark installation)
MLlib uses the
jblas
linear algebra library, which itself depends on native Fortran routines. You may need to install thegfortran
runtime library if it is not already present on your nodes. MLlib will throw a linking error if it cannot detect these libraries automatically.
You have to make sure that libgfortran library exists on all nodes.
for debian/ubuntu use:
sudo apt-get install libgfortran3
for centos use:
sudo yum install gcc-gfortran
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With