I'm very new to Apache Spark and big data in general. I'm using the ALS method to create rating predictions from a matrix of users, items, and ratings. The confusing part is that when I run the script to calculate the predictions, the results are different every time, even though neither the input data nor the requested predictions change. Is this expected behavior, or should the results be identical? Below is the Python code for reference.
from pyspark import SparkContext
from pyspark.mllib.recommendation import ALS

sc = SparkContext("local", "CF")

# parse a "user,item,rating" line into a (user, item, rating) tuple
def parseRating(line):
    fields = line.split(',')
    return (int(fields[0]), int(fields[1]), float(fields[2]))

# define input and output files
ratingsFile = 's3n://weburito/data/weburito_ratings.dat'
unratedFile = 's3n://weburito/data/weburito_unrated.dat'
predictionsFile = '/root/weburito/data/weburito_predictions.dat'

# read training set
training = sc.textFile(ratingsFile).map(parseRating).cache()

# get unknown ratings set
unrated = sc.textFile(unratedFile).map(parseRating)

# train the model
model = ALS.train(training, rank=5, iterations=20)

# generate predictions for the (user, item) pairs
predictions = model.predictAll(unrated.map(lambda x: (x[0], x[1]))).collect()
This is expected behaviour. The factor matrices in ALS are initialized randomly (strictly speaking, one of them is initialized randomly, and the other is solved from that initialization in the first step). Because the ALS objective is non-convex in both factor matrices jointly, different random starting points converge to slightly different local solutions, so different runs give slightly different results. If you need repeatable output, fix the random seed, as sketched below.
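Here is a minimal sketch, assuming a Spark version in which pyspark.mllib's ALS.train exposes the optional seed keyword; the tiny in-memory ratings RDD is purely illustrative, standing in for your S3 data. With the seed pinned, the random factor initialization is fixed, so repeated runs should produce identical predictions.

from pyspark import SparkContext
from pyspark.mllib.recommendation import ALS

sc = SparkContext("local", "CF-seeded")

# tiny in-memory (user, item, rating) RDD, purely for illustration
training = sc.parallelize([(1, 1, 5.0), (1, 2, 1.0),
                           (2, 1, 1.0), (2, 2, 5.0)])

# pinning seed fixes the random factor initialization, so repeated
# runs of this script yield the same model and the same predictions
model = ALS.train(training, rank=2, iterations=10, seed=42)
print(model.predict(1, 2))

Running this script twice with the same seed should print the same value both times; omit (or vary) the seed and you are back to the run-to-run variation you observed.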