How can we get the top 10 recommended products for each user in PySpark? I understand there are methods like recommendProducts, which recommends products for a single user, and predictAll, which predicts the rating for {user, item} pairs. But is there an efficient way I can output the top 10 items for every user?
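For context, MLlib's ALS predicts a rating as the dot product of a user's factor vector and a product's factor vector. So "top N for one user" means scoring every product and keeping the N largest. The sketch below shows that single-user step in plain Python. The `top_n_for_user` helper and the toy factors are hypothetical illustrations, not part of the MLlib API.

```python
import heapq

def top_n_for_user(user_vec, product_factors, n=10):
    """Return the n product ids with the highest predicted rating for one user.

    user_vec: list of floats (the user's latent factors)
    product_factors: dict mapping product id -> list of floats (same rank)
    """
    # Predicted rating = dot product of the user and product factor vectors
    scores = {
        pid: sum(u * p for u, p in zip(user_vec, vec))
        for pid, vec in product_factors.items()
    }
    # nlargest keeps only n candidates instead of sorting every product
    return heapq.nlargest(n, scores, key=scores.get)

# Toy rank-2 factors, purely illustrative
products = {101: [1.0, 0.0], 102: [0.0, 1.0], 103: [0.5, 0.5]}
print(top_n_for_user([2.0, 1.0], products, n=2))  # [101, 103]
```

Calling a single-user method once per user repeats this full scan over all products as a separate Spark action each time. That is why a batched approach that scores many users at once is preferable.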
I wrote the following function. It multiplies the user features and product features partition by partition, so the work is distributed. For each user it computes the predicted rating of every product, sorts the products by rating, and outputs the top 8 recommended products as a '|'-separated string.
import numpy as np

# Assumes sc is the SparkContext and model is the trained MatrixFactorizationModel;
# the user and product features must come from the same model.

# Collect the product feature matrix on the driver and broadcast it to the executors
productFeatures = model.productFeatures().collect()
productArray = []
productFeaturesArray = []
for x in productFeatures:
    productArray.append(x[0])
    productFeaturesArray.append(x[1])
matrix = np.matrix(productFeaturesArray)
productArrayBroadcast = sc.broadcast(productArray)
productFeaturesArrayBroadcast = sc.broadcast(matrix.T)  # shape (rank, numProducts)

def func(iterator):
    userFeaturesArray = []
    userArray = []
    for x in iterator:
        userArray.append(x[0])
        userFeaturesArray.append(x[1])
    if not userArray:
        # Empty partition (possible when there are fewer users than partitions)
        return []
    userFeatureMatrix = np.matrix(userFeaturesArray)
    # (numUsers x rank) * (rank x numProducts) = predicted rating of every product for every user
    userRecommendationArray = userFeatureMatrix * productFeaturesArrayBroadcast.value
    mappedUserRecommendationArray = []
    # Extract ratings from the matrix
    for i in range(len(userArray)):
        ratingdict = {}
        for j in range(len(productArrayBroadcast.value)):
            ratingdict[str(productArrayBroadcast.value[j])] = userRecommendationArray.item((i, j))
        # Take the top 8 recommendations for the user, highest rating first (use 10 for a top 10)
        sort_apps = sorted(ratingdict, key=ratingdict.get, reverse=True)[:8]
        sort_apps = '|'.join(sort_apps)
        mappedUserRecommendationArray.append((userArray[i], sort_apps))
    return mappedUserRecommendationArray

recommendations = model.userFeatures().repartition(2000).mapPartitions(func)
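The per-user dictionary and full sort inside the partition function can be replaced with a single matrix product plus np.argpartition, which selects the top k columns of each row without sorting every product. This is a minimal sketch that assumes the same inputs as the function above: (user_id, factors) rows from one partition, the list of product ids, and the transposed (rank x products) product matrix. The name `top_k_partition` and the toy data are hypothetical.

```python
import numpy as np

def top_k_partition(rows, product_ids, product_matrix_t, k=8):
    """Score every user in one partition against all products and keep the top k.

    rows: iterable of (user_id, factor_list) pairs, like model.userFeatures()
    product_ids: list of product ids, in the same order as the matrix columns
    product_matrix_t: array-like of shape (rank, num_products)
    """
    rows = list(rows)
    if not rows:
        return []  # empty partition: nothing to score
    user_ids = [uid for uid, _ in rows]
    user_matrix = np.array([vec for _, vec in rows], dtype=float)
    scores = user_matrix @ np.asarray(product_matrix_t, dtype=float)  # (users, products)
    k = min(k, scores.shape[1])
    # Unordered top-k column indices per row, then order those k by descending score
    top = np.argpartition(-scores, k - 1, axis=1)[:, :k]
    top_scores = np.take_along_axis(scores, top, axis=1)
    order = np.argsort(-top_scores, axis=1)
    top = np.take_along_axis(top, order, axis=1)
    ids = np.asarray(product_ids)
    return [(uid, '|'.join(str(p) for p in ids[idx]))
            for uid, idx in zip(user_ids, top)]

# Toy data: two users, three products, rank 2
demo = top_k_partition([(1, [2.0, 1.0]), (2, [0.0, 1.0])], [101, 102, 103],
                       np.array([[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]), k=2)
print(demo)  # [(1, '101|103'), (2, '102|103')]
```

In Spark it would be wired in the same way as func, passing the broadcast product ids and transposed product matrix through a lambda in mapPartitions. Also, on Spark 1.4 and later, MatrixFactorizationModel.recommendProductsForUsers(num) returns the top num ratings for every user directly, which may be simpler than either hand-rolled version.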