 

How to resolve the PySpark DataFrame query error "keyword can't be an expression"

I have two DataFrames named tweetsDF and HashtagsDF. Both have a tweet_status_id column with matching values, and I want to retrieve the hashtag count for a single tweet. This is the query I am using, which in turn throws:

ERROR : SyntaxError: keyword can't be an expression

tweet_hashtags_count_DF = tweetsDF.join(HashtagsDF,sum('tweetsDF.*'),tweetsDF.tweet_status_id == HashtagsDF.tweet_status_id & tweetsDF.tweet_status_id='636984052600274944').show()

Where am I going wrong in the query?

Jayasree asked Dec 18 '22 03:12



1 Answer

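Try tweetsDF.tweet_status_id == '636984052600274944' (== instead of =). Inside a function call, a single = is parsed as a keyword argument, and the text to its left here is an expression rather than a plain name, which is why Python raises the SyntaxError.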
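To illustrate, here is a small sketch that only parses the two versions of the call with Python's standard ast module. Nothing is executed and Spark is not needed, which shows that the error comes from the Python parser and not from PySpark. The fixed string is an assumption about what the query was meant to do: it drops the sum('tweetsDF.*') argument, since join expects the join condition as its second argument, and it wraps each comparison in parentheses because & binds more tightly than ==.

```python
import ast

# The call from the question, verbatim.
broken = (
    "tweetsDF.join(HashtagsDF,sum('tweetsDF.*'),"
    "tweetsDF.tweet_status_id == HashtagsDF.tweet_status_id "
    "& tweetsDF.tweet_status_id='636984052600274944').show()"
)

# One possible rewrite (an assumption, not the asker's confirmed intent):
# == instead of =, each comparison parenthesized, sum(...) argument dropped.
fixed = (
    "tweetsDF.join(HashtagsDF, "
    "(tweetsDF.tweet_status_id == HashtagsDF.tweet_status_id) "
    "& (tweetsDF.tweet_status_id == '636984052600274944')).show()"
)


def parses(source):
    """Return True if Python can parse `source` as an expression."""
    try:
        ast.parse(source, mode="eval")
        return True
    except SyntaxError:
        return False


def is_chained_comparison(source):
    """Return True if `source` parses as one comparison with several operators."""
    node = ast.parse(source, mode="eval").body
    return isinstance(node, ast.Compare) and len(node.ops) > 1


print(parses(broken))  # False: the lone = makes the call unparseable
print(parses(fixed))   # True

# Without parentheses, a == b & a == c is read as a == (b & a) == c.
print(is_chained_comparison("a == b & a == c"))      # True
print(is_chained_comparison("(a == b) & (a == c)"))  # False
```

The exact wording of the SyntaxError depends on the Python version, so the sketch checks only whether parsing succeeds. The last two lines show why replacing = with == alone may not be enough: without the parentheses the condition still parses, but as a single chained comparison instead of two conditions combined with &.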

andrew_reece answered Dec 27 '22 02:12
