I have a text file in the following format:
ID,Name,Rating
1,A,3
2,B,4
1,A,4
and I want to find the average rating for each ID in Spark. This is the code I have so far, but it keeps giving me an error:
val Avg_data=spark.sql("select ID, AVG(Rating) from table")
ERROR: org.apache.spark.sql.AnalysisException: grouping expressions sequence is empty, and 'table'.'ID' is not an aggregate function. Wrap '(avg(CAST(table.'Rating' AS BIGINT)) as 'avg(Rating)')' in windowing function(s).........
AVG() is an aggregate function, so any column you select alongside it without aggregating (here ID) has to appear in a GROUP BY clause:
val Avg_data=spark.sql("select ID, AVG(Rating) as average from table group by ID")
Avg_data should then contain:
+---+-------+
|ID |average|
+---+-------+
|1  |3.5    |
|2  |4.0    |
+---+-------+
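For completeness, here is a minimal end-to-end sketch of the same fix. It is not taken from the original post: the object name AverageRating, the helper averageById, the view name ratings, and the local[*] master are all assumptions made for illustration, and the three sample rows are built in memory rather than read from the text file. It runs the grouped SQL query and the equivalent DataFrame API call side by side.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.avg

// Hypothetical wrapper object, used only to make the sketch self-contained.
object AverageRating {

  // DataFrame API equivalent of "select ID, AVG(Rating) as average ... group by ID".
  def averageById(ratings: DataFrame): DataFrame =
    ratings.groupBy("ID").agg(avg("Rating").as("average"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("AverageRating")
      .master("local[*]") // assumption: running locally, not on a cluster
      .getOrCreate()
    import spark.implicits._

    // The three sample rows from the question, created in memory.
    val ratings = Seq((1, "A", 3), (2, "B", 4), (1, "A", 4))
      .toDF("ID", "Name", "Rating")

    // SQL route: register a temp view (the name "ratings" is an assumption),
    // then group by the non-aggregated column.
    ratings.createOrReplaceTempView("ratings")
    val viaSql = spark.sql(
      "select ID, AVG(Rating) as average from ratings group by ID")

    // DataFrame route: same aggregation without a SQL string.
    val viaApi = averageById(ratings)

    // Sort before printing, since group by does not guarantee row order.
    viaSql.orderBy("ID").show(false)
    viaApi.orderBy("ID").show(false)

    spark.stop()
  }
}
```

Both routes express the same aggregation, so which one to use is mostly a matter of taste; the DataFrame version has the small advantage that a misspelled method name is caught at compile time instead of surfacing as a runtime AnalysisException.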