Hi, I am new to Scala and Spark. I am trying to do a group by through Spark SQL. When I try to save or view the output, it throws the following error:
value coalesce is not a member of org.apache.spark.sql.RelationalGroupedDataset
This is my code:
val fp = filtertable.select($"_1", $"_2", $"_3", $"_4").groupBy("_1", "_2", "_3")
fp.show() // throws the error
fp.coalesce(1).write.format("csv").save("file://" + test.toString()) // throws the error
Any help will be appreciated.
The question suggests that you want to write the grouped data to a text file in CSV format. If my analysis is correct, then groupBy on an RDD should be the solution you desire, as groupBy on a DataFrame needs to be followed by an aggregation. So you will have to convert the DataFrame to an RDD, apply groupBy, and finally write the output to the CSV file:
val fp = df.select($"_1", $"_2", $"_3", $"_4")
  .rdd
  .groupBy(row => (row(0), row(1), row(2))) // similar to groupBy("_1", "_2", "_3") on a DataFrame
  .flatMap(kv => kv._2)                     // flatten the grouped rows back out
  .map(_.mkString(","))                     // format each Row as a comma-separated line

fp.coalesce(1).saveAsTextFile("file://" + test.toString())
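For comparison, if an aggregation is acceptable you can stay in the DataFrame API: groupBy on a DataFrame returns a RelationalGroupedDataset, which only becomes a DataFrame again after a call like agg. A minimal sketch, assuming a simple count of _4 per group (the count is just a placeholder aggregation, not something from the question):

import org.apache.spark.sql.functions.count

// agg turns the RelationalGroupedDataset back into a DataFrame,
// so show and write become available again
val counted = df.select($"_1", $"_2", $"_3", $"_4")
  .groupBy("_1", "_2", "_3")
  .agg(count($"_4").as("count_4")) // placeholder aggregation; substitute whatever suits your data

counted.coalesce(1).write.format("csv").save("file://" + test.toString())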
I hope the answer is helpful.
If you only want to return the grouped columns, you can take the first item of an ungrouped column in the aggregation and then select just the grouped columns, like so:
import org.apache.spark.sql.functions.first

val fp = filtertable
  .select($"_1", $"_2", $"_3", $"_4")
  .groupBy($"_1", $"_2", $"_3")
  .agg(first($"_4"))          // any aggregation works; first just picks one value per group
  .select($"_1", $"_2", $"_3") // drop the aggregated column, keeping only the grouped keys
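Because agg returns a plain DataFrame, the calls from the question now work; a short sketch, assuming the same output path as in the question:

fp.show()
fp.coalesce(1).write.format("csv").save("file://" + test.toString())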