
How to convert the group by function to data frame

Hi, I am new to Scala and Spark. I am trying to do a group by through Spark SQL. When I try to save or view the output, it throws the following error:

value coalesce is not a member of org.apache.spark.sql.RelationalGroupedDataset

This is my code.

 val fp = filtertable.select($"_1", $"_2", $"_3",$"_4").groupBy("_1", "_2","_3")
 fp.show() // throws error
 fp.coalesce(1).write.format("csv").save("file://" + test.toString()) //throws error.

Any help will be appreciated.

Rakshita asked Jul 18 '17 10:07




2 Answers

The question suggests that you want to write the grouped data to a text file in CSV format. If that is correct, then groupBy on an RDD should be the solution you want, because groupBy on a DataFrame returns a RelationalGroupedDataset, which must be followed by an aggregation before it can be shown or written. So you will have to convert the DataFrame to an RDD, apply groupBy, and finally write the output as CSV, as follows:

 val fp = df.select($"_1", $"_2", $"_3", $"_4")
     .rdd
     .groupBy(row => (row(0), row(1), row(2)))  // similar to groupBy("_1", "_2", "_3") on a DataFrame
     .flatMap(kv => kv._2)   // take the rows of each group
     .map(_.mkString(","))   // format each row as a CSV line

 fp.coalesce(1).saveAsTextFile("file://" + test.toString())
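If staying in the DataFrame API is preferred, an aggregation after groupBy turns the RelationalGroupedDataset back into a DataFrame, which then supports show(), coalesce() and write. The following is only a minimal sketch under assumptions: the local SparkSession, the sample rows standing in for `filtertable`, the helper name `aggregateGroups`, the `|` separator and the output path `/tmp/grouped_csv` are all hypothetical and not from the question. The collected values are joined into one string because the CSV writer cannot store array columns, and the order of values inside each group is not guaranteed.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, collect_list, concat_ws, count}

object GroupBySketch {
  // groupBy alone yields a RelationalGroupedDataset; agg(...) returns a DataFrame again.
  def aggregateGroups(df: DataFrame): DataFrame =
    df.groupBy("_1", "_2", "_3")
      .agg(
        count(col("_4")).as("n"),
        // join the collected values into a single string so the result can be written as CSV
        concat_ws("|", collect_list(col("_4").cast("string"))).as("values")
      )

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, only for trying the sketch out
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("groupby-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for the question's `filtertable`
    val filtertable = Seq(
      ("a", "x", "p", "1"),
      ("a", "x", "p", "2"),
      ("b", "y", "q", "3")
    ).toDF("_1", "_2", "_3", "_4")

    val fp = aggregateGroups(filtertable)
    fp.show() // works, because fp is a DataFrame
    // Hypothetical output path; "overwrite" avoids failing when the directory already exists
    fp.coalesce(1).write.mode("overwrite").format("csv").save("file:///tmp/grouped_csv")

    spark.stop()
  }
}
```

Which aggregation to use depends on what should end up in the file; `count` and `collect_list` are just two options among many.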

I hope the answer is helpful

Ramesh Maharjan answered Oct 19 '22 23:10



If you only want to return the grouped items, you can aggregate with the first value of an ungrouped column and then select only the grouped columns, like so:

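 import org.apache.spark.sql.functions.first  // needed for first()

 val fp = filtertable
     .select($"_1", $"_2", $"_3", $"_4")
     .groupBy($"_1", $"_2", $"_3")
     .agg(first($"_4"))            // any aggregation turns the grouped data back into a DataFrame
     .select($"_1", $"_2", $"_3")  // keep only the grouping columns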
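Because the aggregated column is dropped again by the final select, the result is just the distinct combinations of the three grouping columns. As a sketch under assumptions (the local SparkSession, the sample rows standing in for `filtertable` and the helper name `distinctKeys` are hypothetical), the same rows can be obtained with select plus distinct():

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object DistinctKeysSketch {
  // One row per distinct (_1, _2, _3) combination, without an explicit groupBy
  def distinctKeys(df: DataFrame): DataFrame =
    df.select("_1", "_2", "_3").distinct()

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, only for trying the sketch out
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("distinct-keys-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for the question's `filtertable`
    val filtertable = Seq(
      ("a", "x", "p", "1"),
      ("a", "x", "p", "2"),
      ("b", "y", "q", "3")
    ).toDF("_1", "_2", "_3", "_4")

    distinctKeys(filtertable).show()

    spark.stop()
  }
}
```

Row order is not guaranteed in either variant.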
wllmtrng answered Oct 20 '22 00:10
