
PySpark groupBy with aggregation: round value to 2 decimals

df = spark.createDataFrame([
    ("2017-Dec-08 00:00 - 2017-Dec-09 00:00", 80.65,"abc"), 
    ("2017-Dec-09 00:00 - 2017-Dec-10 00:00", 100,"abc"),
    ("2017-Dec-08 00:00 - 2017-Dec-09 00:00", 65,"def"), 
    ("2017-Dec-09 00:00 - 2017-Dec-10 00:00", 78.02,"def")
]).toDF("date", "percent","device")

I need to apply groupBy with avg on this DataFrame:

schema = StructType([
    StructField('date', StringType(), True),
    StructField('percent', FloatType(), True),
    StructField('device', StringType(), True)
]) 
df.groupBy("device").agg(round(mean("percent").alias("y"),2))

but I'm getting the following exception:

TypeError: a float is required
asked Sep 02 '25 by Krishna

1 Answer
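The likely cause of the TypeError is the percent column: the sample data mixes Python ints (100, 65) with floats (80.65, 78.02), and a FloatType field requires actual float values. Writing every value as a float makes the aggregation work. Note also that .alias should wrap the result of round, not the column inside it: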

>>> from pyspark.sql.types import StructType, StructField, StringType, FloatType
>>> from pyspark.sql.functions import round, mean
>>> df = sqlContext.createDataFrame([
...     ("2017-Dec-08 00:00 - 2017-Dec-09 00:00", 80.65,"abc"), 
...     ("2017-Dec-09 00:00 - 2017-Dec-10 00:00", 100.00,"abc"),
...     ("2017-Dec-08 00:00 - 2017-Dec-09 00:00", 65.00,"def"), 
...     ("2017-Dec-09 00:00 - 2017-Dec-10 00:00", 78.02,"def")
... ]).toDF("date", "percent","device")
>>> schema = StructType([
...     StructField('date', StringType(), True),
...     StructField('percent', FloatType(), True),
...     StructField('device', StringType(), True)
... ]) 
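Incidentally, the schema above is defined but never passed to createDataFrame, so Spark infers percent as DoubleType. To actually enforce FloatType, the schema could be passed in explicitly (a hypothetical variant, not part of the original answer):

>>> df = sqlContext.createDataFrame([
...     ("2017-Dec-08 00:00 - 2017-Dec-09 00:00", 80.65, "abc"),
...     ("2017-Dec-09 00:00 - 2017-Dec-10 00:00", 100.00, "abc"),
...     ("2017-Dec-08 00:00 - 2017-Dec-09 00:00", 65.00, "def"),
...     ("2017-Dec-09 00:00 - 2017-Dec-10 00:00", 78.02, "def")
... ], schema=schema)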

>>> df.groupBy("device").agg(round(mean("percent"),2).alias("y")).show()
+------+--------+         
|device|       y|
+------+--------+
|   def|   71.51|
|   abc|   90.33|
+------+--------+
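Two differences from the code in the question: every percent value is a float literal, and .alias("y") is applied to the result of round(mean("percent"), 2) rather than inside it, so the output column is named y instead of getting an auto-generated name such as round(avg(percent), 2). Here round and mean come from pyspark.sql.functions, so the import shadows Python's builtin round.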
answered Sep 05 '25 by Bala