df = spark.createDataFrame([
    ("2017-Dec-08 00:00 - 2017-Dec-09 00:00", 80.65, "abc"),
    ("2017-Dec-09 00:00 - 2017-Dec-10 00:00", 100, "abc"),
    ("2017-Dec-08 00:00 - 2017-Dec-09 00:00", 65, "def"),
    ("2017-Dec-09 00:00 - 2017-Dec-10 00:00", 78.02, "def")
]).toDF("date", "percent", "device")
I need to apply a groupBy with avg on the percent column. The schema and aggregation are:
from pyspark.sql.types import StructType, StructField, StringType, FloatType
from pyspark.sql.functions import mean, round

schema = StructType([
    StructField('date', StringType(), True),
    StructField('percent', FloatType(), True),
    StructField('device', StringType(), True)
])
df.groupBy("device").agg(round(mean("percent").alias("y"), 2))
Running the aggregation raises the exception below:
TypeError: a float is required
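The error comes from the data, not from the aggregation: 100 and 65 are Python ints while the other percent values are floats, so the rows fail verification against the float column and Spark raises TypeError: a float is required. Writing every value as a float fixes it; moving .alias("y") outside round() also gives the output column its intended name: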
>>> df = sqlContext.createDataFrame([
... ("2017-Dec-08 00:00 - 2017-Dec-09 00:00", 80.65,"abc"),
... ("2017-Dec-09 00:00 - 2017-Dec-10 00:00", 100.00,"abc"),
... ("2017-Dec-08 00:00 - 2017-Dec-09 00:00", 65.00,"def"),
... ("2017-Dec-09 00:00 - 2017-Dec-10 00:00", 78.02,"def")
... ]).toDF("date", "percent","device")
>>> schema = StructType([
... StructField('date', StringType(), True),
... StructField('percent', FloatType(), True),
... StructField('device', StringType(), True)
... ])
>>> df.groupBy("device").agg(round(mean("percent"),2).alias("y")).show()
+------+--------+
|device| y|
+------+--------+
| def| 71.51|
| abc| 90.33|
+------+--------+
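If you'd rather keep the integer literals in your source data, another option is to coerce the values to float before building the DataFrame and pass the declared schema explicitly instead of relying on inference. A minimal sketch, assuming an active SparkSession named spark (the rows variable is introduced here for illustration):

from pyspark.sql.types import StructType, StructField, StringType, FloatType
from pyspark.sql.functions import mean, round

schema = StructType([
    StructField('date', StringType(), True),
    StructField('percent', FloatType(), True),
    StructField('device', StringType(), True)
])

rows = [
    ("2017-Dec-08 00:00 - 2017-Dec-09 00:00", 80.65, "abc"),
    ("2017-Dec-09 00:00 - 2017-Dec-10 00:00", 100, "abc"),
    ("2017-Dec-08 00:00 - 2017-Dec-09 00:00", 65, "def"),
    ("2017-Dec-09 00:00 - 2017-Dec-10 00:00", 78.02, "def"),
]
# Coerce percent to float so every row matches FloatType
rows = [(d, float(p), dev) for d, p, dev in rows]

df = spark.createDataFrame(rows, schema)
df.groupBy("device").agg(round(mean("percent"), 2).alias("y")).show()

Passing the schema explicitly also documents the intended column types up front, rather than depending on what Spark samples from the first rows.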