For
averageCount = (wordCountsDF
.groupBy().mean()).head()
I get
Row(avg(count)=1.6666666666666667)
but when I try:
averageCount = (wordCountsDF
.groupBy().mean()).head().getFloat(0)
I get the following error:
AttributeError                            Traceback (most recent call last)
<ipython-input> in <module>()
      1 # TODO: Replace with appropriate code
----> 2 averageCount = (wordCountsDF
      3                 .groupBy().mean()).head().getFloat(0)
      4
      5 print averageCount

/databricks/spark/python/pyspark/sql/types.py in __getattr__(self, item)
   1270             raise AttributeError(item)
   1271         except ValueError:
-> 1272             raise AttributeError(item)
   1273
   1274     def __setattr__(self, key, value):

AttributeError: getFloat
What am I doing wrong?
PySpark collect() retrieves data from a DataFrame. collect() is an action on an RDD or DataFrame that gathers all the rows from every partition and brings them back to the driver program.
To create a new Row, use RowFactory.create() in Java or Row.apply() in Scala; in Python, a Row object can be constructed by providing field values directly.
I figured it out. This returns the value:
averageCount = (wordCountsDF
.groupBy().mean()).head()[0]
This also works:
averageCount = (wordCountsDF
.groupBy().mean('count').collect())[0][0]
print averageCount