
How to get a value from the Row object in Spark Dataframe?

For

averageCount = (wordCountsDF
                .groupBy().mean()).head()

I get

Row(avg(count)=1.6666666666666667)

but when I try:

averageCount = (wordCountsDF
                .groupBy().mean()).head().getFloat(0)

I get the following error:

AttributeError: getFloat
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
in ()
      1 # TODO: Replace with appropriate code
----> 2 averageCount = (wordCountsDF
      3                 .groupBy().mean()).head().getFloat(0)
      4
      5 print averageCount

/databricks/spark/python/pyspark/sql/types.py in __getattr__(self, item)
   1270             raise AttributeError(item)
   1271         except ValueError:
-> 1272             raise AttributeError(item)
   1273
   1274     def __setattr__(self, key, value):

AttributeError: getFloat

What am I doing wrong?

asked Jun 23 '16 18:06 by saptak


2 Answers

I figured it out. This returns the value:

averageCount = (wordCountsDF
                .groupBy().mean()).head()[0]
answered Oct 03 '22 02:10 by saptak


This also works:

averageCount = (wordCountsDF
                .groupBy().mean('count').collect())[0][0]
print(averageCount)
answered Oct 03 '22 04:10 by Veronica Wenqian Cheng