For
averageCount = (wordCountsDF
.groupBy().mean()).head()
I get
Row(avg(count)=1.6666666666666667)
but when I try:
averageCount = (wordCountsDF
.groupBy().mean()).head().getFloat(0)
I get the following error:
AttributeError                            Traceback (most recent call last)
<ipython-input> in <module>()
      1 # TODO: Replace with appropriate code
----> 2 averageCount = (wordCountsDF
      3                 .groupBy().mean()).head().getFloat(0)
      4
      5 print averageCount

/databricks/spark/python/pyspark/sql/types.py in __getattr__(self, item)
   1270             raise AttributeError(item)
   1271         except ValueError:
-> 1272             raise AttributeError(item)
   1273
   1274     def __setattr__(self, key, value):

AttributeError: getFloat
What am I doing wrong?
PySpark collect() retrieves data from a DataFrame. collect() is an action on an RDD or DataFrame that gathers all the rows from every partition and brings them back to the driver program.
To create a new Row, use RowFactory.create() in Java or Row.apply() in Scala; in Python, a Row object can be constructed by providing field values directly.
I figured it out. This returns the value:
averageCount = (wordCountsDF
.groupBy().mean()).head()[0]
This also works:
averageCount = (wordCountsDF
.groupBy().mean('count').collect())[0][0]
print averageCount