In Scala I can do get(#) or getAs[Type](#) to get values out of a DataFrame. How should I do it in PySpark?
I have a two-column DataFrame: item (string) and salesNum (integer). I do a groupBy and mean to get the mean of those numbers like this:
saleDF.groupBy("salesNum").mean().collect()
and it works. Now I have the mean in a dataframe with one value.
How can I get that value out of the dataframe to get the mean as a float number?
collect() returns your results as a Python list. To get the value out of the list you just need to take the first element like this:
saleDF.groupBy("salesNum").mean().collect()[0]
To be precise, collect returns a list whose elements are of type pyspark.sql.types.Row.
In your case to extract the real value you should do:
saleDF.groupBy("salesNum").mean().collect()[0]["avg(yourColumnName)"]
where yourColumnName is the name of the column you are taking the mean of (PySpark, when applying mean, renames the resulting column in this way by default).
As an example, I ran the following code. Look at the types and outputs of each step.
>>> columns = ['id', 'dogs', 'cats', 'nation']
>>> vals = [
... (2, 0, 1, 'italy'),
... (1, 2, 0, 'italy'),
... (3, 4, 0, 'france')
... ]
>>> df = sqlContext.createDataFrame(vals, columns)
>>> df.groupBy("nation").mean("dogs").collect()
[Row(nation=u'france', avg(dogs)=4.0), Row(nation=u'italy', avg(dogs)=1.0)]
>>> df.groupBy("nation").mean("dogs").collect()[0]
Row(nation=u'france', avg(dogs)=4.0)
>>> df.groupBy("nation").mean("dogs").collect()[0]["avg(dogs)"]
4.0
>>> type(df.groupBy("nation").mean("dogs").collect())
<type 'list'>
>>> type(df.groupBy("nation").mean("dogs").collect()[0])
<class 'pyspark.sql.types.Row'>
>>> type(df.groupBy("nation").mean("dogs").collect()[0]["avg(dogs)"])
<type 'float'>
We can also use first() here:
saleDF.groupBy("salesNum").mean().first()[0]