How to print only a certain column of DataFrame in PySpark?

Can one use the actions collect or take to print only a given column of a DataFrame?

This

df.col.collect()

gives the error

TypeError: 'Column' object is not callable

and this:

df[df.col].take(2)

gives

pyspark.sql.utils.AnalysisException: u"filter expression 'col' of type string is not a boolean.;"

asked Mar 10 '16 10:03

mar tin

1 Answer

Use select and show:

df.select("col").show()

or select, flatMap, and collect:

df.select("col").rdd.flatMap(list).collect()
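Since the question asks about collect and take specifically, here is a minimal sketch of the same idea that stays on the DataFrame API instead of going through the RDD. The helper name column_values, the demo function, and the sample data are made up for illustration; only select, collect, take, and name-based lookup on Row come from PySpark.

```python
def column_values(df, name, n=None):
    """Return the values of one column as a plain Python list.

    df   -- a pyspark.sql.DataFrame
    name -- the column name to extract
    n    -- if given, fetch at most n rows with take(); otherwise collect() all
    """
    selected = df.select(name)  # still a DataFrame, so actions are available
    rows = selected.collect() if n is None else selected.take(n)
    return [row[name] for row in rows]  # each Row supports lookup by column name


def demo_column_values():
    # Assumes a local PySpark installation; the sample data is hypothetical.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").appName("column-demo").getOrCreate()
    df = spark.createDataFrame([("a", 1), ("b", 2), ("c", 3)], ["col", "n"])
    print(column_values(df, "col"))       # every value of "col"
    print(column_values(df, "col", n=2))  # at most two values
    spark.stop()
```

Calling demo_column_values() runs the sketch against a small local session. Keep in mind that collect() brings the whole column to the driver, so take(n) is the safer choice on large data.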

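Bracket notation with a Column argument (df[df.col]) is used only for logical slicing, that is, filtering rows by a boolean expression, which is why a string column is rejected there. A column by itself (df.col) is not a distributed data structure but a SQL expression, so it cannot be collected.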
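To make the distinction concrete, the sketch below builds a boolean Column expression first and only then uses bracket notation to filter. The names matching_values and demo_filter and the sample data are hypothetical; the behaviour relied on is that comparing a Column with a value yields another Column, and that indexing a DataFrame with a boolean Column filters its rows.

```python
def matching_values(df, name, value, n=2):
    """Return up to n values of column `name` from rows where it equals `value`."""
    condition = df[name] == value  # a boolean Column expression; no data is read here
    filtered = df[condition]       # bracket notation with a boolean Column filters rows
    return [row[name] for row in filtered.select(name).take(n)]


def demo_filter():
    # Assumes a local PySpark installation; the sample data is hypothetical.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").appName("filter-demo").getOrCreate()
    df = spark.createDataFrame([("a", 1), ("b", 2), ("a", 3)], ["col", "n"])
    print(type(df["col"]).__name__)         # the class of a bare column expression
    print(matching_values(df, "col", "a"))  # values from the rows that matched
    spark.stop()
```

The action (take) is called on the filtered DataFrame, never on the Column itself, which is the point the answer makes.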

answered Sep 19 '22 17:09

zero323
