Select columns in PySpark dataframe

Tags: python, apache-spark, apache-spark-sql, pyspark

I am looking for a way to select columns of my DataFrame in PySpark. For the first row, I know I can use df.first(), but I am not sure how to do the same for columns, given that they do not have column names.

I have 5 columns and want to loop through each one of them.

+--+---+---+---+---+---+---+
|_1| _2| _3| _4| _5| _6| _7|
+--+---+---+---+---+---+---+
|1 |0.0|0.0|0.0|1.0|0.0|0.0|
|2 |1.0|0.0|0.0|0.0|0.0|0.0|
|3 |0.0|0.0|1.0|0.0|0.0|0.0|


asked Oct 18 '17 14:10

Nivi

3 Answers

Try something like this:

df.select([c for c in df.columns if c in ['_2','_4','_5']]).show()
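The comprehension keeps only the names that actually exist in df.columns, so a requested name that is missing is skipped instead of raising an error. A minimal sketch of that idea as a reusable helper follows. The names pick_columns and select_existing are hypothetical (they are not part of PySpark), and df is assumed to be a pyspark.sql.DataFrame.

```python
def pick_columns(all_columns, wanted):
    """Return the wanted names that exist, in the DataFrame's own column order."""
    wanted_set = set(wanted)
    return [c for c in all_columns if c in wanted_set]


def select_existing(df, wanted):
    """Select only the wanted columns that are present in df.

    Assumption: df is a pyspark.sql.DataFrame, whose select() accepts
    a list of column names.
    """
    return df.select(pick_columns(df.columns, wanted))
```

With the table from the question, select_existing(df, ['_2', '_4', '_5']).show() should behave like the one-liner above. Note that the result follows the order of df.columns, not the order of the wanted list.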


answered Sep 30 '22 13:09

MaxU - stop WAR against UA

To get the first two columns and the first 5 rows:

df.select(df.columns[:2]).take(5)
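Because df.columns is an ordinary Python list of names, any list indexing works for positional selection, not only a leading slice. The sketch below picks columns at arbitrary positions. names_at and select_by_position are hypothetical helper names, and df is again assumed to be a pyspark.sql.DataFrame.

```python
def names_at(columns, positions):
    """Map integer positions (negative indexes allowed) to column names."""
    return [columns[i] for i in positions]


def select_by_position(df, positions):
    """Select the columns of df found at the given positions.

    Assumption: df is a pyspark.sql.DataFrame, whose select() accepts
    a list of column names.
    """
    return df.select(names_at(df.columns, positions))
```

For example, select_by_position(df, [0, 2, -1]).take(5) would return up to five Row objects containing the first, third, and last columns. take(5) returns the rows to the driver as a list, whereas show(5) only prints them.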

answered Sep 30 '22 12:09

Michael West

You can put the column names in a list and unpack it inside the select:

cols = ['_2','_4','_5']
df.select(*cols).show()
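The question also asks about looping through each column. One possible sketch, assuming df is a pyspark.sql.DataFrame, is to iterate over df.columns and select one name at a time. iter_columns is a hypothetical helper name, not a PySpark function.

```python
def iter_columns(df):
    """Yield (name, single-column DataFrame) pairs, one per column of df.

    Assumption: df is a pyspark.sql.DataFrame, whose select() accepts
    a column name given as a string.
    """
    for name in df.columns:
        yield name, df.select(name)
```

A loop such as `for name, col_df in iter_columns(df): col_df.show()` would then print each column in turn. Each select is lazy, so Spark does no work until an action such as show() or take() is called on the result.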

answered Sep 30 '22 11:09

Shadowtrooper
