Python Pandas drop columns based on max value of column

I'm just getting going with Pandas as a tool for munging two-dimensional arrays of data. It's super overwhelming, even after reading the docs. You can do so much that I can't figure out how to do anything, if that makes any sense.

My dataframe (simplified):

Date        Stock1  Stock2  Stock3
2014.10.10  74.75   NaN     NaN
2014.9.9    NaN     100.95  NaN
2010.8.8    NaN     NaN     120.45

So each column has only one non-NaN value.

I want to remove all columns that have a max value less than x. For example, if x = 80, I want a new DataFrame:

Date        Stock2  Stock3
2014.10.10  NaN     NaN
2014.9.9    100.95  NaN
2010.8.8    NaN     120.45

How can this be achieved? I've looked at dataframe.max(), which gives me a Series. Can I use that, or somehow use a lambda function in select()?

Can you drop columns by index in Pandas?

You can drop columns by index using the DataFrame.drop() method or by selecting with DataFrame.iloc[].
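As a minimal sketch of both approaches, assuming a small made-up frame with columns a, b and c (the data and variable names are illustrative only): drop() works on labels, so positions are first translated through df.columns, while iloc selects the positions to keep.

```python
import pandas as pd

# Hypothetical example data, not taken from the question.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# Option 1: look up the labels at positions 0 and 2, then drop them by label.
dropped = df.drop(columns=df.columns[[0, 2]])

# Option 2: keep only the wanted positions with iloc.
kept = df.iloc[:, [1]]
```

Both variables end up holding a frame with only column b; neither call modifies df itself.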

How do you drop rows in Pandas based on multiple column values?

Use the drop() method to delete rows based on a column value in a pandas DataFrame. As part of data cleansing, you may need to drop rows when a column value matches a static value or the value of another column.

How do you drop Pandas rows based on condition?

Use the pandas.DataFrame.drop() method to delete rows that match one or more conditions.
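A rough sketch of dropping rows by condition, assuming a made-up frame with ticker, price and volume columns (all names and values here are hypothetical). drop() needs index labels, so the condition is first used to find the labels of the matching rows; filtering with the negated mask is a common equivalent.

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame(
    {
        "ticker": ["AAA", "BBB", "CCC"],
        "price": [74.75, 100.95, 120.45],
        "volume": [10, 0, 5],
    }
)

# Rows to remove: price below 80 OR zero volume.
condition = (df["price"] < 80) | (df["volume"] == 0)

# Variant 1: collect the index labels of matching rows and drop them.
via_drop = df.drop(index=df[condition].index)

# Variant 2: keep the rows where the condition is False.
via_mask = df[~condition]
```

Each sub-condition is wrapped in parentheses because | and & bind more tightly than comparison operators in Python.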

Use df.max() to build a boolean index.

In [19]: from pandas import DataFrame

In [20]: import numpy as np

In [23]: df = DataFrame(np.random.randn(3,3), columns=['a','b','c'])

In [36]: df
Out[36]: 
          a         b         c
0 -0.928912  0.220573  1.948065
1 -0.310504  0.847638 -0.541496
2 -0.743000 -1.099226 -1.183567


In [24]: df.max()
Out[24]: 
a   -0.310504
b    0.847638
c    1.948065
dtype: float64
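Because the frame above is random, the session cannot be reproduced exactly. The sketch below hard-codes the values printed in the output so the same maxima can be checked; it assumes nothing beyond pandas itself.

```python
import pandas as pd

# Values copied from the printed frame above.
df = pd.DataFrame(
    {
        "a": [-0.928912, -0.310504, -0.743000],
        "b": [0.220573, 0.847638, -1.099226],
        "c": [1.948065, -0.541496, -1.183567],
    }
)

# max() reduces down the rows by default, giving one value per column.
col_max = df.max()
```

col_max is a Series whose index holds the column names, which is what makes it usable for selecting columns.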

Next, we make a boolean expression out of this:

In [31]: df.max() > 0
Out[31]: 
a    False
b     True
c     True
dtype: bool

Next, you can index df.columns with this boolean Series (this is called boolean indexing):

In [34]: df.columns[df.max() > 0]
Out[34]: Index([u'b', u'c'], dtype='object')

Finally, pass the resulting column labels to the DataFrame:

In [35]: df[df.columns[df.max() > 0]]
Out[35]: 
          b         c
0  0.220573  1.948065
1  0.847638 -0.541496
2 -1.099226 -1.183567
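The two-step selection can also be written with .loc, which accepts the boolean Series directly as a column selector. This is offered as an equivalent shorthand rather than as part of the original answer; the frame reuses the values printed earlier.

```python
import pandas as pd

# Values copied from the printed frame above.
df = pd.DataFrame(
    {
        "a": [-0.928912, -0.310504, -0.743000],
        "b": [0.220573, 0.847638, -1.099226],
        "c": [1.948065, -0.541496, -1.183567],
    }
)

mask = df.max() > 0  # one boolean per column

# Two steps, as in the session above: labels first, then select.
two_step = df[df.columns[mask]]

# One step: all rows, and only the columns where mask is True.
one_step = df.loc[:, mask]
```

Both results contain columns b and c.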

Of course, instead of 0, you can use any value you want as the cutoff for dropping.
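Applied to the data in the question with x = 80, the same idea might look like the sketch below. It assumes Date is the index rather than a regular column (with a string Date column, comparing the maxima against a number would not work as written) and relies on max() skipping NaN by default.

```python
import numpy as np
import pandas as pd

# The question's frame, with Date assumed to be the index.
df = pd.DataFrame(
    {
        "Stock1": [74.75, np.nan, np.nan],
        "Stock2": [np.nan, 100.95, np.nan],
        "Stock3": [np.nan, np.nan, 120.45],
    },
    index=pd.Index(["2014.10.10", "2014.9.9", "2010.8.8"], name="Date"),
)

x = 80

# Removing columns whose max is less than x means keeping those with max >= x.
filtered = df.loc[:, df.max() >= x]
```

Stock1 is removed because its only value, 74.75, is below 80; Stock2 and Stock3 remain with their NaN entries intact.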

Tags: python, pandas, numpy

professorDante

1 Answer

Adam Hughes
