I'm trying to use where on my Pandas DataFrame to replace all cells that don't meet my criteria with NaN. However, I'd like to do it in a way that always preserves the shape of my original DataFrame and does not remove any rows from the result.
Given the following DataFrame:
A B C D
1/1 0 1 0 1
1/2 2 1 1 1
1/3 3 0 1 0
1/4 1 0 1 2
1/5 1 0 1 1
1/6 2 0 2 1
1/7 3 5 2 3
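(For reproducibility, the frame above can be built with something like the following; I'm assuming the row labels are plain strings rather than parsed dates.)
import pandas as pd

df = pd.DataFrame(
    {"A": [0, 2, 3, 1, 1, 2, 3],
     "B": [1, 1, 0, 0, 0, 0, 5],
     "C": [0, 1, 1, 1, 1, 2, 2],
     "D": [1, 1, 0, 2, 1, 1, 3]},
    index=["1/1", "1/2", "1/3", "1/4", "1/5", "1/6", "1/7"],
)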
I would like to search the DataFrame for all cells that meet a certain criterion when column D also meets a particular criterion. In this case my criterion is:
Find all cells that are greater than the previous value, when column D is also > 1
I accomplish this by using the following syntax:
matches = df[df > df.shift(1)]
matches = matches[df.D > 1]
I have to split this query into two statements because df.D is a Series and does not match the shape of the entire DataFrame. According to this question I asked previously, support for a broadcasting & operator will not be available until 0.14.
The problem I am having is that after I run the second statement, the shape of the resulting DataFrame changes and rows are removed; the number of columns stays the same. The first statement leaves the original number of rows in place.
Why does the second statement remove rows while the first does not? How can I achieve the same result while keeping the full number of rows intact?
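For illustration, using the sample frame above the shapes work out to:
df[df > df.shift(1)].shape            # (7, 4) -- the DataFrame mask only turns failing cells into NaN
df[df > df.shift(1)][df.D > 1].shape  # (2, 4) -- indexing with a boolean Series drops whole rows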
Edit:
The pandas documentation states that in order to guarantee the shape is preserved, I should use the where method rather than boolean indexing. However, that does not seem to work for my second statement:
matches.where(df.D > 1)
Gives me the following error:
ValueError: Array conditional must be same shape as self
You can use df[df["Courses"] == 'Spark'] to filter rows by a condition in pandas DataFrame. Not that this expression returns a new DataFrame with selected rows. You can also write the above statement with a variable.
Often, you want to find instances of a specific value in your DataFrame. You can easily filter rows based on whether they contain a value or not using the . loc indexing method.
To filter rows based on dates, first format the dates in the DataFrame to datetime64 type. Then use the DataFrame. loc[] and DataFrame. query[] function from the Pandas package to specify a filter condition.
This is slightly more intuitive than @DSM's answer (but pandas is missing this type of auto-broadcasting on boolean ops at the moment):
In [58]: df.where((df>df.shift(1)).values & DataFrame(df.D==1).values)
Out[58]:
A B C D
1/1 NaN NaN NaN NaN
1/2 2 NaN 1 NaN
1/3 NaN NaN NaN NaN
1/4 NaN NaN NaN NaN
1/5 NaN NaN NaN NaN
1/6 2 NaN 2 NaN
1/7 NaN NaN NaN NaN
See here for the issue to be addressed in 0.14.
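Unlike the chained boolean indexing in the question, this keeps every row of the original frame; for example:
matches = df.where((df > df.shift(1)).values & DataFrame(df.D == 1).values)
matches.shape == df.shape   # True -- where() only masks cells with NaN, it never drops rows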
If I understand what you're after, you can do the broadcasting manually by dropping down to the numpy level:
>>> (df > df.shift(1)).values & (df.D == 1)[:,None]
array([[False, False, False, False],
[ True, False, True, False],
[False, False, False, False],
[False, False, False, False],
[False, False, False, False],
[ True, False, True, False],
[False, False, False, False]], dtype=bool)
after which you can use where:
>>> df.where((df > df.shift(1)).values & (df.D == 1)[:,None], np.nan)
A B C D
1/1 NaN NaN NaN NaN
1/2 2 NaN 1 NaN
1/3 NaN NaN NaN NaN
1/4 NaN NaN NaN NaN
1/5 NaN NaN NaN NaN
1/6 2 NaN 2 NaN
1/7 NaN NaN NaN NaN
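Note that on recent pandas versions, multi-dimensional indexing on a Series ((df.D == 1)[:, None]) raises an error, so the same idea would be spelled by going through NumPy explicitly (a sketch assuming pandas >= 0.24 for .to_numpy()):
cond = (df > df.shift(1)).to_numpy() & (df["D"] == 1).to_numpy()[:, None]
df.where(cond)   # same result as above; the shape of df is preserved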