I have a large dataframe that I want to slice according to multiple boolean criteria, and then modify the entries in those slices so that the original dataframe changes, i.e. I need a <code>view</code> of the original. The problem is that fancy indexing always returns a <code>copy</code>. I thought of the <code>.ix</code> method, but boolean indexing with <code>df.ix[]</code> also returns a copy. Essentially, if <code>df</code> is my dataframe, I'd like a view of column C restricted to the rows where <code>C!=0, A==10, B<30,...</code> etc. Is there a fast way to do this in pandas?

Even though <code>df.loc[idx]</code> may be a copy of a portion of <code>df</code>, assignment to <code>df.loc[idx]</code> modifies <code>df</code> itself. (This is also true of <code>df.iloc</code> and, in pandas versions before 1.0, of <code>df.ix</code>, which has since been removed.) For example,

<pre class="prettyprint"><code>import pandas as pd
import numpy as np
df = pd.DataFrame({'A':[9,10]*6,
                   'B':range(23,35),
                   'C':range(-6,6)})

print(df)
#      A   B  C
# 0    9  23 -6
# 1   10  24 -5
# 2    9  25 -4
# 3   10  26 -3
# 4    9  27 -2
# 5   10  28 -1
# 6    9  29  0
# 7   10  30  1
# 8    9  31  2
# 9   10  32  3
# 10   9  33  4
# 11  10  34  5
</code></pre>

Here is our boolean index:

<pre class="prettyprint"><code>idx = (df['C']!=0) & (df['A']==10) & (df['B']<30)
</code></pre>

We can modify those rows of <code>df</code> where <code>idx</code> is True by assigning to <code>df.loc[idx, ...]</code>. For example,

<pre class="prettyprint"><code>df.loc[idx, 'A'] += df.loc[idx, 'B'] * df.loc[idx, 'C']
print(df)
</code></pre>

yields

<pre class="prettyprint"><code>      A   B  C
0     9  23 -6
1  -110  24 -5
2     9  25 -4
3   -68  26 -3
4     9  27 -2
5   -18  28 -1
6     9  29  0
7    10  30  1
8     9  31  2
9    10  32  3
10    9  33  4
11   10  34  5
</code></pre>
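To make the copy-versus-assignment distinction concrete, here is a minimal sketch that rebuilds the same frame and compares editing a boolean selection with assigning through <code>df.loc</code>. The names <code>subset</code>, <code>before</code> and <code>unchanged</code> are only illustrative, and the warning filter is there on the assumption that older pandas versions emit a SettingWithCopyWarning at that point.

```python
import warnings

import pandas as pd

df = pd.DataFrame({'A': [9, 10] * 6,
                   'B': range(23, 35),
                   'C': range(-6, 6)})
idx = (df['C'] != 0) & (df['A'] == 10) & (df['B'] < 30)

# Keep a snapshot of column A so we can check whether df changes.
before = df['A'].copy()

# Boolean selection hands back a new object, so edits to it stay local.
with warnings.catch_warnings():
    # Older pandas versions may emit SettingWithCopyWarning here.
    warnings.simplefilter('ignore')
    subset = df[idx]
    subset['A'] = 0
unchanged = df['A'].equals(before)
print(unchanged)                    # True: the original column is untouched

# Assigning through .loc on df itself writes into the original frame.
df.loc[idx, 'A'] = 0
print(df.loc[idx, 'A'].tolist())    # [0, 0, 0]
```

So there is no need to hold a view at all: build the mask once and pass it to <code>df.loc</code> on the left-hand side of each assignment.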

boolean indexing that can produce a view to a large pandas dataframe?

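Q: Is boolean indexing possible in a DataFrame?

Yes. Boolean indexing selects rows of a DataFrame using a boolean vector: a Series, array, or list of True/False values with one entry per row. The DataFrame itself does not need a boolean index; the vector is usually built by comparing a column against a value.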
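As a small illustration of selecting with a boolean vector, the sketch below uses a made-up four-row frame (not the data from the question) and a mask named <code>mask</code> chosen just for this example.

```python
import pandas as pd

# Hypothetical toy data, not taken from the original question.
df = pd.DataFrame({'A': [9, 10, 9, 10],
                   'C': [-2, -1, 0, 1]})

# Comparing a column with a value yields a boolean Series, one entry per row.
mask = df['C'] != 0
print(mask.tolist())               # [True, True, False, True]

# Passing the mask in square brackets keeps only the rows marked True.
selected = df[mask]
print(selected.index.tolist())     # [0, 1, 3]
```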

Q: Which kinds of indexing are possible in a pandas DataFrame?

Essentially, there are two main ways of indexing pandas dataframes: label-based (with loc) and position-based (with iloc), the latter also called location-based or integer-based. It is also possible to index with a boolean mask built from conditions on the data, or to mix these approaches in a single lookup.
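The three styles can be compared side by side. This is only a sketch on a hypothetical frame with string row labels, chosen so that labels and positions are visibly different; the variable names are made up for the example.

```python
import pandas as pd

# Hypothetical frame with string row labels so labels and positions differ.
df = pd.DataFrame({'A': [9, 10, 9], 'B': [23, 24, 25]},
                  index=['x', 'y', 'z'])

by_label = df.loc['y', 'B']                 # label-based: row 'y', column 'B'
by_position = df.iloc[1, 1]                 # position-based: second row, second column
by_condition = df.loc[df['A'] == 9, 'B']    # boolean mask for rows, label for the column

print(by_label, by_position)                # 24 24
print(by_condition.tolist())                # [23, 25]
```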

Q: What is boolean indexing in pandas?

Boolean indexing in pandas selects subsets of data based on the actual values in the DataFrame rather than on row/column labels or integer locations. Conditions are combined element-wise with the operators “&” (and), “|” (or) and “~” (not). Each condition must be wrapped in parentheses, because these operators bind more tightly than comparisons such as == or <.
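For instance, combining conditions on the frame from the answer might look like the sketch below; the mask names <code>both</code>, <code>either</code> and <code>neither</code> are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({'A': [9, 10] * 6,
                   'B': range(23, 35),
                   'C': range(-6, 6)})

# Each comparison gets its own parentheses before being combined.
both = (df['A'] == 10) & (df['B'] < 30)       # rows where both conditions hold
either = (df['C'] == 0) | (df['B'] >= 33)     # rows where at least one holds
neither = ~either                             # element-wise negation of a mask

print(df[both].index.tolist())                # [1, 3, 5]
print(df[either].index.tolist())              # [6, 10, 11]
print(int(neither.sum()))                     # 9
```

Using the Python keywords <code>and</code> / <code>or</code> on whole Series raises a ValueError, which is why the element-wise operators are used here.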

Q: What are the two ways of indexing a DataFrame?

The two ways are the “loc” indexer, which selects by label, and the “iloc” indexer, which selects by integer position. Both are used with square brackets rather than called like functions.
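Both indexers can also take a boolean mask when assigning. The following is a rough sketch on a hypothetical four-row frame; it assumes the mask is converted with <code>to_numpy()</code> before being handed to iloc, since iloc works by position and not by label alignment.

```python
import pandas as pd

# Hypothetical frame; the row labels are arbitrary strings.
df = pd.DataFrame({'A': [9, 10, 9, 10], 'B': [23, 24, 25, 26]},
                  index=['w', 'x', 'y', 'z'])
mask = df['B'] > 24

# .loc accepts a boolean Series directly and aligns it on the row labels.
df.loc[mask, 'A'] = 0

# .iloc is purely positional, so the mask is passed as a plain NumPy array
# and the column as an integer position (1 is column 'B').
df.iloc[mask.to_numpy(), 1] = -1

print(df['A'].tolist())    # [9, 10, 0, 0]
print(df['B'].tolist())    # [23, 24, -1, -1]
```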


Answered by: unutbu
