
Boolean indexing that can produce a view of a large pandas DataFrame?

I have a large DataFrame that I want to take slices of (according to multiple boolean criteria), and then modify the entries in those slices in order to change the original DataFrame, i.e. I need a view of the original. The problem is that fancy indexing always returns a copy. I thought of the .ix method, but boolean indexing with df.ix[] also returns a copy.

Essentially, if df is my DataFrame, I'd like a view of column C restricted to the rows where C!=0, A==10, B<30, etc. Is there a fast way to do this in pandas?

asked Feb 28 '13 19:02 by optional




1 Answer

Even though df.loc[idx] may be a copy of a portion of df, assignment to df.loc[idx] modifies df itself. (This is also true of df.iloc, and of df.ix in older pandas versions; df.ix was removed in pandas 1.0.)

For example,

import pandas as pd
import numpy as np
df = pd.DataFrame({'A':[9,10]*6,
                   'B':range(23,35),
                   'C':range(-6,6)})

print(df)
#      A   B  C
# 0    9  23 -6
# 1   10  24 -5
# 2    9  25 -4
# 3   10  26 -3
# 4    9  27 -2
# 5   10  28 -1
# 6    9  29  0
# 7   10  30  1
# 8    9  31  2
# 9   10  32  3
# 10   9  33  4
# 11  10  34  5

Here is our boolean index:

idx = (df['C']!=0) & (df['A']==10) & (df['B']<30)

We can modify those rows of df where idx is True by assigning to df.loc[idx, ...]. For example,

df.loc[idx, 'A'] += df.loc[idx, 'B'] * df.loc[idx, 'C']
print(df)

yields

      A   B  C
0     9  23 -6
1  -110  24 -5
2     9  25 -4
3   -68  26 -3
4     9  27 -2
5   -18  28 -1
6     9  29  0
7    10  30  1
8     9  31  2
9    10  32  3
10    9  33  4
11   10  34  5
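
To make the copy-versus-assignment distinction concrete, here is a minimal sketch that reuses the same df and idx. The names sub, unchanged and changed are illustrative only and not part of the original answer. It modifies an explicit copy of the selected rows and then assigns through df.loc, so the two effects can be compared.

```python
import pandas as pd

df = pd.DataFrame({'A': [9, 10] * 6,
                   'B': range(23, 35),
                   'C': range(-6, 6)})
idx = (df['C'] != 0) & (df['A'] == 10) & (df['B'] < 30)

# Selecting with a boolean mask yields a separate object;
# .copy() makes that explicit, so edits to `sub` stay in `sub`.
sub = df.loc[idx].copy()
sub['A'] = 0
unchanged = df['A'].tolist()   # df still holds its original values

# Assigning through .loc with the same mask writes into df itself.
df.loc[idx, 'A'] = 0
changed = df['A'].tolist()     # rows 1, 3 and 5 are now 0
```

In this sketch only the second form changes df, which is why the answer assigns to df.loc[idx, ...] rather than editing the result of a selection.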
answered Sep 20 '22 07:09 by unutbu