Subset of columns and filter Pandas

Tags: python, pandas

Using pandas, how would I filter rows and take just a subset of columns from a DataFrame in one command?

I am trying to do something like this:

frame[(frame.DESIGN_VALUE > 20) & (frame['mycol3','mycol6'])]

Thanks.

Simon asked Oct 02 '15 13:10



1 Answer

You can use the boolean condition to generate a row mask and pass it, together with a list of the columns of interest, to loc:

frame.loc[frame['DESIGN_VALUE'] > 20,['mycol3', 'mycol6']]

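I advise the above because it does the row filter and the column selection in a single indexing operation instead of chained indexing, so if you later assign through the same expression the change lands on the original frame rather than on a temporary copy. Secondly, I also strongly suggest using [] to select your columns rather than accessing them as attributes via the dot operator. This avoids ambiguities in pandas behaviour, for example when a column name clashes with an existing DataFrame attribute or method.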
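To illustrate the advice about column access, here is a minimal sketch. The frame and its column names (`count`, `value`) are hypothetical, chosen only because `count` collides with an existing DataFrame method.

```python
import pandas as pd

# Hypothetical frame: the column name "count" collides with DataFrame.count
df = pd.DataFrame({"count": [1, 2, 3], "value": [10, 20, 30]})

attr = df.count    # the bound method DataFrame.count, not the column
col = df["count"]  # the column, returned as a Series

print(callable(attr))      # True: attribute access found the method
print(type(col).__name__)  # Series

# A single .loc call filters the rows and picks the column,
# so the assignment is applied to df itself
df.loc[df["value"] > 15, "count"] = 0
print(df["count"].tolist())  # [1, 0, 0]
```

Bracket access always resolves to the column, which is why it is the safer habit when column names are not under your control.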

Example:

In [184]:
import numpy as np
import pandas as pd

df = pd.DataFrame(columns = list('abc'), data = np.random.randn(5,3))
df

Out[184]:
          a         b         c
0 -0.628354  0.833663  0.658212
1  0.032443  1.062135 -0.335318
2 -0.450620 -0.906486  0.015565
3  0.280459 -0.375468 -1.603993
4  0.463750 -0.638107 -1.598261

In [187]:
df.loc[df['a']>0, ['b','c']]

Out[187]:
          b         c
1  1.062135 -0.335318
3 -0.375468 -1.603993
4 -0.638107 -1.598261

This:

frame[(frame.DESIGN_VALUE > 20) & (frame['mycol3','mycol6'])]

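won't work, because frame['mycol3','mycol6'] is a column selection rather than a boolean condition, so it cannot be combined with the row mask using &. (Written like that, pandas also treats 'mycol3','mycol6' as a single tuple key and looks for one column with that name.)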
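As a rough sketch of the difference, the snippet below builds a small frame using the column names from the question (the values are made up for illustration), shows that the tuple-style lookup fails, and then uses & the way it is meant to be used: to combine two boolean conditions on the rows.

```python
import pandas as pd

# Made-up data; only the column names come from the question
frame = pd.DataFrame({
    "DESIGN_VALUE": [10, 25, 30],
    "mycol3": ["a", "b", "c"],
    "mycol6": [1.0, 2.0, 3.0],
})

# frame['mycol3', 'mycol6'] asks for ONE column keyed by a tuple
try:
    frame["mycol3", "mycol6"]
    raised = False
except KeyError:
    raised = True
print(raised)  # True

# Row mask first, list of columns second
subset = frame.loc[frame["DESIGN_VALUE"] > 20, ["mycol3", "mycol6"]]
print(subset)

# & belongs between boolean conditions; each one needs its own parentheses
mask = (frame["DESIGN_VALUE"] > 20) & (frame["mycol6"] < 3)
both = frame.loc[mask, ["mycol3", "mycol6"]]
print(both)
```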

EdChum answered Oct 11 '22 13:10