Using pandas, how would I filter rows and take just a subset of columns from a DataFrame in one command?
I am trying to apply something like this....
frame[(frame.DESIGN_VALUE > 20) & (frame['mycol3','mycol6'])]
Thanks.
Filter by multiple column values using relational operators
A DataFrame is a data structure that stores data in a two-dimensional format. It is similar to a table that stores data in rows and columns: rows represent the records (tuples) and columns represent the attributes. We can create a DataFrame with the pandas.DataFrame() constructor. We can also create a DataFrame from a dictionary without specifying the columns and index arguments.
Here we filter a DataFrame by a single column value using the loc[] indexer. It takes a boolean condition built by comparing a column with a value using relational operators. In that condition, column is the name of the DataFrame column being filtered, and value is the string or numeric data compared with the actual values in that column.
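As a rough sketch of the filtering pattern described above, the following applies one relational condition and then two conditions joined with &. The frame and its column names (name, age, city) are made up for illustration and do not come from the original question.

```python
import pandas as pd

# Hypothetical data, used only for illustration.
people = pd.DataFrame({
    "name": ["Ann", "Bob", "Cid", "Dee"],
    "age": [31, 17, 45, 22],
    "city": ["Oslo", "Rome", "Oslo", "Lima"],
})

# Single condition: the comparison yields a boolean Series used as a row mask.
adults = people.loc[people["age"] >= 18]

# Multiple conditions: wrap each comparison in parentheses
# and join them with & (and) or | (or).
oslo_adults = people.loc[(people["age"] >= 18) & (people["city"] == "Oslo")]

print(adults)
print(oslo_adults)
```

The parentheses around each comparison matter, because & binds more tightly than >= and == in Python.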
When several columns are selected with double square brackets, the inner square brackets define a Python list of column names, whereas the outer brackets select the data from the pandas DataFrame. The returned data type is a pandas DataFrame.
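To make the difference between single and double brackets concrete, here is a small sketch. The frame name scores and its contents are invented for the example.

```python
import pandas as pd

# Hypothetical data, used only for illustration.
scores = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})

one_col = scores["b"]          # a single column name -> Series
two_cols = scores[["b", "c"]]  # a list of names inside the brackets -> DataFrame
still_df = scores[["b"]]       # a one-element list still gives a DataFrame

print(type(one_col).__name__)
print(type(two_cols).__name__)
print(type(still_df).__name__)
```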
You can use the boolean condition to generate a mask and pass a list of columns of interest using loc:
frame.loc[frame['DESIGN_VALUE'] > 20, ['mycol3', 'mycol6']]
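I advise the above because a single loc call avoids chained indexing, so any assignment made through it acts on the original frame rather than on a temporary copy. Secondly, I strongly suggest using [] to select your columns rather than accessing them as attributes via the dot operator; this avoids ambiguities in pandas behaviour, such as a column name that clashes with a DataFrame method.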
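One concrete ambiguity with attribute access can be sketched as follows. The frame stock and its columns are hypothetical; the point is only that a column named count collides with the existing DataFrame.count method.

```python
import pandas as pd

# Hypothetical frame with a column whose name clashes with a DataFrame method.
stock = pd.DataFrame({"count": [5, 30, 12], "price": [1.5, 2.0, 0.75]})

by_attribute = stock.count    # resolves to the DataFrame.count method
by_brackets = stock["count"]  # always resolves to the column

print(callable(by_attribute))
print(by_brackets.tolist())

# Bracket-based selection works regardless of the column name.
subset = stock.loc[stock["count"] > 10, ["price"]]
print(subset)
```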
Example:
In [184]:
import numpy as np
import pandas as pd
df = pd.DataFrame(columns=list('abc'), data=np.random.randn(5, 3))
df
Out[184]:
          a         b         c
0 -0.628354  0.833663  0.658212
1  0.032443  1.062135 -0.335318
2 -0.450620 -0.906486  0.015565
3  0.280459 -0.375468 -1.603993
4  0.463750 -0.638107 -1.598261
In [187]:
df.loc[df['a']>0, ['b','c']]
Out[187]:
          b         c
1  1.062135 -0.335318
3 -0.375468 -1.603993
4 -0.638107 -1.598261
This:
frame[(frame.DESIGN_VALUE > 20) & (frame['mycol3','mycol6'])]
Won't work because frame['mycol3','mycol6'] is a column selection, not a boolean condition, so it cannot be combined with the row mask using &.
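A minimal sketch of the failure and the working form, assuming a small stand-in for the asker's frame (the values are invented, only the column names come from the question):

```python
import pandas as pd

# Hypothetical stand-in for the asker's frame; the values are invented.
frame = pd.DataFrame({
    "DESIGN_VALUE": [10, 25, 40],
    "mycol3": ["x", "y", "z"],
    "mycol6": [1.0, 2.0, 3.0],
})

try:
    # A comma inside single brackets builds the tuple ('mycol3', 'mycol6'),
    # which pandas looks up as one column label and does not find.
    frame["mycol3", "mycol6"]
    failed = False
except KeyError:
    failed = True

# Working form: row mask first, list of column names second.
result = frame.loc[frame["DESIGN_VALUE"] > 20, ["mycol3", "mycol6"]]

print(failed)
print(result)
```

Keeping the row condition and the column list as separate arguments to loc means neither has to be coerced into the other.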