I would like to print all rows of a dataframe where I find the value '-' in any of the columns. Can someone please explain a way that is better than those described below?
This Q&A already explains how to do so by using boolean indexing but each column needs to be declared separately:
print(df.loc[df['A'].isin(['-']) | df['B'].isin(['-']) | df['C'].isin(['-'])])
I tried the following but I get an error 'Cannot index with multidimensional key':
df.loc[df[df.columns.values].isin(['-'])]
So I used this code but I'm not happy with the separate printing for each column tested because it is harder to work with and can print the same row more than once:
import pandas as pd
d = {'A': [1,2,3], 'B': [4,'-',6], 'C': [7,8,'-']}
df = pd.DataFrame(d)
# Test each column in turn and print the rows where it holds '-'
for i in range(len(d.keys())):
    temp = df.loc[df.iloc[:,i].isin(['-'])]
    if temp.shape[0] > 0:
        print(temp)
Output looks like this:
   A  B  C
1  2  -  8
[1 rows x 3 columns]
   A  B  C
2  3  6  -
[1 rows x 3 columns]
Thanks for your advice.
To check whether a pandas DataFrame column contains a particular value (string or int), or any of several values, you can use pd.Series, the in operator, pandas.Series.isin(), or the str accessor.
To retrieve a specific row of a pandas DataFrame, you can use the iloc indexer, passing the row's integer position as the index.
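As a rough illustration of those two points, the sketch below checks a single column for '-' and then fetches one row by position. It reuses the frame from the question; the names b_has_dash and second_row are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, '-', 6], 'C': [7, 8, '-']})

# Does column B contain '-' anywhere? isin() gives one boolean per row,
# any() reduces that to a single answer.
b_has_dash = df['B'].isin(['-']).any()

# iloc selects by integer position: position 1 is the second row.
second_row = df.iloc[1]

print(b_has_dash)
print(second_row.tolist())
```

Note that this only tests one column at a time, which is the limitation the question is trying to get around.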
Alternatively, you could do something like df[df.isin(["-"]).any(axis=1)], e.g.
>>> df = pd.DataFrame({'A': [1,2,3], 'B': ['-','-',6], 'C': [7,8,9]})
>>> df.isin(["-"]).any(axis=1)
0     True
1     True
2    False
dtype: bool
>>> df[df.isin(["-"]).any(axis=1)]
   A  B  C
0  1  -  7
1  2  -  8
(Note I changed the frame a bit so I wouldn't get the axes wrong.)
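Applied to the frame from the question, the same idea can be sketched as a standalone script. This assumes a reasonably recent pandas; the names mask and matches are just illustrative.

```python
import pandas as pd

# Frame from the question: '-' appears in columns B and C.
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, '-', 6], 'C': [7, 8, '-']})

# isin() returns a boolean frame with the same shape as df;
# any(axis=1) collapses it to one boolean per row.
mask = df.isin(['-']).any(axis=1)

# A one-dimensional boolean mask is a valid row indexer, and each
# matching row appears once no matter how many columns matched.
matches = df[mask]
print(matches)
```

Reducing the two-dimensional result of isin() to one boolean per row is what avoids the 'Cannot index with multidimensional key' error from the question.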
You can do:
>>> idx = df.apply(lambda ts: any(ts == '-'), axis=1)
>>> df[idx]
   A  B  C
1  2  -  8
2  3  6  -
or
lambda ts: '-' in ts.values
Note that the in operator looks into the index, not the values, so you need .values.
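A small sketch of that difference, using a hand-built Series that stands in for one row of the frame (the variable names are only illustrative):

```python
import pandas as pd

# A row-like Series: the index holds the column labels,
# the values hold the data.
row = pd.Series([2, '-', 8], index=['A', 'B', 'C'])

label_hit = 'B' in row          # True: 'B' is an index label
plain_miss = '-' in row         # False: '-' is a value, not a label
value_hit = '-' in row.values   # True: membership is tested on the values

print(label_hit, plain_miss, value_hit)
```

This is why the lambda passed to apply has to use ts.values rather than ts itself.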