Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Get column name which contains a specific value at any rows in python pandas

I want to get column name from the whole database (assume the database contains more than 100 rows with more than 50 column) based on specific value that contain in a specific column in pandas.

with the help of Bkmm3 (member from india) I've succeeded on numerical term but failed on alphabetic term. the way I've tried is this:

df = pd.DataFrame({'A':['APPLE','BALL','CAT'],
                    'B':['ACTION','BATMAN','CATCHUP'],
                    'C':['ADVERTISE','BEAST','CARTOON']})
response = input("input")
for i in df.columns: if(len(df.query(i + '==' + str(response))) > 0):
print(i)`

then output arise as error:

Traceback (most recent call last): NameError: name 'APPLE' is not defined

Any Help from You Guys will be very Appreciated, Thank You . . .

like image 205
wh112 Avatar asked Jun 19 '18 08:06

wh112


2 Answers

isin/eq works for DataFrames, and you can 100% vectorize this:

df.columns[df.isin(['APPLE']).any()]  # df.isin([response])

Or,

df.columns[df.eq(response).any()]

Index(['A'], dtype='object')

And here's the roundabout way with DataFrame.eval and np.logical_or (were you to loop on columns):

df.columns[
    np.logical_or.reduce(
        [df.eval(f"{repr(response)} in {i}") for i in df]
)]
Index(['A'], dtype='object')
like image 112
cs95 Avatar answered Sep 27 '22 15:09

cs95


First, the reason for your error. With pd.DataFrame.query, as with regular comparisons, you need to surround strings with quotation marks. So this would work (notice the pair of " quotations):

response = input("input")

for i in df.columns:
    if not df.query(i + '=="' + str(response) + '"').empty:
        print(i)

inputAPPLE
A

Next, you can extract index and/or columns via pd.DataFrame.any. coldspeed's solution is fine here, I'm just going to show how similar syntax can be used to extract both row and column labels.

# columns
print(df.columns[(df == response).any(1)])
Index(['A'], dtype='object')

# rows
print(df.index[(df == response).any(0)])
Int64Index([0], dtype='int64')

Notice in both cases you get as your result Index objects. The code differs only in the property being extracted and in the axis parameter of pd.DataFrame.any.

like image 29
jpp Avatar answered Sep 27 '22 16:09

jpp