Using a variable name for a column name in pandas

Tags: pandas, filter

I have searched for days without finding an answer. I am trying to filter a copy of a DataFrame with a condition on a column whose name is stored in a variable.

Here is the code:

import pandas as pd

d = {'a':[1, 2, 3, 22, None], 
     'b':[4, None, 6, None, 33],
     'c':[7, 8, None, None, None],
     'd':[10, 110, 12, 250, 35],
     'e':[None, None, None, 26, None],
     'f':[16, None, 20, 39, 62],
     'g':[19, 20, 21, None, None]}

df = pd.DataFrame(d)
print(df)
print('\n')

df2 = pd.DataFrame()
df2 ['count'] = df.count()
df2 = df2.sort_values(by='count', ascending = False)
print(df2)
print('\n')

first_var = df2.index[0]
print(first_var)
print('\n')

df3 = pd.DataFrame() 
df3 = df.copy()

# this line gives the entire df, not d values under 100
df3[df3[first_var] < 100]

# this line crashes
# df3[df.first_var < 100]

print(df3)

This is the output:

    a   b   c    d   e   f   g
0   1   4   7   10 NaN  16  19
1   2 NaN   8  110 NaN NaN  20
2   3   6 NaN   12 NaN  20  21
3  22 NaN NaN  250  26  39 NaN 
4 NaN  33 NaN   35 NaN  62 NaN


   count
d      5
a      4
f      4
b      3
g      3
c      2
e      1


d


    a   b   c    d   e   f   g
0   1   4   7   10 NaN  16  19
1   2 NaN   8  110 NaN NaN  20
2   3   6 NaN   12 NaN  20  21
3  22 NaN NaN  250  26  39 NaN
4 NaN  33 NaN   35 NaN  62 NaN

*****************************************

What I am really looking for is this output:

    a   b   c    d   e   f   g
0   1   4   7   10 NaN  16  19
2   3   6 NaN   12 NaN  20  21
4 NaN  33 NaN   35 NaN  62 NaN

Any help is greatly appreciated. Thanks.

bud fox asked Jan 20 '26 19:01


1 Answer

Try this:

  df[~(df > 100).any(axis=1)]

    a     b    c   d   e     f     g
0  1.0   4.0  7.0  10 NaN  16.0  19.0
2  3.0   6.0  NaN  12 NaN  20.0  21.0
4  NaN  33.0  NaN  35 NaN  62.0   NaN

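The "~" operator inverts a boolean mask: every True becomes False and every False becomes True. Here it keeps only the rows in which no value is greater than 100.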
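As a small illustration of the inversion, here is a sketch on a toy Series (the name `s` and its values are made up for this example). It also shows one caveat worth knowing: a comparison against NaN evaluates to False, so `~(s > 100)` keeps missing values while `s <= 100` drops them.

```python
import pandas as pd

# Hypothetical data for illustration; None is stored as NaN.
s = pd.Series([10, 110, None])

over = s > 100      # NaN > 100 evaluates to False
not_over = ~over    # element-wise inversion of the mask
at_most = s <= 100  # NaN <= 100 also evaluates to False

print(over.tolist())      # [False, True, False]
print(not_over.tolist())  # [True, False, True]
print(at_most.tolist())   # [True, False, False]
```

Which of the two forms is appropriate depends on whether rows with missing values should survive the filter.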

If your data looked like this instead:

d = {'a':[1, 2, 3, 22, 130], 
     'b':[4, None, 6, None, 33],
     'c':[7, 8, None, None, None],
     'd':[10, 110, 12, 250, 35],
     'e':[None, None, None, 26, None],
     'f':[16, None, 20, 12, 62],
     'g':[19, 20, 21, None, None]}

df = pd.DataFrame(d)
df

     a     b    c    d     e     f     g
0    1   4.0  7.0   10   NaN  16.0  19.0
1    2   NaN  8.0  110   NaN   NaN  20.0
2    3   6.0  NaN   12   NaN  20.0  21.0
3   22   NaN  NaN  250  26.0  12.0   NaN
4  130  33.0  NaN   35   NaN  62.0   NaN
    ^ added this 

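then the whole-frame test would also drop row 4, because of the 130 in column a. To test only column d, use something like this: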

df[~(df["d"] > 100)]

     a     b    c   d   e     f     g
0    1   4.0  7.0  10 NaN  16.0  19.0
2    3   6.0  NaN  12 NaN  20.0  21.0
4  130  33.0  NaN  35 NaN  62.0   NaN
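To tie this back to the question, where the column name is held in a variable, here is a minimal sketch. It assumes the question's original data and its `first_var` variable. `idxmax()` is used here as a shortcut for picking the column with the most non-null values; it is a swapped-in alternative to the question's sort, not the asker's code. Bracket indexing, `df[first_var]`, looks up the column whose name is stored in the variable, whereas `df.first_var` looks for a column literally named first_var. Boolean indexing also returns a new frame rather than changing the original, so the result has to be assigned.

```python
import pandas as pd

# The question's original data.
d = {'a': [1, 2, 3, 22, None],
     'b': [4, None, 6, None, 33],
     'c': [7, 8, None, None, None],
     'd': [10, 110, 12, 250, 35],
     'e': [None, None, None, 26, None],
     'f': [16, None, 20, 39, 62],
     'g': [19, 20, 21, None, None]}
df = pd.DataFrame(d)

# Column with the most non-null values (assumption: idxmax replaces
# the question's sort_values(...).index[0]).
first_var = df.count().idxmax()

# Bracket indexing uses the value of first_var as the column name.
# Assign the result: the filter does not modify df in place.
df3 = df[df[first_var] < 100].copy()

print(first_var)
print(df3)
```

With this data `first_var` is 'd', and `df3` holds rows 0, 2 and 4, which is the output the question asks for.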
Merlin answered Jan 27 '26 00:01


