Find column name in pandas that matches an array

Tags: python, pandas, dataframe, numpy

I have a large DataFrame (5000 x 12039) and I want to get the name of the column whose values match a NumPy array.

For example, given the table:

        m1lenhr m1lenmin    m1citywt    m1a12a  cm1age  cm1numb m1b1a   m1b1b   m1b12a  m1b12b  ... kind_attention_scale_10 kind_attention_scale_22 kind_attention_scale_21 kind_attention_scale_15 kind_attention_scale_18 kind_attention_scale_19 kind_attention_scale_25 kind_attention_scale_24 kind_attention_scale_27 kind_attention_scale_23
challengeID                                                                                 
1   0.130765    40.0    202.485367  1.893256    27.0    1.0 2.0 0.0 2.254198    2.289966    ... 0   0   0   0   0   0   0   0   0   0
2   0.000000    40.0    45.608219   1.000000    24.0    1.0 2.0 0.0 2.000000    3.000000    ... 0   0   0   0   0   0   0   0   0   0
3   0.000000    35.0    39.060299   2.000000    23.0    1.0 2.0 0.0 2.254198    2.289966    ... 0   0   0   0   0   0   0   0   0   0
4   0.000000    30.0    22.304855   1.893256    22.0    1.0 3.0 0.0 2.000000    3.000000    ... 0   0   0   0   0   0   0   0   0   0
5   0.000000    25.0    35.518272   1.893256    19.0    1.0 1.0 6.0 1.000000    3.000000    ... 0

I want to do this:

x = [40.0, 40.0, 35.0, 30.0, 25.0]
find_column(x)

and have find_column(x) return m1lenmin, the name of the column whose values equal x.

asked Jul 25 '17 20:07

amaatouq

2 Answers

Approach #1

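Here's one vectorized approach leveraging NumPy broadcasting: reshape x into a column vector, compare it against every column at once, and keep the columns where all rows match -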

df.columns[(df.values == np.asarray(x)[:,None]).all(0)]

Sample run -

In [367]: df
Out[367]: 
   0  1  2  3  4  5  6  7  8  9
0  7  1  2  6  2  1  7  2  0  6
1  5  4  3  3  2  1  1  1  5  5
2  7  7  2  2  5  4  6  6  5  7
3  0  5  4  1  5  7  8  2  2  4
4  7  1  0  4  5  4  3  2  8  6

In [368]: x = df.iloc[:,2].values.tolist()

In [369]: x
Out[369]: [2, 3, 2, 4, 0]

In [370]: df.columns[(df.values == np.asarray(x)[:,None]).all(0)]
Out[370]: Int64Index([2], dtype='int64')
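
As a minimal sketch, the one-liner above can be wrapped in the find_column function the question asks for. The function signature (taking df as an explicit argument), the length check, and the small sample frame are assumptions added for illustration; only the broadcasting comparison comes from the answer.

```python
import numpy as np
import pandas as pd

def find_column(df, x):
    """Return the labels of all columns whose values equal x element-wise."""
    x = np.asarray(x)
    if x.shape != (len(df),):
        raise ValueError("x must have one value per row of df")
    # x[:, None] has shape (n_rows, 1) and broadcasts against (n_rows, n_cols)
    mask = (df.to_numpy() == x[:, None]).all(axis=0)
    return df.columns[mask].tolist()

# Small illustrative frame using three columns from the question
df = pd.DataFrame({
    "m1lenhr": [0.130765, 0.0, 0.0, 0.0, 0.0],
    "m1lenmin": [40.0, 40.0, 35.0, 30.0, 25.0],
    "cm1age": [27.0, 24.0, 23.0, 22.0, 19.0],
})

matches = find_column(df, [40.0, 40.0, 35.0, 30.0, 25.0])
print(matches)
```

Returning a list rather than a single name is a design choice: more than one column can hold identical values, and an empty list signals that nothing matched.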

Approach #2

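Alternatively, here's another using the concept of views: each column is reinterpreted as a single opaque void element, so one 1D comparison checks a whole column. Note that this returns positional indices rather than column labels -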

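import numpy as np

def view1D(a, b):  # a, b are 2D arrays with the same dtype and number of columns
    a = np.ascontiguousarray(a)
    b = np.ascontiguousarray(b)
    # One void element spans a full row, so each row compares as a single item
    void_dt = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
    return a.view(void_dt).ravel(), b.view(void_dt).ravel()

# Transpose so that each DataFrame column becomes a row
df1D_arr, x1D = view1D(df.values.T, np.asarray(x)[None])
out = np.flatnonzero(df1D_arr == x1D)  # positional indices of matching columns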

Sample run -

In [442]: df
Out[442]: 
   0  1  2  3  4  5  6  7  8  9
0  7  1  2  6  2  1  7  2  0  6
1  5  4  3  3  2  1  1  1  5  5
2  7  7  2  2  5  4  6  6  5  7
3  0  5  4  1  5  7  8  2  2  4
4  7  1  0  4  5  4  3  2  8  6

In [443]: x = df.iloc[:,5].values.tolist()

In [444]: df1D_arr, x1D = view1D(df.values.T,np.asarray(x)[None])

In [445]: np.flatnonzero(df1D_arr==x1D)
Out[445]: array([5])
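
Since the view-based approach yields positions, a small wrapper can map them back to column labels. The sketch below is an illustration under assumptions: the name find_column_by_view, the sample frame, and the cast of x to the frame's dtype are not part of the answer. The cast is there because both views must have the same byte width. The comparison is byte-wise, so values such as 0.0 and -0.0 would count as different.

```python
import numpy as np
import pandas as pd

def view1D(a, b):
    """View each row of two 2D arrays as one opaque void element."""
    a = np.ascontiguousarray(a)
    # Assumption: cast b to a's dtype so both views have the same byte width
    b = np.ascontiguousarray(b, dtype=a.dtype)
    void_dt = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
    return a.view(void_dt).ravel(), b.view(void_dt).ravel()

def find_column_by_view(df, x):
    """Hypothetical wrapper: return labels of columns that equal x byte-for-byte."""
    # Transpose so that each DataFrame column becomes one row of the array
    cols, target = view1D(df.to_numpy().T, np.asarray(x)[None])
    positions = np.flatnonzero(cols == target)
    return df.columns[positions].tolist()

# Illustrative frame with a single float dtype
df = pd.DataFrame({
    "a": [7.0, 5.0, 7.0, 0.0, 7.0],
    "b": [1.0, 4.0, 7.0, 5.0, 1.0],
    "c": [2.0, 3.0, 2.0, 4.0, 0.0],
})

# Integer input is cast to the frame's float dtype inside view1D
view_matches = find_column_by_view(df, [1, 4, 7, 5, 1])
print(view_matches)
```

This sketch assumes the frame has a single numeric dtype; a mixed-type frame converts to an object array, which cannot be viewed this way.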

answered Nov 11 '22 19:11

Divakar

Try this:

In [91]: x = np.array(x)

In [94]: df.apply(lambda col: col.eq(x).all())
Out[94]:
m1lenhr     False
m1lenmin     True
m1citywt    False
m1a12a      False
cm1age      False
cm1numb     False
m1b1a       False
m1b1b       False
m1b12a      False
m1b12b      False
dtype: bool

In [95]: df.columns[df.apply(lambda col: col.eq(x).all()).values]
Out[95]: Index(['m1lenmin'], dtype='object')
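
The checks above compare values exactly, which can miss a float column when x was typed in from rounded, printed output. A hedged variant, assuming a tolerance-based match is acceptable, uses np.isclose instead of equality. The function name find_close_columns, the tolerances, and the sample data are illustrative assumptions.

```python
import numpy as np
import pandas as pd

def find_close_columns(df, x, rtol=1e-5, atol=1e-8):
    """Return labels of numeric columns that match x within a tolerance."""
    x = np.asarray(x, dtype=float)
    # Only numeric columns can be compared with a tolerance
    numeric = df.select_dtypes(include="number")
    values = numeric.to_numpy(dtype=float)
    mask = np.isclose(values, x[:, None], rtol=rtol, atol=atol).all(axis=0)
    return numeric.columns[mask].tolist()

df = pd.DataFrame({
    "m1lenmin": [40.0, 40.0, 35.0],
    "m1citywt": [202.485367, 45.608219, 39.060299],
    "label": ["a", "b", "c"],  # non-numeric column, skipped by the helper
})

# Values that differ from the stored ones in the seventh decimal place
approx = [202.4853671, 45.6082189, 39.0602991]
close_matches = find_close_columns(df, approx)
print(close_matches)
```

The default tolerances here mirror those of np.isclose; tighter or looser values may suit other data.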

answered Nov 11 '22 21:11

MaxU - stop WAR against UA
