I need to get the column names of a pandas DataFrame where the columns match those in a numpy array.
Example
import numpy as np
import pandas as pd
x = pd.DataFrame( data=[[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 1], [0, 1, 0], [0, 1, 1]], columns=list('abc') )
y = np.array( x[['b','c']] )
y
y then holds the second and third columns of the DataFrame:
array([[0, 1],
       [1, 0],
       [0, 0],
       [1, 1],
       [1, 0],
       [1, 1]])
How can I get the names of the columns of x whose values match the columns of y? (In this case b and c.)
I am looking for something like:
x[ x==y ].columns
or
pd.DataFrame(y).isin(x)
The example is motivated by a feature selection problem, and was taken from the sklearn page.
I am using numpy 1.11.1 and pandas 0.18.1.
Here's an approach with NumPy broadcasting:
x.columns[(x.values[...,None] == y[:,None]).all(0).any(1)]
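To make it easier to see what that one-liner does, here is a rough step-by-step sketch of the same expression. The intermediate names (eq, col_match, mask, matched) are only illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

x = pd.DataFrame(
    data=[[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 1], [0, 1, 0], [0, 1, 1]],
    columns=list('abc'),
)
y = np.array(x[['b', 'c']])

# Compare every column of x with every column of y, row by row.
# x.values[..., None] has shape (6, 3, 1); y[:, None] has shape (6, 1, 2),
# so the comparison broadcasts to shape (6, 3, 2).
eq = x.values[..., None] == y[:, None]

# A pair (x column i, y column j) matches only if all rows agree.
col_match = eq.all(axis=0)    # shape (3, 2)

# Keep each x column that matches at least one y column.
mask = col_match.any(axis=1)  # shape (3,)

matched = list(x.columns[mask])
print(matched)  # ['b', 'c']
```

Note that the names come back in the column order of x, not of y, and that the intermediate boolean array has rows × x columns × y columns entries, so memory use may matter for large inputs.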
Maybe this?
import numpy as np
import pandas as pd
x = pd.DataFrame( data=[[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 1], [0, 1, 0], [0, 1, 1]], columns=list('abc') )
y = np.array( x[['b','c']] )
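for yj in y.T:  # iterate over the columns of y
    for xj in x:  # iterating over a DataFrame yields its column names
        if (x[xj] == yj).all():  # every row of the two columns agrees
            print(xj)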
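If you want the names back as a value instead of printed, the same column-by-column comparison could be wrapped in a small helper. This is only a sketch: matching_columns is a made-up name, and it uses np.array_equal for the comparison.

```python
import numpy as np
import pandas as pd

x = pd.DataFrame(
    data=[[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 1], [0, 1, 0], [0, 1, 1]],
    columns=list('abc'),
)
y = np.array(x[['b', 'c']])


def matching_columns(df, arr):
    """For each column of arr, list the names of the df columns equal to it.

    Hypothetical helper for illustration; not a pandas or NumPy function.
    """
    return [
        [name for name in df.columns if np.array_equal(df[name].values, col)]
        for col in arr.T  # arr.T iterates over the columns of arr
    ]


result = matching_columns(x, y)
print(result)  # [['b'], ['c']]
```

Returning one list per column of y keeps the order of y and makes it visible when a column of y matches nothing (an empty list) or matches several identical columns of x.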