 

How to do Mapping between Numpy Array and Pandas Dataframe?

I have a pandas dataframe like

import pandas as pd

data = [[0, 10, 22000, 3],
        [1, 15, 42135, 4], 
        [0, 14, 13526, 5],
        [0, 16, 32156, 3], 
        [1, 23, 13889, 5], 
        [0, 18, 18000, 6], 
        [0, 21, 13189, 2], 
        [1, 32, 58766, 2]] 

df = pd.DataFrame(data, columns = ['Gender', 'Age', 'Amount','Dependents']) 

And I have a numpy array

import numpy as np

arr = np.array([[1, 15, 42135, 4],
                [1, 23, 13889, 5],
                [0, 21, 13189, 2]])

I would like to add a new column to the dataframe df (say 'Good_Bad') that is 1 if the row is present in arr and 0 otherwise.

The result should be like

data = [[0, 10, 22000, 3, 0], 
        [1, 15, 42135, 4, 1], 
        [0, 14, 13526, 5, 0],
        [0, 16, 32156, 3, 0], 
        [1, 23, 13889, 5, 1], 
        [0, 18, 18000, 6, 0], 
        [0, 21, 13189, 2, 1], 
        [1, 32, 58766, 2, 0]] 

Records 2, 5, and 7 have 1 in the new column and the other records have 0. I'm not sure how to map the array to the dataframe.

hanzgs asked Feb 12 '26 21:02



1 Answer

Approach #1

A vectorized approach using broadcasting:

dfc = df[['Gender','Age','Amount','Dependents']] # select relevant cols
df['Good_Bad'] = (dfc.values[:,None]==arr).all(2).any(1).astype(int)

On newer pandas versions (>= v0.24), use dfc.to_numpy(copy=False) instead of dfc.values.
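
To see why the broadcasting expression works, it can help to track the shapes of the intermediate arrays. The sketch below rebuilds the question's data and splits the one-liner into steps; the names a, eq, row_match, and flags are illustrative and not part of the answer above.

```python
import numpy as np
import pandas as pd

data = [[0, 10, 22000, 3],
        [1, 15, 42135, 4],
        [0, 14, 13526, 5],
        [0, 16, 32156, 3],
        [1, 23, 13889, 5],
        [0, 18, 18000, 6],
        [0, 21, 13189, 2],
        [1, 32, 58766, 2]]
df = pd.DataFrame(data, columns=['Gender', 'Age', 'Amount', 'Dependents'])

arr = np.array([[1, 15, 42135, 4],
                [1, 23, 13889, 5],
                [0, 21, 13189, 2]])

# Plain 2D array of the dataframe values, shape (8, 4)
a = df[['Gender', 'Age', 'Amount', 'Dependents']].to_numpy()

# a[:, None] has shape (8, 1, 4); comparing it with arr of shape (3, 4)
# broadcasts to (8, 3, 4): every df row against every arr row, per column
eq = a[:, None] == arr

# all(2): True where all 4 columns agree, shape (8, 3)
row_match = eq.all(2)

# any(1): True where a df row equals at least one arr row, shape (8,)
flags = row_match.any(1).astype(int)

df['Good_Bad'] = flags
print(eq.shape, row_match.shape)
print(df)
```

The intermediate eq holds one boolean per (df row, arr row, column) triple, so its size grows with the product of the two row counts.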

Approach #2

Here's one that uses views, for memory efficiency and hence better performance:

# https://stackoverflow.com/a/45313353/ @Divakar
def view1D(a, b): # a, b are 2D arrays with the same dtype and number of columns
    # Return 1D views in which each row is a single np.void element
    a = np.ascontiguousarray(a)
    b = np.ascontiguousarray(b)
    void_dt = np.dtype((np.void, a.dtype.itemsize * a.shape[-1]))
    return a.view(void_dt).ravel(),  b.view(void_dt).ravel()

D,A = view1D(dfc,arr)
df['Good_Bad'] = np.isin(D,A).astype(int)
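
For reference, a self-contained sketch of the view-based approach on the question's data, checked against the broadcasting result. The helper name view1d_same_dtype and the variables cols, by_view, and by_broadcast are hypothetical; the helper also casts the second array to the first array's dtype as a precaution, which is an assumption added here so that both views use the same row width in bytes.

```python
import numpy as np
import pandas as pd

def view1d_same_dtype(a, b):
    # Hypothetical helper: view each row of two 2D arrays as one np.void item.
    a = np.ascontiguousarray(a)
    # Assumption: cast b to a's dtype so both rows occupy the same number of bytes
    b = np.ascontiguousarray(b, dtype=a.dtype)
    void_dt = np.dtype((np.void, a.dtype.itemsize * a.shape[-1]))
    return a.view(void_dt).ravel(), b.view(void_dt).ravel()

data = [[0, 10, 22000, 3],
        [1, 15, 42135, 4],
        [0, 14, 13526, 5],
        [0, 16, 32156, 3],
        [1, 23, 13889, 5],
        [0, 18, 18000, 6],
        [0, 21, 13189, 2],
        [1, 32, 58766, 2]]
cols = ['Gender', 'Age', 'Amount', 'Dependents']
df = pd.DataFrame(data, columns=cols)

arr = np.array([[1, 15, 42135, 4],
                [1, 23, 13889, 5],
                [0, 21, 13189, 2]])

values = df[cols].to_numpy()

# Row-wise membership test on the 1D void views
D, A = view1d_same_dtype(values, arr)
by_view = np.isin(D, A).astype(int)

# Same result via broadcasting, for comparison
by_broadcast = (values[:, None] == arr).all(2).any(1).astype(int)

df['Good_Bad'] = by_view
print(by_view.tolist() == by_broadcast.tolist())
print(df)
```

Because rows are compared as raw bytes, this is best suited to integer data; floating-point values such as 0.0 and -0.0 are numerically equal but differ bytewise.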
Divakar answered Feb 14 '26 09:02
