
Finding the locations of an array's elements in a pandas DataFrame column (a.k.a. pd.Series)

I have a pandas frame similar to this one:

import pandas as pd
import numpy as np

data = {'Col1' : [4,5,6,7], 'Col2' : [10,20,30,40], 'Col3' : [100,50,-30,-50], 'Col4' : ['AAA', 'BBB', 'AAA', 'CCC']}

df = pd.DataFrame(data=data, index = ['R1','R2','R3','R4'])

    Col1  Col2  Col3 Col4
R1     4    10   100  AAA
R2     5    20    50  BBB
R3     6    30   -30  AAA
R4     7    40   -50  CCC

Given an array of targets:

target_array = np.array(['AAA', 'CCC', 'EEE'])

I would like to find the indices of the cells in Col4 whose values also appear in target_array.

I have tried to find a documented answer, but it seems beyond my skill... Does anyone have any advice?

P.S. Incidentally, for this particular case I could instead supply a target array whose elements are the DataFrame's index labels, e.g. np.array(['R1', 'R3', 'R5']). Would that be easier?
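For the index-label variant mentioned in the P.S., membership can be tested directly on the index. The sketch below assumes the same df as above and uses pandas' Index.isin; label_targets is a hypothetical name chosen for illustration. Labels that are not in the index, such as 'R5', simply produce no match instead of raising a KeyError.

```python
import numpy as np
import pandas as pd

data = {'Col1': [4, 5, 6, 7], 'Col2': [10, 20, 30, 40],
        'Col3': [100, 50, -30, -50], 'Col4': ['AAA', 'BBB', 'AAA', 'CCC']}
df = pd.DataFrame(data=data, index=['R1', 'R2', 'R3', 'R4'])

# Hypothetical target array of index labels; 'R5' does not exist in df.index
label_targets = np.array(['R1', 'R3', 'R5'])

# Boolean mask over the index: True where the label is one of the targets
label_mask = df.index.isin(label_targets)

# Labels present both in the index and in the targets ('R5' is ignored)
matched_labels = df.index[label_mask]

# The corresponding rows; boolean selection avoids a KeyError for 'R5'
matched_rows = df[label_mask]

print(list(matched_labels))  # ['R1', 'R3']
```

Selecting with the boolean mask, rather than df.loc[label_targets], is a deliberate choice here: it tolerates targets that are missing from the index.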

Edit 1:

Thank you very much for all the great replies. Sadly I can only choose one, but everyone seems to point to @Divakar's as the best. Still, you should look at piRSquared's and MaxU's speed comparisons of all the available options.

Asked by Delosari, Jun 28 '16 18:06




1 Answer

You can use NumPy's in1d:

df.index[np.in1d(df['Col4'],target_array)]
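The same selection can also be written with pandas' own Series.isin, without calling NumPy directly. Note that NumPy 2.0 deprecates np.in1d in favour of np.isin, which gives the same result for 1D input like this column. A minimal sketch, assuming the df and target_array from the question:

```python
import numpy as np
import pandas as pd

data = {'Col1': [4, 5, 6, 7], 'Col2': [10, 20, 30, 40],
        'Col3': [100, 50, -30, -50], 'Col4': ['AAA', 'BBB', 'AAA', 'CCC']}
df = pd.DataFrame(data=data, index=['R1', 'R2', 'R3', 'R4'])
target_array = np.array(['AAA', 'CCC', 'EEE'])

# pandas-native membership test: True where Col4's value is in target_array
via_pandas = df.index[df['Col4'].isin(target_array)]

# np.isin is the non-deprecated NumPy counterpart of np.in1d
via_numpy = df.index[np.isin(df['Col4'], target_array)]

print(list(via_pandas))  # ['R1', 'R3', 'R4']
```

Both variants build the same boolean mask; the pandas form keeps the code within the DataFrame API.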

Explanation

1) Create a 1D mask with one entry per row, telling us whether that row's Col4 value matches any element of target_array:

mask = np.in1d(df['Col4'],target_array)

2) Use the mask to select the matching index labels from the DataFrame as the final output:

out = df.index[mask]
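Putting both steps together, and additionally recovering 0-based integer positions in case "location" means row number rather than index label, a minimal sketch might look like this. It uses np.isin (the replacement for np.in1d) and np.flatnonzero; the name positions is hypothetical.

```python
import numpy as np
import pandas as pd

data = {'Col1': [4, 5, 6, 7], 'Col2': [10, 20, 30, 40],
        'Col3': [100, 50, -30, -50], 'Col4': ['AAA', 'BBB', 'AAA', 'CCC']}
df = pd.DataFrame(data=data, index=['R1', 'R2', 'R3', 'R4'])
target_array = np.array(['AAA', 'CCC', 'EEE'])

# Step 1: one boolean per row, True where Col4 matches any target
mask = np.isin(df['Col4'].to_numpy(), target_array)

# Step 2: index labels of the matching rows
out = df.index[mask]

# Optional: 0-based integer positions of the matching rows
positions = np.flatnonzero(mask)

print(mask)        # [ True False  True  True]
print(list(out))   # ['R1', 'R3', 'R4']
print(positions)   # [0 2 3]
```

The positions array can be passed to df.iloc when positional access is preferred over label-based access.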
Answered by Divakar, Oct 20 '22 19:10