
Get row and column in Pandas for a cell with a certain value

I am trying to read an unformatted Excel spreadsheet using Pandas. There are multiple tables within a single sheet, and I want to convert these tables into dataframes. Since the sheet is not already "indexed" in the traditional way, there are no meaningful column or row indices. Is there a way to search for a specific value and get the row and column where it occurs? For example, say I want the row and column numbers of all cells that contain the string "Title".

I have already tried things like DataFrame.filter but that only works if there are row and column indices.

Gabriel asked Dec 19 '18 17:12




2 Answers

Create a df with NaN where your_value is not found.
Drop all rows that don't contain the value.
Drop all columns that don't contain the value.

    a = df.where(df == 'your_value').dropna(how='all').dropna(axis=1, how='all')

To get the row(s)

    a.index

To get the column(s)

    a.columns  
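
Note that a.index and a.columns list the matching row labels and column labels separately, so they do not say which row goes with which column when there are several hits. A minimal sketch of one way to get explicit (row, column) pairs, assuming a small made-up frame and the search value 'abc' (the names mask, stacked and pairs are only illustrative):

```python
import pandas as pd

# Hypothetical sample data, not taken from the asker's spreadsheet
df = pd.DataFrame({'col': ['abc', 'def', 'wert', 'abc'],
                   'col2': ['asdf', 'abc', 'sdfg', 'def']})

# Boolean frame: True where the cell equals the search value
mask = df == 'abc'

# stack() reshapes the frame into a Series indexed by (row label, column label)
stacked = mask.stack()

# Keep only the True entries and read the pairs off the index
pairs = stacked[stacked].index.tolist()
print(pairs)
```

Here pairs comes out as [(0, 'col'), (1, 'col2'), (3, 'col')], i.e. (index label, column label) in row order.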
firefly answered Sep 25 '22 09:09


You can do this with a long, hard-to-read list comprehension:

    import pandas as pd

    # assume this df and that we are looking for 'abc'
    df = pd.DataFrame({'col':['abc', 'def','wert','abc'], 'col2':['asdf', 'abc', 'sdfg', 'def']})

    [(df[col][df[col].eq('abc')].index[i], df.columns.get_loc(col)) for col in df.columns for i in range(len(df[col][df[col].eq('abc')].index))]

Output:

    [(0, 0), (3, 0), (1, 1)]

Note that each tuple is (index value, column location).
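
If the comprehension is too dense, roughly the same result can be sketched with NumPy. This assumes the same sample df as above and returns (row position, column position) rather than index labels, ordered row by row instead of column by column; positions and result are just illustrative names:

```python
import numpy as np
import pandas as pd

# Same sample frame as in the comprehension above
df = pd.DataFrame({'col': ['abc', 'def', 'wert', 'abc'],
                   'col2': ['asdf', 'abc', 'sdfg', 'def']})

# np.argwhere returns one [row_position, column_position] entry per True cell
positions = np.argwhere(df.eq('abc').to_numpy())

# Convert to a plain list of tuples of Python ints
result = [tuple(p) for p in positions.tolist()]
print(result)
```

With the default RangeIndex the row positions coincide with the index values, so this gives [(0, 0), (1, 1), (3, 0)]. For a frame with a custom index, df.index[row_position] would be needed to map back to labels.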

You can also change .eq() to .str.contains() in the comprehension if you are looking for any string that contains a certain value:

    [(df[col][df[col].str.contains('ab')].index[i], df.columns.get_loc(col)) for col in df.columns for i in range(len(df[col][df[col].str.contains('ab')].index))]
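
One caveat: .str.contains() expects text, and a sheet read without headers will often mix numbers, text and empty cells, in which case the line above may raise an error or produce NaN for the non-text cells. A hedged sketch of one workaround, assuming a made-up frame that imitates such a sheet and the search string "Title" from the question (mask, rows, cols and hits are illustrative names):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a sheet read with header=None: mixed types and gaps
df = pd.DataFrame([['Title', 1, None],
                   [2.5, 'x', 'Sub Title'],
                   ['Title A', 3, None]])

# Cast every column to str first so .str.contains() works on all cells;
# regex=False does a plain substring match, na=False treats missing as no match
mask = df.apply(lambda s: s.astype(str).str.contains('Title', regex=False, na=False))

# np.where on the boolean array gives the row and column positions of the hits
rows, cols = np.where(mask.to_numpy())
hits = list(zip(rows.tolist(), cols.tolist()))
print(hits)
```

For this sample, hits is [(0, 0), (1, 2), (2, 0)], again as (row position, column position).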
It_is_Chris answered Sep 25 '22 09:09