
record the location of a conditional entry in pandas

I have a data frame that looks like this:

[image of the data frame not included]

I want to loop through each row and print the [i, j] position of each non-NaN entry. Here, the loop would ideally print "G56" and "G51".

So far I have created a T/F data frame that records all non-NaNs as True:

df_na = df.notnull()

and I can get the row index for any Trues:

for index, row in df_na.iterrows():
    if row.any() == True:
        print(index)

but I can't get the column name. (I'm also concerned about this approach, since iterrows() is slower than itertuples().)

invictus asked Sep 19 '18 14:09


2 Answers

Setup

import numpy as np
import pandas as pd

df = pd.DataFrame(np.nan, range(54, 62), [*'ABCDEFGHIJ'])
df.at[56, 'G'] = 3
df.at[61, 'G'] = 2

any with axis=1

df.index[df.notna().any(axis=1)]

Int64Index([56, 61], dtype='int64')

Print

print(*df.index[df.notna().any(axis=1)], sep='\n')

56
61

More Generally

numpy.where

i, j = np.where(df.notna())
print(*zip(df.index[i], df.columns[j]), sep='\n')

(56, 'G')
(61, 'G')
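
To get the labels in the "G56" form the question asks for, the pairs from numpy.where can be joined into strings. This is only a minimal sketch, assuming the same df as in the setup; the name `labels` is made up for illustration.

```python
import numpy as np
import pandas as pd

# Same frame as in the setup: all NaN except two entries in column G
df = pd.DataFrame(np.nan, index=range(54, 62), columns=list('ABCDEFGHIJ'))
df.at[56, 'G'] = 3
df.at[61, 'G'] = 2

# Positional row and column indices of every non-NaN cell
i, j = np.where(df.notna())

# Join column name and row label, e.g. 'G' and 56 become 'G56'
labels = [f"{col}{row}" for row, col in zip(df.index[i], df.columns[j])]
print(*labels, sep='\n')
```

This avoids iterating over rows entirely, so the iterrows() versus itertuples() concern does not come up.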

stack

By default, stack drops null values

print(*df.stack().index.values, sep='\n')

(56, 'G')
(61, 'G')
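
As far as I know, newer pandas releases are changing stack so that it no longer drops nulls by default. A hedged sketch that should not depend on that default adds an explicit dropna(); the name `pairs` is only illustrative, and the frame is the same one as in the setup.

```python
import numpy as np
import pandas as pd

# Same frame as in the setup: all NaN except two entries in column G
df = pd.DataFrame(np.nan, index=range(54, 62), columns=list('ABCDEFGHIJ'))
df.at[56, 'G'] = 3
df.at[61, 'G'] = 2

# stack() reshapes to a Series indexed by (row label, column name).
# dropna() does nothing if stack already dropped the NaNs,
# and removes them if it did not.
pairs = df.stack().dropna().index.tolist()
print(*pairs, sep='\n')
```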
piRSquared answered Sep 28 '22 19:09


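notnull returns a Boolean frame; sum it across each row, then slice the index at the positions where the sum is nonzero. (Series.nonzero() was removed in pandas 1.0, so this goes through to_numpy().)

df.index[df.notnull().sum(axis=1).to_numpy().nonzero()[0]]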
Out[646]: Int64Index([56, 61], dtype='int64')
BENY answered Sep 28 '22 18:09