Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to scan a pandas dataframe for all values greater than something and returns row and column number corresponding to that value?

I have a problem where I have huge dataset like below (Correl Coef matrix)

  A           B           C           D           E
A 1,          0.413454352,0.615350574,0.479720098,0.34261232
B 0.413454352,1,          0.568124328,0.316543449,0.361164436
C 0.615350574,0.568124328,1,          0.633182519,0.790921334
D 0.479720098,0.316543449,0.633182519,1,          0.450248008
E 0.34261232, 0.361164436,0.790921334,0.450248008,1

I want to fetch all the values in this data frame where cell value is greater than 0.6 it should be along with row name and column name like below

   row_name col_name value
1  A        C        0.61
2  C        A        0.61
3  C        D        0.63
3  C        E        0.79
4  D        C        0.63
5  E        C        0.79

If we can also ignore either (A,C) or (C,A) ..it would be much better.

I know I can do it using for loop but that method is not efficient for large data set.

like image 951
Sudheer1990 Avatar asked Jul 24 '16 12:07

Sudheer1990


2 Answers

Here's a NumPy based approach -

# Extract values and row, column names
arr = df.values
index_names = df.index
col_names = df.columns

#  Get indices where such threshold is crossed; avoid diagonal elems
R,C = np.where(np.triu(arr,1)>0.6)

# Arrange those in columns and put out as a dataframe
out_arr = np.column_stack((index_names[R],col_names[C],arr[R,C]))
df_out = pd.DataFrame(out_arr,columns=[['row_name','col_name','value']])

Sample run -

In [139]: df
Out[139]: 
          A         B         C         D         E
P  1.000000  0.031388  0.263606  0.121490  0.628969
Q  0.031388  1.000000  0.963510  0.497828  0.955238
R  0.263606  0.963510  1.000000  0.917935  0.520522
S  0.121490  0.497828  0.917935  1.000000  0.728386
T  0.628969  0.955238  0.520522  0.728386  1.000000

In [140]: df_out
Out[140]: 
  row_name col_name     value
0        P        E  0.628969
1        Q        C   0.96351
2        Q        E  0.955238
3        R        D  0.917935
4        S        E  0.728386
like image 125
Divakar Avatar answered Oct 05 '22 23:10

Divakar


UPDATE: using @Divakar's solution and his hints:

In [186]: df = pd.DataFrame(np.triu(df, 1), columns=df.columns, index=df.index)

In [187]: df
Out[187]:
     A         B         C         D         E
A  0.0  0.413454  0.615351  0.479720  0.342612
B  0.0  0.000000  0.568124  0.316543  0.361164
C  0.0  0.000000  0.000000  0.633183  0.790921
D  0.0  0.000000  0.000000  0.000000  0.450248
E  0.0  0.000000  0.000000  0.000000  0.000000

In [188]: df[df > 0.6].stack().reset_index()
Out[188]:
  level_0 level_1         0
0       A       C  0.615351
1       C       D  0.633183
2       C       E  0.790921

OLD answer:

In [96]: df[df > 0.6]
Out[96]:
          A    B         C         D         E
A  1.000000  NaN  0.615351       NaN       NaN
B       NaN  1.0       NaN       NaN       NaN
C  0.615351  NaN  1.000000  0.633183  0.790921
D       NaN  NaN  0.633183  1.000000       NaN
E       NaN  NaN  0.790921       NaN  1.000000

In [97]: df[df > 0.6].stack()
Out[97]:
A  A    1.000000
   C    0.615351
B  B    1.000000
C  A    0.615351
   C    1.000000
   D    0.633183
   E    0.790921
D  C    0.633183
   D    1.000000
E  C    0.790921
   E    1.000000
dtype: float64

or:

In [99]: df[df > 0.6].stack().reset_index()
Out[99]:
   level_0 level_1         0
0        A       A  1.000000
1        A       C  0.615351
2        B       B  1.000000
3        C       A  0.615351
4        C       C  1.000000
5        C       D  0.633183
6        C       E  0.790921
7        D       C  0.633183
8        D       D  1.000000
9        E       C  0.790921
10       E       E  1.000000

data set:

In [100]: df
Out[100]:
          A         B         C         D         E
A  1.000000  0.413454  0.615351  0.479720  0.342612
B  0.413454  1.000000  0.568124  0.316543  0.361164
C  0.615351  0.568124  1.000000  0.633183  0.790921
D  0.479720  0.316543  0.633183  1.000000  0.450248
E  0.342612  0.361164  0.790921  0.450248  1.000000
like image 27
MaxU - stop WAR against UA Avatar answered Oct 05 '22 23:10

MaxU - stop WAR against UA