I've pandas dataframe based on pivot table with index and columns. Index are presented with values that are not nan at least in one column, while others are nans.
          col_1  col_2  col_3  col_4 ...  col_100
index_1     1      2      nan   nan  ...     5 
index_2    nan    nan      1     1   ...     10
...        ...    ...     ...   ...  ...     ...
index_100  nan     9       4    ...  ...     nan
How can I get column names of all the not nan values in a row and put them into automatically suffixed list names by each index? Need to get this:
list_1=[col_1, col_2, col_100]
list_2=[col_3, col_4, col_100]
list_100=[col_2, col_3]
You can use stack to remove nan and groupby to gather all column names:
(df.stack()
   .reset_index(level=1)
   .groupby(level=0, sort=False)
   ['level_1'].apply(list)
)
Output:
index_1      [col_1, col_2, col_100]
index_2      [col_3, col_4, col_100]
index_100             [col_2, col_3]
Name: level_1, dtype: object
Generate sample data
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(100,100), columns=['col_' + str(i) for i in 
    range(1,101)], index=['index_' + str(i) for i in range(1, 101)])
for i in range(len(df)):
    df.iloc[i, np.random.randint(0, 100, size=90).astype(int)] = np.nan
df
Out[45]: 
              col_1     col_2     col_3  ...    col_98    col_99   col_100
index_1         NaN       NaN       NaN  ...  0.520094       NaN  0.914679
index_2         NaN       NaN       NaN  ...       NaN  0.606264  0.615714
index_3         NaN       NaN       NaN  ...  0.350248       NaN       NaN
index_4    0.018335  0.296533  0.128359  ...       NaN  0.463084       NaN
index_5    0.164180  0.321482  0.423314  ...  0.909450       NaN  0.968680
            ...       ...       ...  ...       ...       ...       ...
index_96   0.444807  0.010812       NaN  ...  0.763669  0.074926       NaN
index_97        NaN       NaN       NaN  ...       NaN       NaN       NaN
index_98        NaN  0.285217       NaN  ...  0.913199  0.699628       NaN
index_99   0.800824  0.004250       NaN  ...       NaN       NaN  0.302858
index_100       NaN       NaN       NaN  ...  0.875435  0.700814       NaN
[100 rows x 100 columns]
compute result
result = {}
for i in range(len(df)):
    l = df.columns[np.isfinite(df.iloc[i])]
    result['list_' + str(i)] = list(l)
get list_1
result['list_1']
Out[47]: 
['col_4',
 'col_6',
 'col_9',
 'col_10',
 'col_14',
 'col_15',
 'col_18',
 'col_19',
 'col_26',
 'col_27',
 'col_33',
 'col_34',
 'col_35',
 'col_37',
 'col_38',
 'col_39',
 'col_40',
 'col_42',
 'col_43',
 'col_50',
 'col_57',
 'col_59',
 'col_60',
 'col_63',
 'col_64',
 'col_65',
 'col_66',
 'col_67',
 'col_69',
 'col_71',
 'col_76',
 'col_80',
 'col_81',
 'col_85',
 'col_87',
 'col_88',
 'col_89',
 'col_92',
 'col_96',
 'col_97',
 'col_99',
 'col_100']
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With