I have a dataframe df that holds the edge list of a network together with the values of the nodes themselves, like the following:
df
    node_i    node_j    value_i   value_j
0    3         4          89         33
1    3         2          89         NaN
2    3         5          89         69
3    0         2          45         NaN
4    0         3          45         89
5    1         2          109        NaN
6    1         8          109        NaN
I want to add a column w that equals value_j when that value is present. If value_j is NaN, I would like to set w to the average of the values of the nodes adjacent to node_i. If node_i has only adjacent nodes with NaN values, set w = 1.
So the final dataframe should look like the following:
df
    node_i    node_j    value_i   value_j      w
0    3         4          89         33       33
1    3         2          89         NaN      51      # average of adjacent nodes
2    3         5          89         69       69
3    0         2          45         NaN      89      # average of adjacent nodes
4    0         3          45         89       89
5    1         2          109        NaN       1      # 1
6    1         8          109        NaN       1      # 1
I am doing a loop like the following, but I would like to use apply:
import numpy as np
import pandas as pd

nodes = pd.unique(df['node_i'])
df['w'] = 0.0
for i in nodes:
    mask = df['node_i'] == i                 # rows whose source node is i
    avg_w = df.loc[mask, 'value_j'].mean()   # NaN if every value_j is missing
    if np.isnan(avg_w):
        df.loc[mask, 'w'] = 1
    else:
        # keep known values, replace NaN with the group average
        df.loc[mask, 'w'] = df.loc[mask, 'value_j'].fillna(avg_w)
You can replace values in all or selected columns of a pandas DataFrame based on a condition by using the DataFrame.loc[] indexer. loc[] accesses a group of rows and columns by label(s) or a boolean array, and it can be used both to read and to modify values.
Use df.replace(np.nan, '', regex=True) to replace all NaN values in a pandas DataFrame with an empty string.
The pandas fillna() method replaces missing values (NaN or NA) with a specified value.
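As a rough illustration of the loc[] and fillna() points above, here is a minimal sketch. The frame and the column name "a" are made up for this example and are not part of the question's data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame, only for illustration.
toy = pd.DataFrame({"a": [1.0, np.nan, 3.0]})

# loc[] with a boolean mask: overwrite only the rows where "a" is missing.
via_loc = toy.copy()
via_loc.loc[via_loc["a"].isna(), "a"] = 0.0

# fillna(): the same replacement expressed as a single call.
via_fillna = toy["a"].fillna(0.0)
```

Both forms leave the original toy frame untouched, because the loc[] assignment is done on a copy and fillna() returns a new Series.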
You can use groupby to do this:
fill_value = df.groupby("node_i")["value_j"].mean().fillna(1.0)
df["w"] = fill_value.reindex(df["node_i"]).values
has_value = df["value_j"].notnull()
df.loc[has_value, "w"] = df.loc[has_value, "value_j"]  # avoid chained assignment
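For anyone who wants to run this end to end, the following is a self-contained sketch of the same groupby idea on the question's sample data. It swaps the reindex step for Series.map, which is my substitution and not what the answer above uses; the result should be the same under the assumption that node_i labels are unique in the grouped index.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "node_i": [3, 3, 3, 0, 0, 1, 1],
    "node_j": [4, 2, 5, 2, 3, 2, 8],
    "value_i": [89, 89, 89, 45, 45, 109, 109],
    "value_j": [33, np.nan, 69, np.nan, 89, np.nan, np.nan],
})

# Mean of the known value_j per node_i; a group with only NaN yields NaN,
# which is then replaced by the fallback 1.0.
fill_value = df.groupby("node_i")["value_j"].mean().fillna(1.0)

# Look up each row's group-level fallback by its node_i label
# (Series.map is used here instead of reindex).
per_row_fallback = df["node_i"].map(fill_value)

# Keep value_j where it exists, otherwise take the fallback.
df["w"] = df["value_j"].fillna(per_row_fallback)
```

The column w stays as float here; append .astype(int) if integer output is preferred.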
I think you need to fill value_j once with ffill and once with bfill, take the average of the two, and then fillna with 1:
df['w'] = ((df['value_j'].ffill() + df['value_j'].bfill()) / 2).fillna(1).astype(int)
df
    node_i  node_j  value_i value_j w
0   3       4       89      33.0    33
1   3       2       89      NaN     51
2   3       5       89      69.0    69
3   0       2       45      NaN     79
4   0       3       45      89.0    89
5   1       2       109     NaN     1
6   1       8       109     NaN     1
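Updated answer:
The ffill/bfill version above ignores node_i, so row 3 comes out as 79 rather than the expected 89. Instead, you can use groupby and transform to find the mean per node_i, fillna with 1, and use np.where to fill the values of w: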
values = df.groupby('node_i')['value_j'].transform('mean').fillna(1)
df['w'] = np.where(df['value_j'].notnull(),df['value_j'],values).astype(int)
df
    node_i  node_j  value_i value_j w
0   3       4       89      33.0    33
1   3       2       89      NaN     51
2   3       5       89      69.0    69
3   0       2       45      NaN     89
4   0       3       45      89.0    89
5   1       2       109     NaN     1
6   1       8       109     NaN     1
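Since the question asks for something closer to apply, here is a sketch that puts the rule into one function and passes it to transform. The helper names fill_group and add_w are hypothetical, introduced only for this example; the approach assumes that "adjacent nodes of i" means all rows sharing the same node_i, as in the answers above.

```python
import numpy as np
import pandas as pd


def fill_group(values: pd.Series) -> pd.Series:
    """Fill NaN within one node_i group with the group mean, or 1.0 if no value is known."""
    fallback = values.mean() if values.notna().any() else 1.0
    return values.fillna(fallback)


def add_w(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of frame with column w computed per node_i group."""
    out = frame.copy()
    # transform keeps the original row order and index.
    out["w"] = out.groupby("node_i")["value_j"].transform(fill_group)
    return out


# Sample data copied from the question.
df = pd.DataFrame({
    "node_i": [3, 3, 3, 0, 0, 1, 1],
    "node_j": [4, 2, 5, 2, 3, 2, 8],
    "value_i": [89, 89, 89, 45, 45, 109, 109],
    "value_j": [33, np.nan, 69, np.nan, 89, np.nan, np.nan],
})
result = add_w(df)
```

A Python-level function is called once per group, so on large frames the built-in transform('mean') plus np.where shown above is likely to be faster; this form mainly trades speed for keeping the rule in one place.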