Python: how to replace NaN with conditions in a dataframe?

I have a dataframe df that holds the edge list of a network together with the values of the nodes themselves, like the following:

df
    node_i    node_j    value_i   value_j
0    3         4          89         33
1    3         2          89         NaN
2    3         5          89         69
3    0         2          45         NaN
4    0         3          45         89
5    1         2          109        NaN
6    1         8          109        NaN
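To experiment with the answers below, the sample frame can be rebuilt with this short sketch. The values are copied from the table above. `value_j` becomes a float column because it contains NaN, which is why the answers print it as `33.0`.

```python
import numpy as np
import pandas as pd

# Sample edge list from the question: one row per edge (node_i -> node_j),
# with the value of each endpoint; missing node values are NaN.
df = pd.DataFrame({
    "node_i":  [3, 3, 3, 0, 0, 1, 1],
    "node_j":  [4, 2, 5, 2, 3, 2, 8],
    "value_i": [89, 89, 89, 45, 45, 109, 109],
    "value_j": [33, np.nan, 69, np.nan, 89, np.nan, np.nan],
})
print(df)
```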

I want to add a column w that equals value_j when that value exists. If value_j is NaN, I would like w to be the average of the values of the adjacent nodes of node_i (the non-NaN value_j entries in node_i's rows). If all of node_i's adjacent nodes have NaN values, set w = 1.

So the final dataframe should look like the following:

df
    node_i    node_j    value_i   value_j      w
0    3         4          89         33       33
1    3         2          89         NaN      51      # average of adjacent nodes
2    3         5          89         69       69
3    0         2          45         NaN      89      # average of adjacent nodes
4    0         3          45         89       89
5    1         2          109        NaN       1      # 1
6    1         8          109        NaN       1      # 1

I am currently doing this with a loop like the following, but I would like to use apply instead:

import numpy as np
import pandas as pd

nodes = pd.unique(df['node_i'])
df['w'] = 0.0
for i in nodes:
    mask = df['node_i'] == i
    avg_w = df.loc[mask, 'value_j'].mean()   # mean of the non-NaN neighbour values
    if np.isnan(avg_w):
        df.loc[mask, 'w'] = 1                # every neighbour of i is NaN
    else:
        # keep existing value_j, replace NaN with the neighbour average
        df.loc[mask, 'w'] = df.loc[mask, 'value_j'].fillna(avg_w)
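Since the question asks for an apply-based version, here is a minimal sketch (not from the original thread) that precomputes each node's mean once and then decides w row by row. The names `node_mean` and `compute_w` are hypothetical. Row-wise apply is usually slower than the vectorised groupby answers below, so it mainly mirrors the loop's logic in apply form.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "node_i":  [3, 3, 3, 0, 0, 1, 1],
    "node_j":  [4, 2, 5, 2, 3, 2, 8],
    "value_i": [89, 89, 89, 45, 45, 109, 109],
    "value_j": [33, np.nan, 69, np.nan, 89, np.nan, np.nan],
})

# Mean of value_j over each node_i's neighbours; mean() skips NaN and
# yields NaN when every neighbour is NaN.
node_mean = df.groupby("node_i")["value_j"].mean().to_dict()

def compute_w(row):
    """Return value_j if present, else the neighbour mean, else 1."""
    if pd.notna(row["value_j"]):
        return row["value_j"]
    # apply(axis=1) upcasts the row to float, so convert the key back to int
    avg = node_mean[int(row["node_i"])]
    return avg if pd.notna(avg) else 1.0

df["w"] = df.apply(compute_w, axis=1)
print(df)
```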
asked Sep 07 '18 09:09 by emax


2 Answers

You can use groupby to do this:

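# per-node mean of value_j (NaN skipped); nodes with no values get 1.0
fill_value = df.groupby("node_i")["value_j"].mean().fillna(1.0)
df["w"] = fill_value.reindex(df["node_i"]).values
# keep value_j where present (.loc avoids chained assignment)
df.loc[df["value_j"].notnull(), "w"] = df["value_j"]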
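A possible variant of the same idea (a sketch, not the answerer's code) uses `Series.map` to look up each row's node in `fill_value` and `fillna` to touch only the missing rows, which avoids the `reindex(...).values` step and the separate masked assignment.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "node_i":  [3, 3, 3, 0, 0, 1, 1],
    "node_j":  [4, 2, 5, 2, 3, 2, 8],
    "value_i": [89, 89, 89, 45, 45, 109, 109],
    "value_j": [33, np.nan, 69, np.nan, 89, np.nan, np.nan],
})

# One fill value per node: mean of its non-NaN value_j, or 1.0 if none exist.
fill_value = df.groupby("node_i")["value_j"].mean().fillna(1.0)

# map() turns each row's node_i into that node's fill value;
# fillna() only replaces rows where value_j is missing.
df["w"] = df["value_j"].fillna(df["node_i"].map(fill_value))
print(df)
```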
answered Sep 20 '22 05:09 by cncggvg



I think you need fillna: forward-fill (ffill) and back-fill (bfill) value_j once each, take the average of the two, then fill the remaining NaN with 1:

df['w'] = ((df['value_j'].ffill() + df['value_j'].bfill()) / 2).fillna(1).astype(int)

df
    node_i  node_j  value_i value_j w
0   3       4       89      33.0    33
1   3       2       89      NaN     51
2   3       5       89      69.0    69
3   0       2       45      NaN     79
4   0       3       45      89.0    89
5   1       2       109     NaN     1
6   1       8       109     NaN     1
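The output above gives 79 for row 3, while the question expects 89. The sketch below prints the intermediate forward- and back-filled columns to show why: ffill and bfill look at the neighbouring rows of the frame, not at the other edges of the same node_i. The names `prev_val` and `next_val` are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "node_i":  [3, 3, 3, 0, 0, 1, 1],
    "node_j":  [4, 2, 5, 2, 3, 2, 8],
    "value_i": [89, 89, 89, 45, 45, 109, 109],
    "value_j": [33, np.nan, 69, np.nan, 89, np.nan, np.nan],
})

# ffill/bfill take the previous/next non-NaN row, regardless of node_i.
prev_val = df["value_j"].ffill()
next_val = df["value_j"].bfill()
w = ((prev_val + next_val) / 2).fillna(1).astype(int)

# Row 3 (node_i == 0) sits between row 2 (69, an edge of node 3) and row 4 (89),
# so it gets (69 + 89) / 2 = 79 instead of node 0's own neighbour mean, 89.
print(pd.DataFrame({"prev": prev_val, "next": next_val, "w": w}))
```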

Updated Answer:

You can use groupby with transform to compute each node's mean, fill the result with 1 where it is NaN, and then use np.where to fill the values of w:

import numpy as np

values = df.groupby('node_i')['value_j'].transform('mean').fillna(1)
df['w'] = np.where(df['value_j'].notnull(), df['value_j'], values).astype(int)

df

    node_i  node_j  value_i value_j w
0   3       4       89      33.0    33
1   3       2       89      NaN     51
2   3       5       89      69.0    69
3   0       2       45      NaN     89
4   0       3       45      89.0    89
5   1       2       109     NaN     1
6   1       8       109     NaN     1
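The key difference from a plain groupby mean is the shape of the result. This sketch (illustrative names `per_node` and `per_row`) contrasts the two and shows that `Series.fillna` with the transformed values gives the same w as the np.where call above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "node_i":  [3, 3, 3, 0, 0, 1, 1],
    "node_j":  [4, 2, 5, 2, 3, 2, 8],
    "value_i": [89, 89, 89, 45, 45, 109, 109],
    "value_j": [33, np.nan, 69, np.nan, 89, np.nan, np.nan],
})

grouped = df.groupby("node_i")["value_j"]

# mean() returns one value per group, indexed by node_i ...
per_node = grouped.mean()
# ... while transform("mean") broadcasts that value back to every row,
# keeping df's original index so it lines up with df["value_j"].
per_row = grouped.transform("mean")
print(per_node, per_row, sep="\n\n")

# fillna keeps existing value_j and fills the gaps from per_row (or 1).
w = df["value_j"].fillna(per_row.fillna(1)).astype(int)
print(w.tolist())
```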
answered Sep 21 '22 05:09 by Space Impact
