Python: how to replace NaN with conditions in a dataframe?

Tags:

python

pandas

apply

I have a dataframe df that corresponds to the edge list of nodes in a network, together with the values of the nodes themselves, like the following:

df
    node_i    node_j    value_i   value_j
0    3         4          89         33
1    3         2          89         NaN
2    3         5          89         69
3    0         2          45         NaN
4    0         3          45         89
5    1         2          109        NaN
6    1         8          109        NaN

I want to add a column w that corresponds to value_j when that value is present. If value_j is NaN, I would like to set w to the average of the values of the adjacent nodes of node_i. If node_i has only adjacent nodes with NaN values, set w=1.

So the final dataframe should look like the following:

df
    node_i    node_j    value_i   value_j      w
0    3         4          89         33       33
1    3         2          89         NaN      51      # average of adjacent nodes
2    3         5          89         69       69
3    0         2          45         NaN      89      # average of adjacent nodes
4    0         3          45         89       89
5    1         2          109        NaN       1      # 1
6    1         8          109        NaN       1      # 1
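For anyone who wants to reproduce this, the example data can be built roughly as follows. This is only a minimal sketch: the name expected_w is not part of the question and is introduced here purely so that a result can be compared against the table above.

```python
import numpy as np
import pandas as pd

# Sample edge list from the question; NaN marks a missing value for node_j.
df = pd.DataFrame({
    "node_i":  [3, 3, 3, 0, 0, 1, 1],
    "node_j":  [4, 2, 5, 2, 3, 2, 8],
    "value_i": [89, 89, 89, 45, 45, 109, 109],
    "value_j": [33, np.nan, 69, np.nan, 89, np.nan, np.nan],
})

# Desired values of w, copied from the expected table (hypothetical helper name).
expected_w = [33, 51, 69, 89, 89, 1, 1]
```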

I am doing a loop like the following but I would like to use apply:

import numpy as np
import pandas as pd

nodes = pd.unique(df['node_i'])
df['w'] = 0.0
for i in nodes:
    idx = df['node_i'] == i                 # rows whose source node is i
    avg_w = df.loc[idx, 'value_j'].mean()   # mean skips NaN; NaN if all values are NaN
    if np.isnan(avg_w):
        df.loc[idx, 'w'] = 1
    else:
        # replace NaN with the group average, keep existing values
        df.loc[idx, 'w'] = df.loc[idx, 'value_j'].fillna(avg_w)


asked Sep 07 '18 09:09

emax

2 Answers

You can use groupby to do this:

fill_value = df.groupby("node_i")["value_j"].mean().fillna(1.0)
df["w"] = fill_value.reindex(df["node_i"]).values
mask = df["value_j"].notnull()
df.loc[mask, "w"] = df.loc[mask, "value_j"]  # .loc avoids chained assignment
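The same idea can probably be written a little more compactly with Series.map and fillna. The following is a sketch on the sample data from the question, not a definitive implementation; the name group_mean is chosen here for illustration.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "node_i":  [3, 3, 3, 0, 0, 1, 1],
    "node_j":  [4, 2, 5, 2, 3, 2, 8],
    "value_i": [89, 89, 89, 45, 45, 109, 109],
    "value_j": [33, np.nan, 69, np.nan, 89, np.nan, np.nan],
})

# Mean of value_j per node_i (NaN is skipped); groups with no known value fall back to 1.
group_mean = df.groupby("node_i")["value_j"].mean().fillna(1.0)

# Look up each row's group mean via node_i and use it only where value_j is NaN.
df["w"] = df["value_j"].fillna(df["node_i"].map(group_mean))
```

Because fillna aligns on the row index, rows that already have a value_j keep it unchanged, and w stays a float column unless it is cast afterwards.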


answered Sep 20 '22 05:09

cncggvg

I think you need fillna, applied once with ffill and once with bfill; take the average of the two results and then fillna with 1. Note that this ignores node_i, so row 3 below gets 79 instead of the expected 89:

df['w'] = ((df['value_j'].ffill() + df['value_j'].bfill()) / 2).fillna(1).astype(int)

df
    node_i  node_j  value_i value_j w
0   3       4       89      33.0    33
1   3       2       89      NaN     51
2   3       5       89      69.0    69
3   0       2       45      NaN     79
4   0       3       45      89.0    89
5   1       2       109     NaN     1
6   1       8       109     NaN     1

Updated Answer:

You can use groupby and transform to find the mean of value_j within each node_i group, fillna with 1 for groups that have no values, and then use np.where to fill w:

values = df.groupby('node_i')['value_j'].transform('mean').fillna(1)
df['w'] = np.where(df['value_j'].notnull(), df['value_j'], values).astype(int)

df

    node_i  node_j  value_i value_j w
0   3       4       89      33.0    33
1   3       2       89      NaN     51
2   3       5       89      69.0    69
3   0       2       45      NaN     89
4   0       3       45      89.0    89
5   1       2       109     NaN     1
6   1       8       109     NaN     1
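Since the question asks specifically about apply, here is a hedged sketch of how the same rule might be expressed with groupby and apply. The helper name fill_group is hypothetical and not part of pandas; the transform version above is likely the faster choice on large data.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "node_i":  [3, 3, 3, 0, 0, 1, 1],
    "node_j":  [4, 2, 5, 2, 3, 2, 8],
    "value_i": [89, 89, 89, 45, 45, 109, 109],
    "value_j": [33, np.nan, 69, np.nan, 89, np.nan, np.nan],
})

def fill_group(values: pd.Series) -> pd.Series:
    """Fill NaN in one node_i group with the group mean, or with 1 if no value is known."""
    mean = values.mean()  # NaN when every entry in the group is NaN
    return values.fillna(1 if pd.isna(mean) else mean)

# group_keys=False keeps the original row index, so the result aligns with df.
df["w"] = (
    df.groupby("node_i", group_keys=False)["value_j"]
      .apply(fill_group)
      .astype(int)  # truncates any non-integer mean, as in the answer above
)
```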

answered Sep 21 '22 05:09

Space Impact
