I have some data from an experiment, and within each trial there is a single value, surrounded by NAs, that I want to fill out to the entire trial:
import numpy as np
import pandas as pd

df = pd.DataFrame({'trial': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
                   'cs_name': [np.nan, 'A1', np.nan, np.nan, np.nan, np.nan, 'B2',
                               np.nan, 'A1', np.nan, np.nan, np.nan]})
Out[177]:
cs_name trial
0 NaN 1
1 A1 1
2 NaN 1
3 NaN 1
4 NaN 2
5 NaN 2
6 B2 2
7 NaN 2
8 A1 3
9 NaN 3
10 NaN 3
11 NaN 3
I'm able to fill these values within the whole trial by using both bfill() and ffill(), but I'm wondering if there is a better way to achieve this.
df['cs_name'] = df.groupby('trial')['cs_name'].ffill()
df['cs_name'] = df.groupby('trial')['cs_name'].bfill()
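The two calls can also be chained inside a single groupby, which is essentially the same approach in one pass:

df['cs_name'] = df.groupby('trial')['cs_name'].transform(lambda s: s.ffill().bfill())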
Expected output:
cs_name trial
0 A1 1
1 A1 1
2 A1 1
3 A1 1
4 B2 2
5 B2 2
6 B2 2
7 B2 2
8 A1 3
9 A1 3
10 A1 3
11 A1 3
An alternative approach is to use first_valid_index and a transform:
In [11]: g = df.groupby('trial')
In [12]: g['cs_name'].transform(lambda s: s.loc[s.first_valid_index()])
Out[12]:
0 A1
1 A1
2 A1
3 A1
4 B2
5 B2
6 B2
7 B2
8 A1
9 A1
10 A1
11 A1
Name: cs_name, dtype: object
This ought to be more efficient than using ffill followed by a bfill...
And use this to change the cs_name column:
df['cs_name'] = g['cs_name'].transform(lambda s: s.loc[s.first_valid_index()])
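As a side note, if each trial has at most one non-null label (as in the example), a fully vectorized sketch of the same idea uses GroupBy.first, which skips NaN, and maps the result back onto the trial column:

# First non-null cs_name per trial (GroupBy.first skips NaN), broadcast back to every row
df['cs_name'] = df['trial'].map(df.groupby('trial')['cs_name'].first())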
Note: I think it would be a nice enhancement to have a method in pandas to grab the first non-null object; in numpy it's an open request, and I don't think there is currently such a method (I could be wrong!)...
If you want to avoid the error that appears when some groups contain only NaN, you could do the following (note that I changed the df so there are only NaNs for the group having trial=1):
df = pd.DataFrame({'trial': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 1, 1],
                   'cs_name': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 'B2', np.nan,
                               'A3', np.nan, np.nan, np.nan, np.nan, np.nan]})
g = df.groupby('trial')
df['cs_name'] = g['cs_name'].transform(lambda s: 'No values to aggregate'
                                       if s.isnull().all()
                                       else s.loc[s.first_valid_index()])
This way you fill in 'No values to aggregate' (or whatever you want) when all values are NaN for a particular group, instead of getting an error.
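If you would rather keep NaN for those all-NaN groups instead of a placeholder string, a small variation on the same transform (returning np.nan when first_valid_index finds nothing) should also work:

# Leave trials with no recorded cs_name as NaN rather than a placeholder string
df['cs_name'] = g['cs_name'].transform(
    lambda s: s.loc[s.first_valid_index()] if s.first_valid_index() is not None else np.nan)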
Hope this helps :)
Federico