I am trying to impute (fill) values using rows that share the same values in other columns. For example, I have this dataframe:
<pre class="prettyprint"><code>one | two | three
1   | 1   | 10
1   | 1   | nan
1   | 1   | nan
1   | 2   | nan
1   | 2   | 20
1   | 2   | nan
1   | 3   | nan
1   | 3   | nan
</code></pre>
I want to group rows by the keys in columns <code>one</code> and <code>two</code> and, if column <code>three</code> is not entirely NaN within a group, fill the missing values with the existing value from a row with the same keys. Here is my desired result:
<pre class="prettyprint"><code>one | two | three
1   | 1   | 10
1   | 1   | 10
1   | 1   | 10
1   | 2   | 20
1   | 2   | 20
1   | 2   | 20
1   | 3   | nan
1   | 3   | nan
</code></pre>
Rows with the keys (1, 3) stay NaN because that group has no existing value to fill from. I have tried using <code>groupby</code>+<code>fillna()</code>:
<pre class="prettyprint"><code>df['three'] = df.groupby(['one','two'])['three'].fillna()
</code></pre>
which gave me an error. I have also tried a forward fill, but it gives a strange result for the group where column <code>two</code> is 2. This is the forward-fill code I am using:
<pre class="prettyprint"><code>df['three'] = df.groupby(['one','two'], sort=False)['three'].ffill()
</code></pre>
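
A minimal sketch that rebuilds the example frame (assuming the <code>nan</code> entries are real <code>np.nan</code> values) and shows why a per-group forward fill alone is not enough: it only copies values downward, so a <code>NaN</code> that sits above the group's only value stays missing.

```python
import numpy as np
import pandas as pd

# The example frame from the question; "nan" is assumed to mean np.nan.
df = pd.DataFrame({
    "one":   [1] * 8,
    "two":   [1, 1, 1, 2, 2, 2, 3, 3],
    "three": [10, np.nan, np.nan, np.nan, 20, np.nan, np.nan, np.nan],
})

# Forward fill within each (one, two) group: values only propagate downward.
ffilled = df.groupby(["one", "two"], sort=False)["three"].ffill()

# Index 3, the first row of group (1, 2), stays NaN because nothing precedes it.
print(ffilled)
```

Filling that row as well needs a backward fill in the same group, which is what the first answer below does.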

If there is only one non-NaN value per group, use <code>ffill</code> (forward fill) followed by <code>bfill</code> (backward fill) within each group, which needs a <code>lambda</code>. Use <code>transform</code> rather than <code>apply</code> so the result keeps the original index (since pandas 2.0, <code>apply</code> prepends the group keys to the index here, which breaks the assignment back to <code>df</code>):
<pre class="prettyprint"><code>df['three'] = (df.groupby(['one','two'], sort=False)['three']
                 .transform(lambda x: x.ffill().bfill()))
print(df)
   one  two  three
0    1    1   10.0
1    1    1   10.0
2    1    1   10.0
3    1    2   20.0
4    1    2   20.0
5    1    2   20.0
6    1    3    NaN
7    1    3    NaN
</code></pre>
But if there are multiple values per group and you need to replace <code>NaN</code> with a per-group constant, e.g. the group <code>mean</code>:
<pre class="prettyprint"><code>print(df)
   one  two  three
0    1    1   10.0
1    1    1   40.0
2    1    1    NaN
3    1    2    NaN
4    1    2   20.0
5    1    2    NaN
6    1    3    NaN
7    1    3    NaN

df['three'] = (df.groupby(['one','two'], sort=False)['three']
                 .transform(lambda x: x.fillna(x.mean())))
print(df)
   one  two  three
0    1    1   10.0
1    1    1   40.0
2    1    1   25.0
3    1    2   20.0
4    1    2   20.0
5    1    2   20.0
6    1    3    NaN
7    1    3    NaN
</code></pre>
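
As an alternative sketch (not part of the original answer), both cases can be written without a Python-level <code>lambda</code>: broadcast a per-group statistic with <code>transform</code> and pass it to <code>Series.fillna</code>. <code>transform('first')</code> returns each group's first non-missing value, so it matches the ffill/bfill approach when a group holds a single value; <code>transform('mean')</code> gives the group mean. Groups that are entirely NaN stay NaN in both cases.

```python
import numpy as np
import pandas as pd

keys = ["one", "two"]

# Case 1: at most one non-NaN value per group (the question's data).
df1 = pd.DataFrame({
    "one":   [1] * 8,
    "two":   [1, 1, 1, 2, 2, 2, 3, 3],
    "three": [10, np.nan, np.nan, np.nan, 20, np.nan, np.nan, np.nan],
})
# Broadcast each group's first non-missing value to all of its rows, then fill the gaps with it.
df1["three"] = df1["three"].fillna(df1.groupby(keys)["three"].transform("first"))

# Case 2: several values per group; fill the gaps with the group mean.
df2 = pd.DataFrame({
    "one":   [1] * 8,
    "two":   [1, 1, 1, 2, 2, 2, 3, 3],
    "three": [10, 40, np.nan, np.nan, 20, np.nan, np.nan, np.nan],
})
df2["three"] = df2["three"].fillna(df2.groupby(keys)["three"].transform("mean"))

print(df1)
print(df2)
```

Using built-in aggregations by name keeps the work in pandas' compiled group routines instead of calling a Python function once per group.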

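You can also sort the data by the column with missing values, then group and forward fill. This works because <code>sort_values</code> puts NaN last by default, so within every group the missing values come after the existing value:
<pre class="prettyprint"><code>df.sort_values('three', inplace=True)
df['three'] = df.groupby(['one','two'])['three'].ffill()
</code></pre>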
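
A self-contained sketch of the sorting approach on the question's data. The final <code>sort_index()</code> call is an addition (not in the original answer) that restores the row order changed by <code>sort_values</code>. If a group held several values, its gaps would receive the largest one, since that is the last non-missing value after an ascending sort.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "one":   [1] * 8,
    "two":   [1, 1, 1, 2, 2, 2, 3, 3],
    "three": [10, np.nan, np.nan, np.nan, 20, np.nan, np.nan, np.nan],
})

# Sort so that, within each group, non-missing values precede the NaNs (NaN sorts last by default).
df = df.sort_values("three")
# Forward fill inside each (one, two) group; assignment aligns on the row labels.
df["three"] = df.groupby(["one", "two"])["three"].ffill()
# Restore the original row order.
df = df.sort_index()
print(df)
```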

Pandas fillna using groupby
