How to fill in the missing records of a Pandas DataFrame in a Pythonic way?

I have a Pandas DataFrame 'df' like this:

         X   Y  
IX1 IX2
A   A1  20  30
    A2  20  30
    A5  20  30
B   B2  20  30
    B4  20  30

Some rows are missing, and I want to fill in the gaps like this:

         X   Y  
IX1 IX2
A   A1  20  30
    A2  20  30
    A3  NaN NaN
    A4  NaN NaN
    A5  20  30
B   B2  20  30
    B3  NaN NaN
    B4  20  30

Is there a Pythonic way to do this?

How do I fill blank cells in a pandas DataFrame?

You can replace blank or empty values with the DataFrame.replace() method. It replaces every occurrence of a specified value with another value, either in selected columns or across the whole DataFrame.
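As a minimal sketch of that idea, assuming "blank" means an empty or whitespace-only string (the frame and its column names below are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical frame where blanks are empty or whitespace-only strings.
df = pd.DataFrame({"name": ["a", "", "c"], "city": ["  ", "Oslo", ""]})

# Turn empty or whitespace-only cells into NaN across all columns.
cleaned = df.replace(r"^\s*$", np.nan, regex=True)

# Or replace blanks in a single column with a placeholder value.
named = df.assign(name=df["name"].replace("", "unknown"))
```

Converting blanks to NaN first is often the more useful step, because the missing-value tools (fillna, interpolate, isna) only recognise NaN/None, not empty strings.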

How do you fill null values in a data frame?

The fillna() method replaces missing values (NaN or None) with a specified value. It returns a new DataFrame unless the inplace parameter is set to True, in which case it modifies the original DataFrame instead.
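A short sketch of both behaviours, using a made-up two-column frame:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one gap per column.
df = pd.DataFrame({"X": [20.0, np.nan, 20.0], "Y": [30.0, 30.0, np.nan]})

# Returns a new DataFrame; df itself still contains its NaNs.
filled = df.fillna(0)

# A dict gives each column its own fill value.
per_column = df.fillna({"X": -1, "Y": 99})

# inplace=True modifies the object it is called on (a copy here, so df stays intact).
in_place = df.copy()
in_place.fillna(0, inplace=True)
```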

How do you replace missing values in a data set?

Missing values can also be imputed by interpolation. The pandas interpolate() method fills them using a chosen interpolation method such as 'linear', 'quadratic' or 'polynomial'. The default is 'linear'; the 'quadratic' and 'polynomial' methods rely on SciPy, and 'polynomial' also needs an order argument.
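For illustration, here is the default linear method on a small made-up Series. It treats the values as equally spaced, so each gap is filled along a straight line between its neighbours:

```python
import numpy as np
import pandas as pd

# Hypothetical series with gaps in the middle.
s = pd.Series([1.0, np.nan, np.nan, 4.0, np.nan, 9.0])

# Default method='linear': the two gaps between 1 and 4 become 2 and 3,
# and the gap between 4 and 9 becomes 6.5.
linear = s.interpolate()
```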

Which method is used to fill in the blanks or missing values in a DataFrame?

The DataFrame method fillna() can be used to replace missing values. The fill value itself can be computed from the data with methods such as mean(), median() and mode().
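A rough sketch of combining the two, with made-up column names. Note that mode() returns a Series (there can be ties), so one value has to be picked from it:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one numeric column and one categorical column.
df = pd.DataFrame({
    "X": [10.0, np.nan, 20.0, 60.0],
    "kind": ["a", "b", None, "b"],
})

# mean() and median() skip NaN by default: mean is 30, median is 20.
mean_filled = df["X"].fillna(df["X"].mean())
median_filled = df["X"].fillna(df["X"].median())

# Take the first mode as the fill value for the categorical column.
mode_filled = df["kind"].fillna(df["kind"].mode().iloc[0])
```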

You need to construct the full index and then pass it to the dataframe's reindex method, like so:

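import pandas
from io import StringIO  # Python 3 location of StringIO

datastring = StringIO("""\
C1,C2,C3,C4
A,A1,20,30
A,A2,20,30
A,A5,20,30
B,B2,20,30
B,B4,20,30""")

# Read the sample data with a two-level index (C1, C2)
dataframe = pandas.read_csv(datastring, index_col=['C1', 'C2'])

# Every (C1, C2) pair that should exist, including the missing ones
full_index = pandas.MultiIndex.from_tuples(
    [('A', 'A1'), ('A', 'A2'), ('A', 'A3'),
     ('A', 'A4'), ('A', 'A5'), ('B', 'B1'),
     ('B', 'B2'), ('B', 'B3'), ('B', 'B4')],
    names=['C1', 'C2'])

# Rows that were absent from the original data come back as NaN
new_df = dataframe.reindex(full_index)
new_df
         C3    C4
C1 C2
A  A1  20.0  30.0
   A2  20.0  30.0
   A3   NaN   NaN
   A4   NaN   NaN
   A5  20.0  30.0
B  B1   NaN   NaN
   B2  20.0  30.0
   B3   NaN   NaN
   B4  20.0  30.0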

And then you can use the fillna method to set the NaNs to whatever you want.
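For example, a sketch of a few filling options on the reindexed frame. It rebuilds the same sample data so it runs on its own; which strategy is right depends on what the missing rows mean in your data:

```python
from io import StringIO

import pandas as pd

csv = StringIO("""\
C1,C2,C3,C4
A,A1,20,30
A,A2,20,30
A,A5,20,30
B,B2,20,30
B,B4,20,30""")
df = pd.read_csv(csv, index_col=["C1", "C2"])

full_index = pd.MultiIndex.from_tuples(
    [("A", "A1"), ("A", "A2"), ("A", "A3"), ("A", "A4"), ("A", "A5"),
     ("B", "B1"), ("B", "B2"), ("B", "B3"), ("B", "B4")],
    names=["C1", "C2"])
new_df = df.reindex(full_index)

# Option 1: replace every NaN with a constant after reindexing.
zero_filled = new_df.fillna(0)

# Option 2: supply the constant during reindexing, so no NaN is ever created.
direct = df.reindex(full_index, fill_value=0)

# Option 3: carry the previous row forward, but only within each C1 group.
# ('B', 'B1') has no earlier row in its group, so it stays NaN.
carried = new_df.groupby(level="C1").ffill()
```

Passing fill_value to reindex has the side benefit of keeping the integer dtype, since the columns never have to hold NaN.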

update (June 2014)

Just had to revisit this myself. Recent versions of pandas provide MultiIndex.from_product, which builds a MultiIndex from the Cartesian product of iterables. So the above solution could become:

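import pandas
from io import StringIO  # Python 3 location of StringIO

datastring = StringIO("""\
C1,C2,C3,C4
A,1,20,30
A,2,20,30
A,5,20,30
B,2,20,30
B,4,20,30""")

dataframe = pandas.read_csv(datastring, index_col=['C1', 'C2'])

# All combinations of C1 in ('A', 'B') and C2 in 1..5
# (range(1, 6), not range(6), which would also add a row 0 to each group)
full_index = pandas.MultiIndex.from_product([('A', 'B'), range(1, 6)], names=['C1', 'C2'])
new_df = dataframe.reindex(full_index)
new_df
         C3    C4
C1 C2
A  1   20.0  30.0
   2   20.0  30.0
   3    NaN   NaN
   4    NaN   NaN
   5   20.0  30.0
B  1    NaN   NaN
   2   20.0  30.0
   3    NaN   NaN
   4   20.0  30.0
   5    NaN   NaN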

Pretty elegant, in my opinion.
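Both versions above hard-code the target index, and the product version also adds rows outside each group's observed range (B1 and B5). If you only want the gaps between each group's lowest and highest label, as in the question, the index can be derived from the data. A rough sketch, assuming every C2 label is its C1 value followed by an integer (for example 'A5'); the helper name full_index_between is made up for this example:

```python
from io import StringIO

import pandas as pd

csv = StringIO("""\
C1,C2,C3,C4
A,A1,20,30
A,A2,20,30
A,A5,20,30
B,B2,20,30
B,B4,20,30""")
df = pd.read_csv(csv, index_col=["C1", "C2"])


def full_index_between(index):
    """Hypothetical helper: list every label from each group's lowest
    to highest numeric suffix, keeping the groups in their original order."""
    numbers = {}
    for group, label in index:
        # Assumes the label is the group name plus an integer, e.g. 'A5'.
        numbers.setdefault(group, []).append(int(label[len(group):]))
    tuples = [
        (group, f"{group}{n}")
        for group, nums in numbers.items()
        for n in range(min(nums), max(nums) + 1)
    ]
    return pd.MultiIndex.from_tuples(tuples, names=index.names)


# A1..A5 and B2..B4: only the gaps in the middle are added.
new_df = df.reindex(full_index_between(df.index))
```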

Tags: python, pandas, bigbug

Paul H
