Pandas Handling Missing Values when going from Data Frame to Pivot Table

Given the following pandas data frame:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['foo'] * 3 + ['bar'],
                   'B': ['w', 'x'] * 2,
                   'C': ['y', 'z', 'a', 'a'],
                   'D': np.random.randn(4),
                   })

print(df.to_string())
"""
     A  B  C           D
0  foo  w  y  0.06075020
1  foo  x  z  0.21112476
2  foo  w  a  0.01652757
3  bar  x  a  0.17718772
"""

Notice that there is no (bar, w) combination. When I do the following:

pv0 = pd.pivot_table(df, index=['A', 'B'], columns=['C'], aggfunc='sum')

pv0.loc[('bar', 'x'), :]  # returns a result

pv0.loc[('bar', 'w'), :]  # KeyError, though I would like it to return all NaNs

pv0.index  # returns
[(bar, x), (foo, w), (foo, x)]

As long as a row combination has at least one entry in column 'C', as with (foo, x), which only has a value for 'z', the pivot table returns NaN for the other values of 'C' that are not present for that combination (e.g. 'a', 'y').

What I would like is to have every MultiIndex combination in the result, even those that have no data for any column value.

pv0.index #I would like it to return
[(bar, w), (bar, x), (foo, w), (foo, x)]

I can wrap the lookups in try/except blocks, but is there a way for pandas to fill this in automatically?

How does pandas deal with missing values?

To check for missing values in a pandas DataFrame, use the isnull() and notnull() methods. Both report, element by element, whether a value is NaN. They are also available on a pandas Series, so you can find the null values in a single column.
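As a minimal sketch of the two methods described above (the Series and DataFrame here are made-up illustration data, not taken from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical illustration data with a few missing entries.
s = pd.Series([1.0, np.nan, 3.0])
frame = pd.DataFrame({'x': [1.0, np.nan], 'y': [np.nan, np.nan]})

# Boolean masks, one entry per element of the Series.
missing_mask = s.isnull()
present_mask = s.notnull()

# Summing the boolean mask counts the missing values in each column.
missing_per_column = frame.isnull().sum()
```

Each mask has the same shape as the object it was computed from, so it can be used directly for filtering, for example `s[present_mask]`.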

How do I fill NA values in pandas?

The DataFrame fillna() method replaces missing (NaN) values with a specified value. It returns a new DataFrame unless the inplace parameter is set to True, in which case the replacement is done in the original DataFrame instead.
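A short sketch of that behaviour, using a small made-up frame (the name `df_nan` is just for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value in each column.
df_nan = pd.DataFrame({'a': [1.0, np.nan], 'b': [np.nan, 4.0]})

# fillna returns a new frame by default; df_nan itself is left untouched.
filled = df_nan.fillna(0)

# Count the missing values that remain in the original frame.
remaining_missing = int(df_nan.isnull().sum().sum())
```

Here `filled` holds zeros where `df_nan` had NaN, while `df_nan` still contains its two missing values.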

What is the difference between pivot table and Groupby in pandas?

Both aggregate data by one or more keys. groupby returns the result in long form, with the grouping keys in the index, while pivot_table performs the same kind of aggregation but spreads one or more of the keys across the columns, producing a wide, spreadsheet-style table.
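To make the long versus wide distinction concrete, here is a small sketch on invented data (the `sales` frame and its column names are assumptions for illustration only):

```python
import pandas as pd

# Hypothetical data: units sold per region and product.
sales = pd.DataFrame({
    'region': ['east', 'east', 'west', 'west'],
    'product': ['pen', 'ink', 'pen', 'pen'],
    'units': [1, 2, 3, 4],
})

# groupby: one row per (region, product) pair that occurs in the data.
long_form = sales.groupby(['region', 'product'])['units'].sum()

# pivot_table: regions as rows, products as columns, NaN where a pair is absent.
wide_form = sales.pivot_table(index='region', columns='product',
                              values='units', aggfunc='sum')
```

The long result simply has no row for (west, ink), whereas the wide result shows that pair as NaN. Calling `long_form.unstack('product')` should give the same wide layout.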

You can use the reindex() method:

>>> df1 = pd.pivot_table(df, index=['A', 'B'], columns='C', aggfunc='sum')
>>> df1
              D                   
C             a        y         z
A   B                             
bar x  0.161702      NaN       NaN
foo w  0.749007  0.85552       NaN
    x       NaN      NaN  0.458701

>>> import itertools
>>> index = list(itertools.product(df['A'].unique(), df['B'].unique()))
>>> df1.reindex(index)
              D                   
C             a        y         z
foo w  0.749007  0.85552       NaN
    x       NaN      NaN  0.458701
bar w       NaN      NaN       NaN
    x  0.161702      NaN       NaN
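The same idea can be written as a self-contained script. This is only a sketch under a few assumptions: it uses fixed values for 'D' instead of random ones so the result is reproducible, passes `values='D'` so the columns have a single level, and builds the target index with `pd.MultiIndex.from_product` rather than `itertools.product`, which keeps the level names 'A' and 'B' on the result.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['foo'] * 3 + ['bar'],
    'B': ['w', 'x'] * 2,
    'C': ['y', 'z', 'a', 'a'],
    'D': [0.1, 0.2, 0.3, 0.4],  # fixed stand-ins for the random values
})

pv = pd.pivot_table(df, index=['A', 'B'], columns='C', values='D', aggfunc='sum')

# Every (A, B) pair, whether or not it occurs in the data.
full_index = pd.MultiIndex.from_product(
    [sorted(df['A'].unique()), sorted(df['B'].unique())],
    names=['A', 'B'],
)

# Rows absent from the pivot table are added and filled with NaN.
pv_full = pv.reindex(full_index)

# The previously missing combination can now be looked up without a KeyError.
missing_row = pv_full.loc[('bar', 'w'), :]
```

After the reindex, `missing_row` is a Series of NaN, one entry per value of 'C', so the try/except wrapper is no longer needed.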

Tags:

python

pandas

pivot-table

Paul

1 Answers

Roman Pekar
