Given the following pandas data frame:
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['foo'] * 3 + ['bar'],
                   'B': ['w', 'x'] * 2,
                   'C': ['y', 'z', 'a', 'a'],
                   'D': np.random.randn(4)})
print(df.to_string())
"""
A B C D
0 foo w y 0.06075020
1 foo x z 0.21112476
2 foo w a 0.01652757
3 bar x a 0.17718772
"""
Notice that there is no (bar, w) combination. When I do the following:
pv0 = pd.pivot_table(df, index=['A', 'B'], columns=['C'], aggfunc='sum')
pv0.loc[('bar', 'x'), :]  # returns a row
pv0.loc[('bar', 'w'), :]  # KeyError, though I would like it to return all NaNs
pv0.index.tolist()  # returns
[('bar', 'x'), ('foo', 'w'), ('foo', 'x')]
As long as a combination has at least one entry in column 'C', as in the case of (foo, x), which only has a value for 'z', the pivot table returns NaN for the values of 'C' that are absent for that combination (here 'a' and 'y').
What I would like is to have all MultiIndex combinations, even those that have no data in any column.
pv0.index.tolist()  # I would like it to return
[('bar', 'w'), ('bar', 'x'), ('foo', 'w'), ('foo', 'x')]
I could wrap the lookups in try/except blocks, but is there a way to make pandas fill in the missing combinations automatically?
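For comparison, the try/except workaround mentioned above might look like the sketch below. It assumes a recent pandas where pivot_table takes index=/columns= (the older rows=/cols= keywords are gone) and where .loc replaces .ix; the helper name row_or_nan and the fixed D values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Same shape as the frame above; D values are made up so results are repeatable
df = pd.DataFrame({'A': ['foo'] * 3 + ['bar'],
                   'B': ['w', 'x'] * 2,
                   'C': ['y', 'z', 'a', 'a'],
                   'D': [0.25, 0.5, 0.75, 1.0]})
pv0 = pd.pivot_table(df, index=['A', 'B'], columns='C', aggfunc='sum')


def row_or_nan(table, key):
    """Hypothetical helper: return table's row for `key`, or an all-NaN row if absent."""
    try:
        # A tuple row key plus ':' keeps the lookup unambiguous on a MultiIndex
        return table.loc[key, :]
    except KeyError:
        # Build a row of NaN with the same columns as the pivot table
        return pd.Series(np.nan, index=table.columns, name=key)


present = row_or_nan(pv0, ('bar', 'x'))
missing = row_or_nan(pv0, ('bar', 'w'))
```

This works, but it only patches individual lookups; the index itself still lacks (bar, w).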
To check for missing values in a pandas DataFrame, use the isnull() and notnull() methods (also available as isna() and notna()). isnull() returns True where a value is NaN and notnull() returns True where it is not. Both also work on a pandas Series, for finding the null values in a single column.
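A small sketch of both methods on a made-up frame (the column names x and y are only for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [1.0, np.nan, 3.0], 'y': [np.nan, 'b', 'c']})

mask = df.isnull()      # True where a value is missing
present = df.notnull()  # the element-wise inverse of isnull()

# The same method on a single column (a Series), counting its missing values
missing_in_x = df['x'].isnull().sum()
```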
The DataFrame fillna() method replaces NULL (NaN) values with a specified value. It returns a new DataFrame unless the inplace parameter is set to True, in which case fillna() does the replacement in the original DataFrame instead and returns None.
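The difference between the two modes can be seen in a short sketch (the single column D and its values are made up):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'D': [0.5, np.nan, 1.5]})

filled = df.fillna(0)                  # new DataFrame; df itself still has a NaN
still_missing = df['D'].isna().sum()   # counted before the in-place call below

returned = df.fillna(0, inplace=True)  # fills df itself and returns None
```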
What is the difference between pivot_table and groupby? The groupby method is generally enough when the result only needs one axis of group keys, while pivot_table is suited to multi-dimensional grouping: it aggregates the data and lays one set of keys along the rows and another along the columns.
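One way to see the relationship is that a pivot table can be reproduced with groupby followed by unstack. The sketch below assumes a recent pandas (index=/columns= keywords) and uses made-up D values; it is an illustration, not a description of pandas internals.

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo'] * 3 + ['bar'],
                   'B': ['w', 'x'] * 2,
                   'C': ['y', 'z', 'a', 'a'],
                   'D': [1.0, 2.0, 3.0, 4.0]})

# groupby: one aggregated value per (A, B, C) group, all keys on the row axis
grouped = df.groupby(['A', 'B', 'C'])['D'].sum()

# Moving level C to the columns gives the two-dimensional layout
reshaped = grouped.unstack('C')

# pivot_table produces the same table in one call
pivoted = pd.pivot_table(df, index=['A', 'B'], columns='C', values='D', aggfunc='sum')
```

Note that both results still contain only the (A, B) pairs that occur in the data.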
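You can use the reindex() method: build every (A, B) combination from the unique values of each column, then reindex the pivot table against that list. Combinations with no data come back as rows of NaN.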
>>> df1 = pd.pivot_table(df, index=['A', 'B'], columns='C', aggfunc='sum')
>>> df1
D
C a y z
A B
bar x 0.161702 NaN NaN
foo w 0.749007 0.85552 NaN
x NaN NaN 0.458701
>>> import itertools
>>> index = list(itertools.product(df['A'].unique(), df['B'].unique()))
>>> df1.reindex(index)
D
C a y z
foo w 0.749007 0.85552 NaN
x NaN NaN 0.458701
bar w NaN NaN NaN
x 0.161702 NaN NaN
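A variant of the same idea builds the target index with pd.MultiIndex.from_product, which keeps the level names, and optionally fills the new rows with a constant. This is a sketch assuming a recent pandas (index=/columns= instead of rows=/cols=); the fixed D values stand in for the random ones above.

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo'] * 3 + ['bar'],
                   'B': ['w', 'x'] * 2,
                   'C': ['y', 'z', 'a', 'a'],
                   'D': [0.1, 0.2, 0.3, 0.4]})

table = pd.pivot_table(df, index=['A', 'B'], columns='C', aggfunc='sum')

# Every (A, B) pair, including ('bar', 'w'), which has no rows in df
full_index = pd.MultiIndex.from_product(
    [df['A'].unique(), df['B'].unique()], names=['A', 'B'])

complete = table.reindex(full_index)                   # missing pairs become all-NaN rows
zero_filled = table.reindex(full_index, fill_value=0)  # or fill the new rows with 0
```

fill_value only applies to the rows that reindex adds; NaN cells that already existed, such as ('foo', 'x') under 'a', stay NaN unless you also call fillna().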