NaN values in pivot_table index cause loss of data

Tags: python, pandas, dataframe, pivot

Here is a simple DataFrame:

> import pandas as pd
> df = pd.DataFrame({'a': ['a1', 'a2', 'a3'],
                     'b': ['optional1', None, 'optional3'],
                     'c': ['c1', 'c2', 'c3'],
                     'd': [1, 2, 3]})
> df

    a          b   c  d
0  a1  optional1  c1  1
1  a2       None  c2  2
2  a3  optional3  c3  3

Pivot method 1

The data can be pivoted to this:

> df.pivot_table(index=['a','b'], columns='c')
                d     
c              c1   c3
a  b                  
a1 optional1  1.0  NaN
a3 optional3  NaN  3.0

Downside: the data in the 2nd row (a2) is lost because df['b'][1] is None, and pivot_table drops rows whose index key is missing.
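
To see where the row goes, here is a minimal sketch of the grouping step that pivot_table relies on. It assumes pandas 1.1 or later, where groupby accepts a dropna parameter; the variable names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': ['a1', 'a2', 'a3'],
                   'b': ['optional1', None, 'optional3'],
                   'c': ['c1', 'c2', 'c3'],
                   'd': [1, 2, 3]})

# By default, groupby skips any row whose grouping key is missing,
# so the a2 row never reaches the aggregation.
default_groups = df.groupby(['a', 'b'])['d'].sum()

# With dropna=False (assumes pandas >= 1.1) the missing key
# is kept as a group of its own.
kept_groups = df.groupby(['a', 'b'], dropna=False)['d'].sum()

print(len(default_groups))  # 2
print(len(kept_groups))     # 3
```

The same default is what removes the a2 row in pivot method 1.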

Pivot method 2

> df.pivot_table(index=['a'], columns='c')
      d          
c    c1   c2   c3
a                
a1  1.0  NaN  NaN
a2  NaN  2.0  NaN
a3  NaN  NaN  3.0

Downside: column b is lost.

How can the two methods be combined so that column b and the 2nd row are both kept, like so:

                d     
c              c1   c2   c3
a  b                  
a1 optional1  1.0  NaN  NaN
a2      None  NaN  2.0  NaN
a3 optional3  NaN  NaN  3.0

More generally: how can information from a row be retained during pivoting if one of its keys has a NaN value?

721

asked Jan 24 '17 21:01

victorlin

1 Answer

Use set_index and unstack to perform the pivot:

df = df.set_index(['a', 'b', 'c']).unstack('c')

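This is essentially what pandas does under the hood for pivot. The stack and unstack methods are closely related to pivot, and can generally be used to perform pivot-like operations that don't quite conform to the built-in pivot functions. Note that, unlike pivot_table, unstack does not aggregate: each combination of a, b and c must be unique, otherwise it raises an error about duplicate index entries.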

The resulting output:

                d          
c              c1   c2   c3
a  b                       
a1 optional1  1.0  NaN  NaN
a2 NaN        NaN  2.0  NaN
a3 optional3  NaN  NaN  3.0
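
For reference, a self-contained sketch of the approach above, followed by one possible variant for data that still needs aggregation. The variant and its PLACEHOLDER sentinel are assumptions for illustration and are not part of the original answer: the missing keys are filled with a marker value before calling pivot_table so that grouping does not drop them.

```python
import pandas as pd

df = pd.DataFrame({'a': ['a1', 'a2', 'a3'],
                   'b': ['optional1', None, 'optional3'],
                   'c': ['c1', 'c2', 'c3'],
                   'd': [1, 2, 3]})

# Approach from the answer: move a, b, c into the index, then
# spread the values of 'c' across the columns. No aggregation happens,
# so each (a, b, c) combination must be unique.
wide = df.set_index(['a', 'b', 'c']).unstack('c')
print(wide.shape)  # (3, 3)

# Hypothetical variant for data with duplicate keys: replace the missing
# key with a sentinel so pivot_table keeps the row and can still aggregate.
PLACEHOLDER = '<missing>'  # illustrative value, choose one that cannot occur in 'b'
aggregated = (df.fillna({'b': PLACEHOLDER})
                .pivot_table(index=['a', 'b'], columns='c', values='d'))
print(aggregated.shape)  # (3, 3)
```

The first form keeps the real missing value in the index. The second trades that for aggregation support, at the cost of a marker string that has to be handled downstream.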

129

answered Oct 11 '22 13:10

root
