Pandas unstack problems: ValueError: Index contains duplicate entries, cannot reshape

Tags: python, pandas

I am trying to unstack a multi-index with pandas and I keep getting:

ValueError: Index contains duplicate entries, cannot reshape

Given a dataset with four columns:

id (string)
date (string)
location (string)
value (float)

I first set a three-level multi-index:

In [37]: e.set_index(['id', 'date', 'location'], inplace=True)

In [38]: e
Out[38]: 
                                    value
id           date       location       
id1          2014-12-12 loc1        16.86
             2014-12-11 loc1        17.18
             2014-12-10 loc1        17.03
             2014-12-09 loc1        17.28

Then I try to unstack the location:

In [39]: e.unstack('location')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-39-bc1e237a0ed7> in <module>()
----> 1 e.unstack('location')
...
C:\Anaconda\envs\sandbox\lib\site-packages\pandas\core\reshape.pyc in _make_selectors(self)
    143 
    144         if mask.sum() < len(self.index):
--> 145             raise ValueError('Index contains duplicate entries, '
    146                              'cannot reshape')
    147 

ValueError: Index contains duplicate entries, cannot reshape

What is going on here?

asked Feb 21 '15 20:02

ARF

2 Answers

Here's an example DataFrame that shows this: it has duplicate values with the same index. The question is whether you want to aggregate these or keep them as multiple rows.

In [11]: df
Out[11]:
   0  1  2      3
0  1  2  a  16.86
1  1  2  a  17.18
2  1  4  a  17.03
3  2  5  b  17.28

In [12]: df.pivot_table(values=3, index=[0, 1], columns=2, aggfunc='mean')  # desired?
Out[12]:
2        a      b
0 1
1 2  17.02    NaN
  4  17.03    NaN
2 5    NaN  17.28

In [13]: df1 = df.set_index([0, 1, 2])

In [14]: df1
Out[14]:
           3
0 1 2
1 2 a  16.86
    a  17.18
  4 a  17.03
2 5 b  17.28

In [15]: df1.unstack(2)
ValueError: Index contains duplicate entries, cannot reshape
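
Before choosing between aggregating and keeping the rows, it can help to see which index entries actually collide. The following is a minimal sketch, assuming the same toy data as above; the names dup_mask and duplicates are only illustrative.

```python
import pandas as pd

# Toy data mirroring the example above (rows 0 and 1 share the same key).
df = pd.DataFrame({
    0: [1, 1, 1, 2],
    1: [2, 2, 4, 5],
    2: ["a", "a", "a", "b"],
    3: [16.86, 17.18, 17.03, 17.28],
})
df1 = df.set_index([0, 1, 2])

# unstack needs every index tuple to be unique; check that first.
print(df1.index.is_unique)

# keep=False marks every member of a duplicated group, not just the repeats.
dup_mask = df1.index.duplicated(keep=False)
duplicates = df1[dup_mask]
print(duplicates)
```

If duplicates is empty, unstack should not raise this error; otherwise the listed rows are the ones that need to be aggregated or told apart.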

One solution is to reset_index (and get back to df) and use pivot_table.

In [16]: df1.reset_index().pivot_table(values=3, index=[0, 1], columns=2, aggfunc='mean')
Out[16]:
2        a      b
0 1
1 2  17.02    NaN
  4  17.03    NaN
2 5    NaN  17.28

Another option (if you don't want to aggregate) is to append a dummy level, unstack it, then drop the dummy level...
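
One possible way to build such a dummy level is groupby(...).cumcount(), which numbers the repeats within each duplicated key. This is only a sketch under assumptions: the data below is made up, and the column name "dup" is a hypothetical helper, not something from the question.

```python
import pandas as pd

# Made-up data in the shape of the question: the first two rows share a key.
df = pd.DataFrame({
    "id": ["id1", "id1", "id1", "id2"],
    "date": ["2014-12-12", "2014-12-12", "2014-12-11", "2014-12-10"],
    "location": ["loc1", "loc1", "loc1", "loc2"],
    "value": [16.86, 17.18, 17.03, 17.28],
})
keys = ["id", "date", "location"]

# Dummy level: 0 for the first occurrence of a key, 1 for the second, ...
df["dup"] = df.groupby(keys).cumcount()

# With the dummy level included, every index tuple is unique, so unstack works.
wide = df.set_index(keys + ["dup"])["value"].unstack("location")

# Drop the dummy level again; (id, date) may now repeat in the row index.
wide = wide.droplevel("dup")
print(wide)
```

The result keeps one row per original observation, so nothing is averaged away, at the cost of a row index that is no longer unique.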

answered Oct 19 '22 23:10

Andy Hayden

There's a much simpler way to tackle this.

The reason you get ValueError: Index contains duplicate entries, cannot reshape is that some combinations of "id", "date" and "location" occur more than once. When you unstack "location", pandas would have to put several values into the same cell for a given "id" and "date", and it cannot do that.

You can avoid this by keeping the default index (the row number): when setting the index on "id", "date" and "location", pass append=True so the new levels are added to the existing index instead of replacing it, which is the default.

So use:

e.set_index(['id', 'date', 'location'], append=True)

Once this is done, the index still contains the default row number alongside the levels you set, so every index entry is unique and unstack will work.
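
As a rough illustration of this approach, here is a small sketch with made-up data (the values and the names indexed and wide are assumptions, not taken from the question's full dataset):

```python
import pandas as pd

# Made-up data: the first two rows have identical id, date and location.
e = pd.DataFrame({
    "id": ["id1", "id1", "id1"],
    "date": ["2014-12-12", "2014-12-12", "2014-12-11"],
    "location": ["loc1", "loc1", "loc1"],
    "value": [16.86, 17.18, 17.03],
})

# append=True keeps the default row number as the first index level,
# so each index tuple is unique even when id/date/location repeat.
indexed = e.set_index(["id", "date", "location"], append=True)

wide = indexed.unstack("location")
print(wide)
```

Note that the row number stays in the result's index, so the wide frame has one row per original row rather than one row per "id" and "date".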

Let me know how it works out.

answered Oct 19 '22 23:10

HVS
