Concat DataFrame Reindexing only valid with uniquely valued Index objects

Tags:

I am trying to concat the following dataframes:

df1

                                price   side timestamp timestamp            2016-01-04 00:01:15.631331072   0.7286  2   1451865675631331 2016-01-04 00:01:15.631399936   0.7286  2   1451865675631400 2016-01-04 00:01:15.631860992   0.7286  2   1451865675631861 2016-01-04 00:01:15.631866112   0.7286  2   1451865675631866

and:

df2

                                bid     bid_size offer  offer_size timestamp                2016-01-04 00:00:31.331441920   0.7284  4000000 0.7285  1000000 2016-01-04 00:00:53.631324928   0.7284  4000000 0.7290  4000000 2016-01-04 00:01:03.131234048   0.7284  5000000 0.7286  4000000 2016-01-04 00:01:12.131444992   0.7285  1000000 0.7286  4000000 2016-01-04 00:01:15.631364096   0.7285  4000000 0.7290  4000000

With

 data = pd.concat([df1,df2], axis=1)

But I get the follwing output:

InvalidIndexError                         Traceback (most recent call last) <ipython-input-38-2e88458f01d7> in <module>() ----> 1 data = pd.concat([df1,df2], axis=1)       2 data = data.fillna(method='pad')       3 data = data.fillna(method='bfill')       4 data['timestamp'] =  data.index.values#converting to datetime       5 data['timestamp'] = pd.to_datetime(data['timestamp'])#converting to datetime  /usr/local/lib/python2.7/site-packages/pandas/tools/merge.pyc in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity, copy)     810                        keys=keys, levels=levels, names=names,     811                        verify_integrity=verify_integrity, --> 812                        copy=copy)     813     return op.get_result()     814   /usr/local/lib/python2.7/site-packages/pandas/tools/merge.pyc in __init__(self, objs, axis, join, join_axes, keys, levels, names, ignore_index, verify_integrity, copy)     947         self.copy = copy     948  --> 949         self.new_axes = self._get_new_axes()     950      951     def get_result(self):  /usr/local/lib/python2.7/site-packages/pandas/tools/merge.pyc in _get_new_axes(self)    1013                 if i == self.axis:    1014                     continue -> 1015                 new_axes[i] = self._get_comb_axis(i)    1016         else:    1017             if len(self.join_axes) != ndim - 1:  /usr/local/lib/python2.7/site-packages/pandas/tools/merge.pyc in _get_comb_axis(self, i)    1039                 raise TypeError("Cannot concatenate list of %s" % types)    1040  -> 1041         return _get_combined_index(all_indexes, intersect=self.intersect)    1042     1043     def _get_concat_axis(self):  /usr/local/lib/python2.7/site-packages/pandas/core/index.pyc in _get_combined_index(indexes, intersect)    6120             index = index.intersection(other)    6121         return index -> 6122     union = _union_indexes(indexes)    6123     return _ensure_index(union)    6124   /usr/local/lib/python2.7/site-packages/pandas/core/index.pyc in _union_indexes(indexes)    6149     6150         if hasattr(result, 'union_many'): -> 6151             return result.union_many(indexes[1:])    6152         else:    6153             for other in indexes[1:]:  /usr/local/lib/python2.7/site-packages/pandas/tseries/index.pyc in union_many(self, others)     959             else:     960                 tz = this.tz --> 961                 this = Index.union(this, other)     962                 if isinstance(this, DatetimeIndex):     963                     this.tz = tz  /usr/local/lib/python2.7/site-packages/pandas/core/index.pyc in union(self, other)    1553                 result.extend([x for x in other._values if x not in value_set])    1554         else: -> 1555             indexer = self.get_indexer(other)    1556             indexer, = (indexer == -1).nonzero()    1557   /usr/local/lib/python2.7/site-packages/pandas/core/index.pyc in get_indexer(self, target, method, limit, tolerance)    1890     1891         if not self.is_unique: -> 1892             raise InvalidIndexError('Reindexing only valid with uniquely'    1893                                     ' valued Index objects')    1894   InvalidIndexError: Reindexing only valid with uniquely valued Index objects

I have removed additional columns and removed duplicates and NA where there could be a conflict - but I simply do not know what's wrong.

755

asked Jan 29 '16 12:01

noidea

2 Answers

You can mitigate this error without having to change your data or remove duplicates. Just create a new index with DataFrame.reset_index:

df = df.reset_index()

The old index is kept as a column in your dataframe, but if you don't need it you can do:

df = df.reset_index(drop=True)

Some prefer:

df.reset_index(inplace=True, drop=True)

180

answered Sep 17 '22 13:09

Nicholas Morley

pd.concat requires that the indices be unique. To remove rows with duplicate indices, use

df = df.loc[~df.index.duplicated(keep='first')]

import pandas as pd from pandas import Timestamp  df1 = pd.DataFrame(     {'price': [0.7286, 0.7286, 0.7286, 0.7286],      'side': [2, 2, 2, 2],      'timestamp': [1451865675631331, 1451865675631400,                   1451865675631861, 1451865675631866]},     index=pd.DatetimeIndex(['2000-1-1', '2000-1-1', '2001-1-1', '2002-1-1']))   df2 = pd.DataFrame(     {'bid': [0.7284, 0.7284, 0.7284, 0.7285, 0.7285],      'bid_size': [4000000, 4000000, 5000000, 1000000, 4000000],      'offer': [0.7285, 0.729, 0.7286, 0.7286, 0.729],      'offer_size': [1000000, 4000000, 4000000, 4000000, 4000000]},     index=pd.DatetimeIndex(['2000-1-1', '2001-1-1', '2002-1-1', '2003-1-1', '2004-1-1']))   df1 = df1.loc[~df1.index.duplicated(keep='first')] #              price  side         timestamp # 2000-01-01  0.7286     2  1451865675631331 # 2001-01-01  0.7286     2  1451865675631861 # 2002-01-01  0.7286     2  1451865675631866  df2 = df2.loc[~df2.index.duplicated(keep='first')] #                bid  bid_size   offer  offer_size # 2000-01-01  0.7284   4000000  0.7285     1000000 # 2001-01-01  0.7284   4000000  0.7290     4000000 # 2002-01-01  0.7284   5000000  0.7286     4000000 # 2003-01-01  0.7285   1000000  0.7286     4000000 # 2004-01-01  0.7285   4000000  0.7290     4000000  result = pd.concat([df1, df2], axis=0) print(result)                bid  bid_size   offer  offer_size   price  side     timestamp 2000-01-01     NaN       NaN     NaN         NaN  0.7286     2  1.451866e+15 2001-01-01     NaN       NaN     NaN         NaN  0.7286     2  1.451866e+15 2002-01-01     NaN       NaN     NaN         NaN  0.7286     2  1.451866e+15 2000-01-01  0.7284   4000000  0.7285     1000000     NaN   NaN           NaN 2001-01-01  0.7284   4000000  0.7290     4000000     NaN   NaN           NaN 2002-01-01  0.7284   5000000  0.7286     4000000     NaN   NaN           NaN 2003-01-01  0.7285   1000000  0.7286     4000000     NaN   NaN           NaN 2004-01-01  0.7285   4000000  0.7290     4000000     NaN   NaN           NaN

Note there is also pd.join, which can join DataFrames based on their indices, and handle non-unique indices based on the how parameter. Rows with duplicate index are not removed.

In [94]: df1.join(df2) Out[94]:               price  side         timestamp     bid  bid_size   offer  \ 2000-01-01  0.7286     2  1451865675631331  0.7284   4000000  0.7285    2000-01-01  0.7286     2  1451865675631400  0.7284   4000000  0.7285    2001-01-01  0.7286     2  1451865675631861  0.7284   4000000  0.7290    2002-01-01  0.7286     2  1451865675631866  0.7284   5000000  0.7286                 offer_size   2000-01-01     1000000   2000-01-01     1000000   2001-01-01     4000000   2002-01-01     4000000    In [95]: df1.join(df2, how='outer') Out[95]:               price  side     timestamp     bid  bid_size   offer  offer_size 2000-01-01  0.7286     2  1.451866e+15  0.7284   4000000  0.7285     1000000 2000-01-01  0.7286     2  1.451866e+15  0.7284   4000000  0.7285     1000000 2001-01-01  0.7286     2  1.451866e+15  0.7284   4000000  0.7290     4000000 2002-01-01  0.7286     2  1.451866e+15  0.7284   5000000  0.7286     4000000 2003-01-01     NaN   NaN           NaN  0.7285   1000000  0.7286     4000000 2004-01-01     NaN   NaN           NaN  0.7285   4000000  0.7290     4000000

answered Sep 19 '22 13:09

unutbu

Related questions
                            
                                Python 2 CSV writer produces wrong line terminator on Windows
                            
                                Python - temporarily modify the current process's environment
                            
                                Seeking from end of file throwing unsupported exception
                            
                                Why can't PySpark find py4j.java_gateway?
                            
                                Pythonic way to insert every 2 elements in a string
                            
                                how do I launch IDLE, the development environment for Python, on Mac OS 10.7?
                            
                                Counting Cars OpenCV + Python Issue
                            
                                Python - Using regex to find multiple matches and print them out [duplicate]
                            
                                python xlrd unsupported format, or corrupt file.
                            
                                Get virtualenv's bin folder path from script
                            
                                Keep same dummy variable in training and testing data
                            
                                Good practices in writing MATLAB code? [closed]
                            
                                Using Variables for Class Names in Python?
                            
                                Count indexes using "for" in Python
                            
                                Return number of files in directory and subdirectory
                            
                                In Python, what is the fastest algorithm for removing duplicates from a list so that all elements are unique *while preserving order*? [duplicate]
                            
                                Python, print delimited list
                            
                                How do I use url_for if my method has multiple route annotations?
                            
                                pandas DataFrame output end of csv
                            
                                Tensorflow._api.v2.train has no attribute 'AdamOptimizer'

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Concat DataFrame Reindexing only valid with uniquely valued Index objects

Tags:

python

pandas

numpy

noidea

People also ask

2 Answers

Nicholas Morley

unutbu

Recent Activity

Donate For Us