Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Concat DataFrame Reindexing only valid with uniquely valued Index objects

I am trying to concat the following dataframes:

df1

                                price   side timestamp timestamp            2016-01-04 00:01:15.631331072   0.7286  2   1451865675631331 2016-01-04 00:01:15.631399936   0.7286  2   1451865675631400 2016-01-04 00:01:15.631860992   0.7286  2   1451865675631861 2016-01-04 00:01:15.631866112   0.7286  2   1451865675631866 

and:

df2

                                bid     bid_size offer  offer_size timestamp                2016-01-04 00:00:31.331441920   0.7284  4000000 0.7285  1000000 2016-01-04 00:00:53.631324928   0.7284  4000000 0.7290  4000000 2016-01-04 00:01:03.131234048   0.7284  5000000 0.7286  4000000 2016-01-04 00:01:12.131444992   0.7285  1000000 0.7286  4000000 2016-01-04 00:01:15.631364096   0.7285  4000000 0.7290  4000000 

With

 data = pd.concat([df1,df2], axis=1)   

But I get the follwing output:

InvalidIndexError                         Traceback (most recent call last) <ipython-input-38-2e88458f01d7> in <module>() ----> 1 data = pd.concat([df1,df2], axis=1)       2 data = data.fillna(method='pad')       3 data = data.fillna(method='bfill')       4 data['timestamp'] =  data.index.values#converting to datetime       5 data['timestamp'] = pd.to_datetime(data['timestamp'])#converting to datetime  /usr/local/lib/python2.7/site-packages/pandas/tools/merge.pyc in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity, copy)     810                        keys=keys, levels=levels, names=names,     811                        verify_integrity=verify_integrity, --> 812                        copy=copy)     813     return op.get_result()     814   /usr/local/lib/python2.7/site-packages/pandas/tools/merge.pyc in __init__(self, objs, axis, join, join_axes, keys, levels, names, ignore_index, verify_integrity, copy)     947         self.copy = copy     948  --> 949         self.new_axes = self._get_new_axes()     950      951     def get_result(self):  /usr/local/lib/python2.7/site-packages/pandas/tools/merge.pyc in _get_new_axes(self)    1013                 if i == self.axis:    1014                     continue -> 1015                 new_axes[i] = self._get_comb_axis(i)    1016         else:    1017             if len(self.join_axes) != ndim - 1:  /usr/local/lib/python2.7/site-packages/pandas/tools/merge.pyc in _get_comb_axis(self, i)    1039                 raise TypeError("Cannot concatenate list of %s" % types)    1040  -> 1041         return _get_combined_index(all_indexes, intersect=self.intersect)    1042     1043     def _get_concat_axis(self):  /usr/local/lib/python2.7/site-packages/pandas/core/index.pyc in _get_combined_index(indexes, intersect)    6120             index = index.intersection(other)    6121         return index -> 6122     union = _union_indexes(indexes)    6123     return _ensure_index(union)    6124   /usr/local/lib/python2.7/site-packages/pandas/core/index.pyc in _union_indexes(indexes)    6149     6150         if hasattr(result, 'union_many'): -> 6151             return result.union_many(indexes[1:])    6152         else:    6153             for other in indexes[1:]:  /usr/local/lib/python2.7/site-packages/pandas/tseries/index.pyc in union_many(self, others)     959             else:     960                 tz = this.tz --> 961                 this = Index.union(this, other)     962                 if isinstance(this, DatetimeIndex):     963                     this.tz = tz  /usr/local/lib/python2.7/site-packages/pandas/core/index.pyc in union(self, other)    1553                 result.extend([x for x in other._values if x not in value_set])    1554         else: -> 1555             indexer = self.get_indexer(other)    1556             indexer, = (indexer == -1).nonzero()    1557   /usr/local/lib/python2.7/site-packages/pandas/core/index.pyc in get_indexer(self, target, method, limit, tolerance)    1890     1891         if not self.is_unique: -> 1892             raise InvalidIndexError('Reindexing only valid with uniquely'    1893                                     ' valued Index objects')    1894   InvalidIndexError: Reindexing only valid with uniquely valued Index objects   

I have removed additional columns and removed duplicates and NA where there could be a conflict - but I simply do not know what's wrong.

like image 755
noidea Avatar asked Jan 29 '16 12:01

noidea


People also ask

Does pandas concat use index?

Dealing with index and axis And by default, it is concatenating vertically along the axis 0 and preserving all existing indices. If you want the concatenation to ignore existing indices, you can set the argument ignore_index=True . Then, the resulting DataFrame index will be labeled with 0 , …, n-1 .

What is Reindexing in pandas?

Pandas DataFrame reindex() Method The reindex() method allows you to change the row indexes, and the columns labels. ;] Note: The values are set to NaN if the new index is not the same as the old.

How do I concatenate Dataframes in pandas?

We'll pass two dataframes to pd. concat() method in the form of a list and mention in which axis you want to concat, i.e. axis=0 to concat along rows, axis=1 to concat along columns.

How do I add two Dataframes in Python?

append() function is used to append rows of other dataframe to the end of the given dataframe, returning a new dataframe object. Columns not in the original dataframes are added as new columns and the new cells are populated with NaN value. Parameters: other : DataFrame or Series/dict-like object, or list of these.


2 Answers

You can mitigate this error without having to change your data or remove duplicates. Just create a new index with DataFrame.reset_index:

df = df.reset_index() 

The old index is kept as a column in your dataframe, but if you don't need it you can do:

df = df.reset_index(drop=True) 

Some prefer:

df.reset_index(inplace=True, drop=True) 
like image 180
Nicholas Morley Avatar answered Sep 17 '22 13:09

Nicholas Morley


pd.concat requires that the indices be unique. To remove rows with duplicate indices, use

df = df.loc[~df.index.duplicated(keep='first')] 

import pandas as pd from pandas import Timestamp  df1 = pd.DataFrame(     {'price': [0.7286, 0.7286, 0.7286, 0.7286],      'side': [2, 2, 2, 2],      'timestamp': [1451865675631331, 1451865675631400,                   1451865675631861, 1451865675631866]},     index=pd.DatetimeIndex(['2000-1-1', '2000-1-1', '2001-1-1', '2002-1-1']))   df2 = pd.DataFrame(     {'bid': [0.7284, 0.7284, 0.7284, 0.7285, 0.7285],      'bid_size': [4000000, 4000000, 5000000, 1000000, 4000000],      'offer': [0.7285, 0.729, 0.7286, 0.7286, 0.729],      'offer_size': [1000000, 4000000, 4000000, 4000000, 4000000]},     index=pd.DatetimeIndex(['2000-1-1', '2001-1-1', '2002-1-1', '2003-1-1', '2004-1-1']))   df1 = df1.loc[~df1.index.duplicated(keep='first')] #              price  side         timestamp # 2000-01-01  0.7286     2  1451865675631331 # 2001-01-01  0.7286     2  1451865675631861 # 2002-01-01  0.7286     2  1451865675631866  df2 = df2.loc[~df2.index.duplicated(keep='first')] #                bid  bid_size   offer  offer_size # 2000-01-01  0.7284   4000000  0.7285     1000000 # 2001-01-01  0.7284   4000000  0.7290     4000000 # 2002-01-01  0.7284   5000000  0.7286     4000000 # 2003-01-01  0.7285   1000000  0.7286     4000000 # 2004-01-01  0.7285   4000000  0.7290     4000000  result = pd.concat([df1, df2], axis=0) print(result)                bid  bid_size   offer  offer_size   price  side     timestamp 2000-01-01     NaN       NaN     NaN         NaN  0.7286     2  1.451866e+15 2001-01-01     NaN       NaN     NaN         NaN  0.7286     2  1.451866e+15 2002-01-01     NaN       NaN     NaN         NaN  0.7286     2  1.451866e+15 2000-01-01  0.7284   4000000  0.7285     1000000     NaN   NaN           NaN 2001-01-01  0.7284   4000000  0.7290     4000000     NaN   NaN           NaN 2002-01-01  0.7284   5000000  0.7286     4000000     NaN   NaN           NaN 2003-01-01  0.7285   1000000  0.7286     4000000     NaN   NaN           NaN 2004-01-01  0.7285   4000000  0.7290     4000000     NaN   NaN           NaN 

Note there is also pd.join, which can join DataFrames based on their indices, and handle non-unique indices based on the how parameter. Rows with duplicate index are not removed.

In [94]: df1.join(df2) Out[94]:               price  side         timestamp     bid  bid_size   offer  \ 2000-01-01  0.7286     2  1451865675631331  0.7284   4000000  0.7285    2000-01-01  0.7286     2  1451865675631400  0.7284   4000000  0.7285    2001-01-01  0.7286     2  1451865675631861  0.7284   4000000  0.7290    2002-01-01  0.7286     2  1451865675631866  0.7284   5000000  0.7286                 offer_size   2000-01-01     1000000   2000-01-01     1000000   2001-01-01     4000000   2002-01-01     4000000    In [95]: df1.join(df2, how='outer') Out[95]:               price  side     timestamp     bid  bid_size   offer  offer_size 2000-01-01  0.7286     2  1.451866e+15  0.7284   4000000  0.7285     1000000 2000-01-01  0.7286     2  1.451866e+15  0.7284   4000000  0.7285     1000000 2001-01-01  0.7286     2  1.451866e+15  0.7284   4000000  0.7290     4000000 2002-01-01  0.7286     2  1.451866e+15  0.7284   5000000  0.7286     4000000 2003-01-01     NaN   NaN           NaN  0.7285   1000000  0.7286     4000000 2004-01-01     NaN   NaN           NaN  0.7285   4000000  0.7290     4000000 
like image 38
unutbu Avatar answered Sep 19 '22 13:09

unutbu