
Merging with empty DataFrame

I'm trying to merge a DataFrame (df1) with another DataFrame (df2), where df2 may be empty and df1 never is. The merge condition is df1.index == df2.z, but when df2 is empty I get the following error.

Is there any way to get this working?

In [31]:
import pandas as pd
In [32]:
df1 = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [1, 2, 3]})
df2 = pd.DataFrame({'x':[], 'y':[], 'z':[]})
dfm = pd.merge(df1, df2, how='outer', left_index=True, right_on='z')
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-34-4e9943198dae> in <module>()
----> 1 dfmb = pd.merge(df1, df2, how='outer', left_index=True, right_on='z')

/usr/local/lib/python2.7/dist-packages/pandas/tools/merge.pyc in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy)
     37                          right_index=right_index, sort=sort, suffixes=suffixes,
     38                          copy=copy)
---> 39     return op.get_result()
     40 if __debug__:
     41     merge.__doc__ = _merge_doc % '\nleft : DataFrame'

/usr/local/lib/python2.7/dist-packages/pandas/tools/merge.pyc in get_result(self)
    185 
    186     def get_result(self):
--> 187         join_index, left_indexer, right_indexer = self._get_join_info()
    188 
    189         ldata, rdata = self.left._data, self.right._data

/usr/local/lib/python2.7/dist-packages/pandas/tools/merge.pyc in _get_join_info(self)
    277                 join_index = self.left.index.take(left_indexer)
    278             elif self.left_index:
--> 279                 join_index = self.right.index.take(right_indexer)
    280             else:
    281                 join_index = Index(np.arange(len(left_indexer)))

/usr/local/lib/python2.7/dist-packages/pandas/core/index.pyc in take(self, indexer, axis)
    981 
    982         indexer = com._ensure_platform_int(indexer)
--> 983         taken = np.array(self).take(indexer)
    984 
    985         # by definition cannot propogate freq

IndexError: cannot do a non-empty take from an empty axes.
orange asked Mar 03 '15 00:03


2 Answers

try:
    dfm = pd.merge(df1, df2, how='outer', left_index=True, right_on='z')
except IndexError:
    dfm = df1 if not df1.empty else df2

This might be sufficient for your needs.
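If catching the exception feels too broad, the same fallback can be written as an explicit emptiness check. This is only a sketch: `merge_or_left` is a hypothetical helper name, not part of pandas, and it assumes that returning the left frame unchanged is acceptable when there is nothing to join against.

```python
import pandas as pd

def merge_or_left(left, right, right_key):
    """Outer-merge left's index against right[right_key].

    Hypothetical helper: skips the merge entirely when `right` has no rows.
    """
    if right.empty:
        # Nothing to join against, so return a copy of the left frame as-is
        return left.copy()
    return pd.merge(left, right, how='outer', left_index=True, right_on=right_key)

df1 = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [1, 2, 3]})
df2 = pd.DataFrame({'x': [], 'y': [], 'z': []})

dfm = merge_or_left(df1, df2, 'z')
```

Checking `right.empty` up front avoids swallowing an `IndexError` raised for some unrelated reason, at the cost of one extra branch.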

Joran Beasley answered Oct 01 '22 06:10


Another alternative, similar to Joran's:

try:
    dfm = pd.merge(df1, df2, how='outer', left_index=True, right_on='z')
except IndexError:
    dfm = df1.reindex_axis(df1.columns.union(df2.columns), axis=1)

I'm not sure which is clearer, but both of the following work:

In [11]: df1.reindex_axis(df1.columns.union(df2.columns), axis=1)
Out[11]:
   a  b  c   x   y   z
0  1  4  1 NaN NaN NaN
1  2  5  2 NaN NaN NaN
2  3  6  3 NaN NaN NaN

In [12]: df1.loc[:, df1.columns.union(df2.columns)]
Out[12]:
   a  b  c   x   y   z
0  1  4  1 NaN NaN NaN
1  2  5  2 NaN NaN NaN
2  3  6  3 NaN NaN NaN

(I prefer the former.)
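A note for newer pandas: `reindex_axis` was removed in pandas 1.0, and since that release `.loc` raises a `KeyError` when asked for column labels that don't exist. A minimal sketch of the same idea using `DataFrame.reindex(columns=...)`, assuming the same df1 and df2 as in the question:

```python
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [1, 2, 3]})
df2 = pd.DataFrame({'x': [], 'y': [], 'z': []})

# Union of both column sets; labels missing from df1 become all-NaN columns
all_columns = df1.columns.union(df2.columns)
dfm = df1.reindex(columns=all_columns)
```

Here `dfm` keeps df1's rows and gains the columns `x`, `y` and `z` filled with NaN, which is the shape of the outputs shown above.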

Andy Hayden answered Oct 01 '22 07:10