Pandas Append Behavior

Question

This is something that I'm not quite sure on from the docs.

Say I have two dataframes, with overlapping data.

DF1 has a DateTimeIndex starting at 07:00:00 and ending at 09:30:00.

DF2 has a DateTimeIndex starting at 07:00:00 and ending at 11:30:00.

DF2 is an updated DF1, but some rows may be added as compared to DF1 before the end time of DF1. So DF2 might have 200 rows added from 9:20:00-9:30:00 on the update, and then everything after 09:30:00 also is new of course.

If I use:

DF1.append(DF2)

Will I get all the rows from DF2 that are new? Or does pandas only go in and take the rows from after the end of DF1? Beyond that, DF2 additions could actually be at the same Time as a DF1 row but it would have different contents. Would pandas handle this as well?

If pandas doesn't handle this, what is the best way to do this myself?

In [489]: df
Out[489]:
                     Row1  Row3
2013-11-05 08:00:00     2   NaN
2013-11-05 09:00:00     4   NaN
2013-11-05 09:06:00     6     5

In [490]: df2
Out[490]:
                     Row1  Row3
2013-11-05 08:00:00     2   NaN
2013-11-05 09:00:00     5   NaN
2013-11-05 09:09:00     6     5

In [491]: df.append(df2)
Out[491]:
                     Row1  Row3
2013-11-05 08:00:00     2   NaN
2013-11-05 09:00:00     4   NaN
2013-11-05 09:06:00     6     5
2013-11-05 08:00:00     2   NaN
2013-11-05 09:00:00     5   NaN
2013-11-05 09:09:00     6     5

I want df.append(df2) in this case to be:

In [491]: df.append(df2)
Out[491]:
                     Row1  Row3
2013-11-05 08:00:00     2   NaN
2013-11-05 09:00:00     4   NaN
2013-11-05 09:06:00     6     5
<strike>2013-11-05 08:00:00     2   NaN</strike>
2013-11-05 09:00:00     5   NaN
2013-11-05 09:09:00     6     5

edit2:

I was previously doing this:

last = df.ix[-1].name
to_append = df2[last:]
new_df = df.append(to_append)

This unfortunately cuts out the rows that are new but before the timestamp of the last row of my previous DataFrame

alko · Accepted Answer

Append is similar to one of python list, you will get both dataframes "stacked" together. Whether a ValueError is raised in case of indexes with duplicates is controlled by verify_integrity param to append, defaulting to False.

>>> df = pd.DataFrame.from_dict({'col':{'row': 1}})
>>> df
     col
row    1
>>> df.append(df).index
Index([u'row', u'row'], dtype=object)
>>> df.append(df)
     col
row    1
row    1

>>> df.append(df, verify_integrity=True)
Traceback (most recent call last):
   ...
ValueError: Indexes have overlapping values: ['row']

for merging with replacement use combine_first:

>>> mdf = pd.DataFrame.from_dict({'col':{'row': 2, 'new':3}})
>>> df.combine_first(mdf) # values from df overwrite those of mdf
     col
new    3
row    1
>>> mdf.combine_first(df) # values from mdf overwrite those of df
     col
new    3
row    2

For the reference, here is extensive guide on different merge and join means of dataframes

UPDATE

Followup: as you wish behaviour similar to SQL union, one approach migth be:

>>> df = pd.DataFrame.from_dict({'col':{'row': 1, 'new': 3}})
>>> mdf.append(df).drop_duplicates()
     col
new    3
row    2
row    1

Or if you want to take index in account,

>>> mdf['index'] = mdf.index
>>> df['index'] = df.index
>>> union = mdf.append(df).drop_duplicates()
>>> del union['index']
>>> union
     col
new    3
row    2
row    1

Pandas Append Behavior

Tags:

python

pandas

user1610719

1 Answers

alko

Recent Activity

Donate For Us

Pandas Append Behavior

Tags:

python

pandas

user1610719

1 Answers

alko

Related questions

Recent Activity

Donate For Us