Setting DataFrame values with enlargement

Tags:

pandas

I have two DataFrames (with DatetimeIndex) and want to update the first frame (the older one) with data from the second frame (the newer one).

The new frame may contain more recent data for rows already contained in the the old frame. In this case, data in the old frame should be overwritten with data from the new frame. Also the newer frame may have more columns / rows, than the first one. In this case the old frame should be enlarged by the data in the new frame.

Pandas docs state, that

"The .loc/.ix/[] operations can perform enlargement when setting a non-existant key for that axis"

and

"a DataFrame can be enlarged on either axis via .loc"

However this doesn't seem to work and throws a KeyError. Example:

In [195]: df1
Out[195]: 
                     A  B  C
2015-07-09 12:00:00  1  1  1
2015-07-09 13:00:00  1  1  1
2015-07-09 14:00:00  1  1  1
2015-07-09 15:00:00  1  1  1

In [196]: df2
Out[196]: 
                     A  B  C  D
2015-07-09 14:00:00  2  2  2  2
2015-07-09 15:00:00  2  2  2  2
2015-07-09 16:00:00  2  2  2  2
2015-07-09 17:00:00  2  2  2  2

In [197]: df1.loc[df2.index] = df2
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-197-74e630e87cf8> in <module>()
----> 1 df1.loc[df2.index] = df2

/.../pandas/core/indexing.pyc in __setitem__(self, key, value)
    112 
    113     def __setitem__(self, key, value):
--> 114         indexer = self._get_setitem_indexer(key)
    115         self._setitem_with_indexer(indexer, value)
    116 

/.../pandas/core/indexing.pyc in _get_setitem_indexer(self, key)
    107 
    108         try:
--> 109             return self._convert_to_indexer(key, is_setter=True)
    110         except TypeError:
    111             raise IndexingError(key)

/.../pandas/core/indexing.pyc in _convert_to_indexer(self, obj, axis, is_setter)
   1110                 mask = check == -1
   1111                 if mask.any():
-> 1112                     raise KeyError('%s not in index' % objarr[mask])
   1113 
   1114                 return _values_from_object(indexer)

KeyError: "['2015-07-09T18:00:00.000000000+0200' '2015-07-09T19:00:00.000000000+0200'] not in index"

What is the best way (with respect to performance, as my real data is much larger) two achieve the desired updated and enlarged DataFrame. This is the result I would like to see:

                     A  B  C    D
2015-07-09 12:00:00  1  1  1  NaN
2015-07-09 13:00:00  1  1  1  NaN
2015-07-09 14:00:00  2  2  2    2
2015-07-09 15:00:00  2  2  2    2
2015-07-09 16:00:00  2  2  2    2
2015-07-09 17:00:00  2  2  2    2

724

asked Jul 09 '15 14:07

bmu

3 Answers

df2.combine_first(df1) (documentation) seems to serve your requirement; PFB code snippet & output

import pandas as pd

print 'pandas-version: ', pd.__version__

df1 = pd.DataFrame.from_records([('2015-07-09 12:00:00',1,1,1),
                                 ('2015-07-09 13:00:00',1,1,1),
                                 ('2015-07-09 14:00:00',1,1,1),
                                 ('2015-07-09 15:00:00',1,1,1)],
                                columns=['Dt', 'A', 'B', 'C']).set_index('Dt')
# print df1

df2 = pd.DataFrame.from_records([('2015-07-09 14:00:00',2,2,2,2),
                                 ('2015-07-09 15:00:00',2,2,2,2),
                                 ('2015-07-09 16:00:00',2,2,2,2),
                                 ('2015-07-09 17:00:00',2,2,2,2),],
                               columns=['Dt', 'A', 'B', 'C', 'D']).set_index('Dt')
res_combine1st = df2.combine_first(df1)
print res_combine1st

output

pandas-version:  0.15.2
                     A  B  C   D
Dt                              
2015-07-09 12:00:00  1  1  1 NaN
2015-07-09 13:00:00  1  1  1 NaN
2015-07-09 14:00:00  2  2  2   2
2015-07-09 15:00:00  2  2  2   2
2015-07-09 16:00:00  2  2  2   2
2015-07-09 17:00:00  2  2  2   2

answered Nov 15 '22 20:11

Joshua Baboo

You can use the combine function.

import pandas as pd

# your data
# ===========================================================
df1 = pd.DataFrame(np.ones(12).reshape(4,3), columns='A B C'.split(), index=pd.date_range('2015-07-09 12:00:00', periods=4, freq='H'))

df2 = pd.DataFrame(np.ones(16).reshape(4,4)*2, columns='A B C D'.split(), index=pd.date_range('2015-07-09 14:00:00', periods=4, freq='H'))

# processing
# =====================================================
# reindex to populate NaN
result = df2.reindex(np.union1d(df1.index, df2.index))

Out[248]: 
                      A   B   C   D
2015-07-09 12:00:00 NaN NaN NaN NaN
2015-07-09 13:00:00 NaN NaN NaN NaN
2015-07-09 14:00:00   2   2   2   2
2015-07-09 15:00:00   2   2   2   2
2015-07-09 16:00:00   2   2   2   2
2015-07-09 17:00:00   2   2   2   2

combiner = lambda x, y: np.where(x.isnull(), y, x)

# use df1 to update result
result.combine(df1, combiner)

Out[249]: 
                     A  B  C   D
2015-07-09 12:00:00  1  1  1 NaN
2015-07-09 13:00:00  1  1  1 NaN
2015-07-09 14:00:00  2  2  2   2
2015-07-09 15:00:00  2  2  2   2
2015-07-09 16:00:00  2  2  2   2
2015-07-09 17:00:00  2  2  2   2

# maybe fillna(method='ffill') if you like

answered Nov 15 '22 20:11

Jianxun Li

In addition to previous answer, after reindexing you can use

result.fillna(df1, inplace=True)

so based on Jianxun Li's code (extended with one more column) you can try this

# your data
# ===========================================================
df1 = pd.DataFrame(np.ones(12).reshape(4,3), columns='A B C'.split(), index=pd.date_range('2015-07-09 12:00:00', periods=4, freq='H'))
df2 = pd.DataFrame(np.ones(20).reshape(4,5)*2, columns='A B C D E'.split(), index=pd.date_range('2015-07-09 14:00:00', periods=4, freq='H'))

# processing
# =====================================================
# reindex to populate NaN
result = df2.reindex(np.union1d(df1.index, df2.index))
# fill NaN from df1
result.fillna(df1, inplace=True)

Out[3]:             
                     A  B  C   D   E
2015-07-09 12:00:00  1  1  1 NaN NaN
2015-07-09 13:00:00  1  1  1 NaN NaN
2015-07-09 14:00:00  2  2  2   2   2
2015-07-09 15:00:00  2  2  2   2   2
2015-07-09 16:00:00  2  2  2   2   2
2015-07-09 17:00:00  2  2  2   2   2

answered Nov 15 '22 19:11

herbico

Related questions
                            
                                How to redirect the raw_input to stderr and not stdout?
                            
                                Get (column, row) index from NumPy array that meets a boolean condition
                            
                                How to access the next element in for loop in current index/iteration?
                            
                                Django login form
                            
                                Can't run MySql Utilities
                            
                                Forming numpy array from array buffer from shared memory (multiprocessing) fails
                            
                                Breaking ties in Python sort
                            
                                Python-peewee get last saved row
                            
                                Python Convert time to UTC format
                            
                                Python: How do I format numbers for a fixed width?
                            
                                Convert a list of bytes to a byte string
                            
                                Pydub concatenate mp3 in a directory
                            
                                Two's complement of Hex number in Python
                            
                                pip install error: cannot import name 'unpack_url'
                            
                                Counting differences between two strings
                            
                                Python | Why is accessing instance attribute slower than local?
                            
                                python-docx style_id error while creating a word document
                            
                                Reduce by key in python
                            
                                How to shift a string to right in python?
                            
                                How can I format a float to variable precision?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With