I have two DataFrames
(with DatetimeIndex
) and want to update the first frame (the older one) with data from the second frame (the newer one).
The new frame may contain more recent data for rows already contained in the the old frame. In this case, data in the old frame should be overwritten with data from the new frame. Also the newer frame may have more columns / rows, than the first one. In this case the old frame should be enlarged by the data in the new frame.
Pandas docs state, that
"The .loc/.ix/[]
operations can perform enlargement when setting a non-existant key for that axis"
and
"a DataFrame can be enlarged on either axis via .loc
"
However this doesn't seem to work and throws a KeyError
. Example:
In [195]: df1
Out[195]:
A B C
2015-07-09 12:00:00 1 1 1
2015-07-09 13:00:00 1 1 1
2015-07-09 14:00:00 1 1 1
2015-07-09 15:00:00 1 1 1
In [196]: df2
Out[196]:
A B C D
2015-07-09 14:00:00 2 2 2 2
2015-07-09 15:00:00 2 2 2 2
2015-07-09 16:00:00 2 2 2 2
2015-07-09 17:00:00 2 2 2 2
In [197]: df1.loc[df2.index] = df2
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-197-74e630e87cf8> in <module>()
----> 1 df1.loc[df2.index] = df2
/.../pandas/core/indexing.pyc in __setitem__(self, key, value)
112
113 def __setitem__(self, key, value):
--> 114 indexer = self._get_setitem_indexer(key)
115 self._setitem_with_indexer(indexer, value)
116
/.../pandas/core/indexing.pyc in _get_setitem_indexer(self, key)
107
108 try:
--> 109 return self._convert_to_indexer(key, is_setter=True)
110 except TypeError:
111 raise IndexingError(key)
/.../pandas/core/indexing.pyc in _convert_to_indexer(self, obj, axis, is_setter)
1110 mask = check == -1
1111 if mask.any():
-> 1112 raise KeyError('%s not in index' % objarr[mask])
1113
1114 return _values_from_object(indexer)
KeyError: "['2015-07-09T18:00:00.000000000+0200' '2015-07-09T19:00:00.000000000+0200'] not in index"
What is the best way (with respect to performance, as my real data is much larger) two achieve the desired updated and enlarged DataFrame. This is the result I would like to see:
A B C D
2015-07-09 12:00:00 1 1 1 NaN
2015-07-09 13:00:00 1 1 1 NaN
2015-07-09 14:00:00 2 2 2 2
2015-07-09 15:00:00 2 2 2 2
2015-07-09 16:00:00 2 2 2 2
2015-07-09 17:00:00 2 2 2 2
Lets use the dataframe.set_value () function to set value of a particular index. Notice, for the non-existent row and column in the dataframe, a new row and column has been inserted. Writing code in comment? Please use ide.geeksforgeeks.org , generate link and share the link here.
Pandas dataframe.set_value () function put a single value at passed column and index. It takes the axis labels as input and a scalar value to be placed at the specified index in the dataframe. Alternative to this function is .at [] or .iat [].
If you want to modify any column’s values or even if you want to add a column with different values, then you have various methods to do so: Just add a list (Method 1) SYNTAX: dataFrameObject [column_to_be_changed] = [list_of_ columnName _to_replace_with] Using keyword at (Method 2) SYNTAX: ...
DataFrame objects have a query () method that allows selection using an expression. You can get the value of the frame where column b has values between the values of columns a and c. For example: Do the same thing but fall back on a named index if there is no column with the name a.
df2.combine_first(df1)
(documentation)
seems to serve your requirement; PFB code snippet & output
import pandas as pd
print 'pandas-version: ', pd.__version__
df1 = pd.DataFrame.from_records([('2015-07-09 12:00:00',1,1,1),
('2015-07-09 13:00:00',1,1,1),
('2015-07-09 14:00:00',1,1,1),
('2015-07-09 15:00:00',1,1,1)],
columns=['Dt', 'A', 'B', 'C']).set_index('Dt')
# print df1
df2 = pd.DataFrame.from_records([('2015-07-09 14:00:00',2,2,2,2),
('2015-07-09 15:00:00',2,2,2,2),
('2015-07-09 16:00:00',2,2,2,2),
('2015-07-09 17:00:00',2,2,2,2),],
columns=['Dt', 'A', 'B', 'C', 'D']).set_index('Dt')
res_combine1st = df2.combine_first(df1)
print res_combine1st
pandas-version: 0.15.2
A B C D
Dt
2015-07-09 12:00:00 1 1 1 NaN
2015-07-09 13:00:00 1 1 1 NaN
2015-07-09 14:00:00 2 2 2 2
2015-07-09 15:00:00 2 2 2 2
2015-07-09 16:00:00 2 2 2 2
2015-07-09 17:00:00 2 2 2 2
You can use the combine
function.
import pandas as pd
# your data
# ===========================================================
df1 = pd.DataFrame(np.ones(12).reshape(4,3), columns='A B C'.split(), index=pd.date_range('2015-07-09 12:00:00', periods=4, freq='H'))
df2 = pd.DataFrame(np.ones(16).reshape(4,4)*2, columns='A B C D'.split(), index=pd.date_range('2015-07-09 14:00:00', periods=4, freq='H'))
# processing
# =====================================================
# reindex to populate NaN
result = df2.reindex(np.union1d(df1.index, df2.index))
Out[248]:
A B C D
2015-07-09 12:00:00 NaN NaN NaN NaN
2015-07-09 13:00:00 NaN NaN NaN NaN
2015-07-09 14:00:00 2 2 2 2
2015-07-09 15:00:00 2 2 2 2
2015-07-09 16:00:00 2 2 2 2
2015-07-09 17:00:00 2 2 2 2
combiner = lambda x, y: np.where(x.isnull(), y, x)
# use df1 to update result
result.combine(df1, combiner)
Out[249]:
A B C D
2015-07-09 12:00:00 1 1 1 NaN
2015-07-09 13:00:00 1 1 1 NaN
2015-07-09 14:00:00 2 2 2 2
2015-07-09 15:00:00 2 2 2 2
2015-07-09 16:00:00 2 2 2 2
2015-07-09 17:00:00 2 2 2 2
# maybe fillna(method='ffill') if you like
In addition to previous answer, after reindexing you can use
result.fillna(df1, inplace=True)
so based on Jianxun Li's code (extended with one more column) you can try this
# your data
# ===========================================================
df1 = pd.DataFrame(np.ones(12).reshape(4,3), columns='A B C'.split(), index=pd.date_range('2015-07-09 12:00:00', periods=4, freq='H'))
df2 = pd.DataFrame(np.ones(20).reshape(4,5)*2, columns='A B C D E'.split(), index=pd.date_range('2015-07-09 14:00:00', periods=4, freq='H'))
# processing
# =====================================================
# reindex to populate NaN
result = df2.reindex(np.union1d(df1.index, df2.index))
# fill NaN from df1
result.fillna(df1, inplace=True)
Out[3]:
A B C D E
2015-07-09 12:00:00 1 1 1 NaN NaN
2015-07-09 13:00:00 1 1 1 NaN NaN
2015-07-09 14:00:00 2 2 2 2 2
2015-07-09 15:00:00 2 2 2 2 2
2015-07-09 16:00:00 2 2 2 2 2
2015-07-09 17:00:00 2 2 2 2 2
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With