I have a list of 4 pandas dataframes containing a day of tick data that I want to merge into a single data frame. I cannot understand the behavior of concat on my timestamps. See details below:
data [<class 'pandas.core.frame.DataFrame'> DatetimeIndex: 35228 entries, 2013-03-28 00:00:07.089000+02:00 to 2013-03-28 18:59:20.357000+02:00 Data columns: Price 4040 non-null values Volume 4040 non-null values BidQty 35228 non-null values BidPrice 35228 non-null values AskPrice 35228 non-null values AskQty 35228 non-null values dtypes: float64(6), <class 'pandas.core.frame.DataFrame'> DatetimeIndex: 33088 entries, 2013-04-01 00:03:17.047000+02:00 to 2013-04-01 18:59:58.175000+02:00 Data columns: Price 3969 non-null values Volume 3969 non-null values BidQty 33088 non-null values BidPrice 33088 non-null values AskPrice 33088 non-null values AskQty 33088 non-null values dtypes: float64(6), <class 'pandas.core.frame.DataFrame'> DatetimeIndex: 50740 entries, 2013-04-02 00:03:27.470000+02:00 to 2013-04-02 18:59:58.172000+02:00 Data columns: Price 7326 non-null values Volume 7326 non-null values BidQty 50740 non-null values BidPrice 50740 non-null values AskPrice 50740 non-null values AskQty 50740 non-null values dtypes: float64(6), <class 'pandas.core.frame.DataFrame'> DatetimeIndex: 60799 entries, 2013-04-03 00:03:06.994000+02:00 to 2013-04-03 18:59:58.180000+02:00 Data columns: Price 8258 non-null values Volume 8258 non-null values BidQty 60799 non-null values BidPrice 60799 non-null values AskPrice 60799 non-null values AskQty 60799 non-null values dtypes: float64(6)]
Using append
I get:
pd.DataFrame().append(data) <class 'pandas.core.frame.DataFrame'> DatetimeIndex: 179855 entries, 2013-03-28 00:00:07.089000+02:00 to 2013-04-03 18:59:58.180000+02:00 Data columns: AskPrice 179855 non-null values AskQty 179855 non-null values BidPrice 179855 non-null values BidQty 179855 non-null values Price 23593 non-null values Volume 23593 non-null values dtypes: float64(6)
Using concat
I get:
pd.concat(data) <class 'pandas.core.frame.DataFrame'> DatetimeIndex: 179855 entries, 2013-03-27 22:00:07.089000+02:00 to 2013-04-03 16:59:58.180000+02:00 Data columns: Price 23593 non-null values Volume 23593 non-null values BidQty 179855 non-null values BidPrice 179855 non-null values AskPrice 179855 non-null values AskQty 179855 non-null values dtypes: float64(6)
Notice how the index changes when using concat
. Why is that happening and how would I go about using concat
to reproduce the results obtained using append
? (Since concat
seems so much faster; 24.6 ms per loop vs 3.02 s per loop)
In this benchmark, concatenating multiple dataframes by using the Pandas. concat function is 50 times faster than using the DataFrame. append version. With multiple append , a new DataFrame is created at each iteration, and the underlying data is copied each time.
Pandas DataFrames are fantastic. However, concatenating them using standard approaches, such as pandas. concat() , can be very slow with large dataframes. This is a work around for that problem.
Concat function concatenates dataframes along rows or columns. We can think of it as stacking up multiple dataframes. Merge combines dataframes based on values in shared columns. Merge function offers more flexibility compared to concat function because it allows combinations based on a condition.
Concat gives the flexibility to join based on the axis( all rows or all columns)
Append is the specific case(axis=0, join='outer') of concat
Join is based on the indexes (set by set_index) on how variable =['left','right','inner','couter']
Merge is based on any particular column each of the two dataframes, this columns are variables on like 'left_on', 'right_on', 'on'
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With