Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas DataFrame concat vs append

Tags:

python

pandas

I have a list of 4 pandas dataframes containing a day of tick data that I want to merge into a single data frame. I cannot understand the behavior of concat on my timestamps. See details below:

data  [<class 'pandas.core.frame.DataFrame'> DatetimeIndex: 35228 entries, 2013-03-28 00:00:07.089000+02:00 to 2013-03-28 18:59:20.357000+02:00 Data columns: Price       4040  non-null values Volume      4040  non-null values BidQty      35228  non-null values BidPrice    35228  non-null values AskPrice    35228  non-null values AskQty      35228  non-null values dtypes: float64(6), <class 'pandas.core.frame.DataFrame'>  DatetimeIndex: 33088 entries, 2013-04-01 00:03:17.047000+02:00 to 2013-04-01 18:59:58.175000+02:00 Data columns: Price       3969  non-null values Volume      3969  non-null values BidQty      33088  non-null values BidPrice    33088  non-null values AskPrice    33088  non-null values AskQty      33088  non-null values dtypes: float64(6), <class 'pandas.core.frame.DataFrame'>  DatetimeIndex: 50740 entries, 2013-04-02 00:03:27.470000+02:00 to 2013-04-02 18:59:58.172000+02:00 Data columns: Price       7326  non-null values Volume      7326  non-null values BidQty      50740  non-null values BidPrice    50740  non-null values AskPrice    50740  non-null values AskQty      50740  non-null values dtypes: float64(6), <class 'pandas.core.frame.DataFrame'>  DatetimeIndex: 60799 entries, 2013-04-03 00:03:06.994000+02:00 to 2013-04-03 18:59:58.180000+02:00 Data columns: Price       8258  non-null values Volume      8258  non-null values BidQty      60799  non-null values BidPrice    60799  non-null values AskPrice    60799  non-null values AskQty      60799  non-null values dtypes: float64(6)] 

Using append I get:

pd.DataFrame().append(data)  <class 'pandas.core.frame.DataFrame'> DatetimeIndex: 179855 entries, 2013-03-28 00:00:07.089000+02:00 to 2013-04-03 18:59:58.180000+02:00 Data columns: AskPrice    179855  non-null values AskQty      179855  non-null values BidPrice    179855  non-null values BidQty      179855  non-null values Price       23593  non-null values Volume      23593  non-null values dtypes: float64(6) 

Using concat I get:

pd.concat(data)  <class 'pandas.core.frame.DataFrame'> DatetimeIndex: 179855 entries, 2013-03-27 22:00:07.089000+02:00 to 2013-04-03 16:59:58.180000+02:00 Data columns: Price       23593  non-null values Volume      23593  non-null values BidQty      179855  non-null values BidPrice    179855  non-null values AskPrice    179855  non-null values AskQty      179855  non-null values dtypes: float64(6) 

Notice how the index changes when using concat. Why is that happening and how would I go about using concat to reproduce the results obtained using append? (Since concat seems so much faster; 24.6 ms per loop vs 3.02 s per loop)

like image 504
JPBelanger Avatar asked Apr 04 '13 18:04

JPBelanger


People also ask

Which is faster pandas concat or append?

In this benchmark, concatenating multiple dataframes by using the Pandas. concat function is 50 times faster than using the DataFrame. append version. With multiple append , a new DataFrame is created at each iteration, and the underlying data is copied each time.

Is pandas concat slow?

Pandas DataFrames are fantastic. However, concatenating them using standard approaches, such as pandas. concat() , can be very slow with large dataframes. This is a work around for that problem.

What is difference between pandas concat and merge?

Concat function concatenates dataframes along rows or columns. We can think of it as stacking up multiple dataframes. Merge combines dataframes based on values in shared columns. Merge function offers more flexibility compared to concat function because it allows combinations based on a condition.


1 Answers

Pandas concat vs append vs join vs merge

  • Concat gives the flexibility to join based on the axis( all rows or all columns)

  • Append is the specific case(axis=0, join='outer') of concat

  • Join is based on the indexes (set by set_index) on how variable =['left','right','inner','couter']

  • Merge is based on any particular column each of the two dataframes, this columns are variables on like 'left_on', 'right_on', 'on'

like image 92
Mohsin Mahmood Avatar answered Sep 21 '22 23:09

Mohsin Mahmood