Pandas join/merge/concat two dataframes

Tags: python, pandas

I am having issues with joins in pandas and I am trying to figure out what is wrong. Say I have a dataframe x:

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1941 entries, 2004-10-19 00:00:00 to 2012-07-23 00:00:00
Data columns:
close    1941  non-null values
high     1941  non-null values
low      1941  non-null values
open     1941  non-null values
dtypes: float64(4)

Shouldn't I be able to join it with y on the index with a simple join command, where y is the same as x except that the column names have a 2 appended?

 <class 'pandas.core.frame.DataFrame'>
 DatetimeIndex: 1941 entries, 2004-10-19 00:00:00 to 2012-07-23 00:00:00
 Data columns:
 close2    1941  non-null values
 high2     1941  non-null values
 low2      1941  non-null values
 open2     1941  non-null values
 dtypes: float64(4)

 y.join(x) or pandas.DataFrame.join(y,x):
 <class 'pandas.core.frame.DataFrame'>
 DatetimeIndex: 34879 entries, 2004-12-16 00:00:00 to 2012-07-12 00:00:00
 Data columns:
 close2    34879  non-null values
 high2     34879  non-null values
 low2      34879  non-null values
 open2     34879  non-null values
 close     34879  non-null values
 high      34879  non-null values
 low       34879  non-null values
 open      34879  non-null values
 dtypes: float64(8)

I expect the final result to have 1941 non-null values for every column. I tried merge as well, but I have the same issue.

I had thought the right answer was pandas.concat([x,y]), but this does not do what I intend either.

In [83]: pandas.concat([x,y]) 
Out[83]: <class 'pandas.core.frame.DataFrame'> 
DatetimeIndex: 3882 entries, 2004-10-19 00:00:00 to 2012-07-23 00:00:00 
Data columns: 
close2 3882 non-null values 
high2 3882 non-null values 
low2 3882 non-null values 
open2 3882 non-null values 
dtypes: float64(4) 

Edit: If you are having issues with join, read Wes's answer below. I had one timestamp that was duplicated.

Michael WS asked Jul 24 '12 18:07


2 Answers

Does your index have duplicates (check x.index.is_unique)? If so, that would explain the behavior you're seeing:

In [16]: left
Out[16]: 
            a
2000-01-01  1
2000-01-01  1
2000-01-01  1
2000-01-02  2
2000-01-02  2
2000-01-02  2

In [17]: right
Out[17]: 
            b
2000-01-01  3
2000-01-01  3
2000-01-01  3
2000-01-02  4
2000-01-02  4
2000-01-02  4

In [18]: left.join(right)
Out[18]: 
            a  b
2000-01-01  1  3
2000-01-01  1  3
2000-01-01  1  3
2000-01-01  1  3
2000-01-01  1  3
2000-01-01  1  3
2000-01-01  1  3
2000-01-01  1  3
2000-01-01  1  3
2000-01-02  2  4
2000-01-02  2  4
2000-01-02  2  4
2000-01-02  2  4
2000-01-02  2  4
2000-01-02  2  4
2000-01-02  2  4
2000-01-02  2  4
2000-01-02  2  4
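
The row multiplication shown above can be reproduced and worked around with a short script. This is only a sketch on made-up data: the frames, the column names a and b, and the choice to keep the first row per timestamp are assumptions for illustration, not something the original poster's data necessarily calls for.

```python
import pandas as pd

# Hypothetical data: 2000-01-01 appears twice on each side.
idx = pd.to_datetime(["2000-01-01", "2000-01-01", "2000-01-02"])
left = pd.DataFrame({"a": [1, 1, 2]}, index=idx)
right = pd.DataFrame({"b": [3, 3, 4]}, index=idx)

# A non-unique index is the warning sign.
print(left.index.is_unique)

# List the timestamps that occur more than once.
print(left.index[left.index.duplicated(keep=False)].unique())

# Each duplicated key pairs every left row with every right row:
# 2 x 2 rows for 2000-01-01 plus 1 x 1 row for 2000-01-02.
joined = left.join(right)
print(len(joined))

# One possible fix (an assumption about what you want): keep the
# first row for each timestamp, then join.
left_u = left[~left.index.duplicated(keep="first")]
right_u = right[~right.index.duplicated(keep="first")]
clean = left_u.join(right_u)
print(clean)
```

Whether dropping, averaging, or otherwise resolving the duplicated rows is right depends on why they are there, so it is worth looking at them before discarding anything.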
Wes McKinney answered Sep 23 '22 19:09


It sounds like maybe you want pandas.concat? merge and join do, well, joins: for each key they pair every matching row on one side with every matching row on the other, so the result is built around a Cartesian product of the matching rows. It sounds like you just want to paste the two frames together into one big table.

Edit: did you try concat with axis=1? It seems to do what you're asking for:

>>> print(x)
          A         B         C
0  0.155614 -0.252148  0.861163
1  0.973517  1.156465 -0.458846
2  2.504356 -0.356371 -0.737842
3  0.012994  1.785123  0.161667
4  0.574578  0.123689  0.017598
>>> print(y)
         A2        B2        C2
0 -0.280993  1.278750 -0.704449
1  0.140282  1.955322 -0.953826
2  0.581997 -0.239829  2.227069
3 -0.876146 -1.955199 -0.155030
4 -0.518593 -2.630978  0.333264
>>> print(pandas.concat([x, y], axis=1))
          A         B         C        A2        B2        C2
0  0.155614 -0.252148  0.861163 -0.280993  1.278750 -0.704449
1  0.973517  1.156465 -0.458846  0.140282  1.955322 -0.953826
2  2.504356 -0.356371 -0.737842  0.581997 -0.239829  2.227069
3  0.012994  1.785123  0.161667 -0.876146 -1.955199 -0.155030
4  0.574578  0.123689  0.017598 -0.518593 -2.630978  0.333264
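
To make the difference between the two concat calls concrete, here is a small sketch. The frame contents, the column names, and the date range are invented for illustration; the only assumption that matters is that the index is unique.

```python
import numpy as np
import pandas as pd

# Hypothetical frames sharing a unique DatetimeIndex.
idx = pd.date_range("2004-10-19", periods=5, freq="D")
x = pd.DataFrame(np.arange(10.0).reshape(5, 2), index=idx, columns=["open", "close"])
y = x.add_suffix("2")  # same values, columns renamed to open2 and close2

# Default axis=0 stacks rows: twice as many rows, with NaN wherever
# a frame does not have a given column.
stacked = pd.concat([x, y])
print(stacked.shape)

# axis=1 aligns on the index and places the columns side by side.
side = pd.concat([x, y], axis=1)
print(side.shape)

# With a unique index, an index join gives the same layout.
joined = x.join(y)
print(joined.shape)
```

With a unique index the axis=1 concat and the join line up row for row, which is why the duplicated timestamp was the real problem in the question.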
BrenBarn answered Sep 22 '22 19:09