Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I concatenate Pandas DataFrames by column and index?

Tags:

python

pandas

I've got four Pandas DataFrames with numerical columns and indices:

A = pd.DataFrame(data={"435000": [9.792, 9.795], "435002": [9.825, 9.812]}, index=[119000, 119002])
B = pd.DataFrame(data={"435004": [9.805, 9.783], "435006": [9.785, 9.78]}, index=[119000, 119002])
C = pd.DataFrame(data={"435000": [9.778, 9.743], "435002": [9.75, 9.743]}, index=[119004, 119006])
D = pd.DataFrame(data={"435004": [9.743, 9.743], "435006": [9.762, 9.738]}, index=[119004, 119006])

enter image description here

I want to concatenate them into one DataFrame like this, matching on both column names and indices:

enter image description here

If I try to pd.concat the four dfs, they are stacked (either above and below, or to the side, depending on axis) and I end up with NaN values in the df:

result = pd.concat([A, B, C, D], axis=0)

enter image description here

How can I use pd.concat (or merge, join etc.) to get the right result?

like image 897
user2950747 Avatar asked May 19 '17 07:05

user2950747


People also ask

Does PD concat join on index?

pd. concat joins on the index and can join two or more DataFrames at once. It does a full outer join by default.

How do you concatenate two DataFrames with different columns in pandas?

It is possible to join the different columns is using concat() method. DataFrame: It is dataframe name. axis: 0 refers to the row axis and1 refers the column axis. join: Type of join.


2 Answers

You need concat in pairs:

result = pd.concat([pd.concat([A, C], axis=0), pd.concat([B, D], axis=0)], axis=1)
print (result)
        435000  435002  435004  435006
119000   9.792   9.825   9.805   9.785
119002   9.795   9.812   9.783   9.780
119004   9.778   9.750   9.743   9.762
119006   9.743   9.743   9.743   9.738

Better is stack + concat + unstack:

result = pd.concat([A.stack(), B.stack(), C.stack(), D.stack()], axis=0).unstack()
print (result)
        435000  435002  435004  435006
119000   9.792   9.825   9.805   9.785
119002   9.795   9.812   9.783   9.780
119004   9.778   9.750   9.743   9.762
119006   9.743   9.743   9.743   9.738

More dynamic:

dfs = [A,B,C,D]
result = pd.concat([df.stack() for df in dfs], axis=0).unstack()
print (result)
        435000  435002  435004  435006
119000   9.792   9.825   9.805   9.785
119002   9.795   9.812   9.783   9.780
119004   9.778   9.750   9.743   9.762
119006   9.743   9.743   9.743   9.738
like image 98
jezrael Avatar answered Oct 21 '22 01:10

jezrael


You can use join too:

pd.concat((A.join(B), C.join(D)))
Out: 
        435000  435002  435004  435006
119000   9.792   9.825   9.805   9.785
119002   9.795   9.812   9.783   9.780
119004   9.778   9.750   9.743   9.762
119006   9.743   9.743   9.743   9.738
like image 44
ayhan Avatar answered Oct 21 '22 02:10

ayhan