I want to loop through the column names of two data frames, find the columns with identical names, and combine them to create a new data frame.
I tried to write an if-else condition in a for loop, but it doesn't work.
To be specific, I have two data frames like this:
import pandas as pd

df1 = pd.DataFrame({'A': {'2019Q1': 1, '2019Q2': 2, '2019Q3': 3},
                    'B': {'2019Q1': 1, '2019Q2': 3, '2019Q3': 5},
                    'C': {'2019Q1': 2, '2019Q2': 4, '2019Q3': 6}})
df2 = pd.DataFrame({'A': {'2019Q1': 4, '2019Q2': 5, '2019Q3': 6},
                    'B': {'2019Q1': 1.5, '2019Q2': 3.3, '2019Q3': 5.6},
                    'C': {'2019Q1': 2.3, '2019Q2': 4.8, '2019Q3': 6.7}})
I want outputs like the ones below.
For A, output =
pd.DataFrame({'df1': {'2019Q1': 1, '2019Q2': 2, '2019Q3': 3},
              'df2': {'2019Q1': 4, '2019Q2': 5, '2019Q3': 6}})
For B, output =
pd.DataFrame({'df1': {'2019Q1': 1, '2019Q2': 3, '2019Q3': 5},
              'df2': {'2019Q1': 1.5, '2019Q2': 3.3, '2019Q3': 5.6}})
For C, output =
pd.DataFrame({'df1': {'2019Q1': 2, '2019Q2': 4, '2019Q3': 6},
              'df2': {'2019Q1': 2.3, '2019Q2': 4.8, '2019Q3': 6.7}})
Thank you very much for your help!
To find the columns that two DataFrames have in common, one approach is NumPy's intersect1d() function, which returns the column names present in both DataFrames.
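A minimal sketch of that lookup, assuming two small illustrative frames (`left` and `right` are made-up names, not from the question). It also shows the pandas-native `Index.intersection`, which should give the same labels without going through NumPy.

```python
import numpy as np
import pandas as pd

# Illustrative frames (assumed for this sketch): they share A and B only.
left = pd.DataFrame({"A": [1, 2, 3], "B": [1, 3, 5], "C": [2, 4, 6]})
right = pd.DataFrame({"A": [4, 5, 6], "B": [1.5, 3.3, 5.6], "D": [0, 0, 0]})

# NumPy route: sorted, unique labels that occur in both column indexes.
common_np = np.intersect1d(left.columns, right.columns)

# pandas route: set intersection directly on the two Index objects.
common_pd = left.columns.intersection(right.columns)

print(list(common_np))
print(list(common_pd))
```

Either result can then be used to select the shared columns, for example `left[common_pd]`.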
To combine two DataFrames that share column names, use pandas.concat(). It concatenates pandas objects along a given axis, with optional set logic (union or intersection) applied to the indexes on the other axis.
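As a rough illustration using the question's data, the sketch below places column A of each frame side by side. Passing `keys` together with `axis=1` labels the resulting columns; `join="inner"` is shown only to illustrate the set logic on the row index, and the truncated `df2["A"].iloc[:2]` is an assumption made for that purpose.

```python
import pandas as pd

df1 = pd.DataFrame({"A": {"2019Q1": 1, "2019Q2": 2, "2019Q3": 3},
                    "B": {"2019Q1": 1, "2019Q2": 3, "2019Q3": 5}})
df2 = pd.DataFrame({"A": {"2019Q1": 4, "2019Q2": 5, "2019Q3": 6},
                    "B": {"2019Q1": 1.5, "2019Q2": 3.3, "2019Q3": 5.6}})

# Side-by-side concatenation of one shared column; keys name the new columns.
combined = pd.concat([df1["A"], df2["A"]], axis=1, keys=["df1", "df2"])

# join="inner" keeps only the row labels present in every input.
inner = pd.concat([df1["A"], df2["A"].iloc[:2]], axis=1,
                  keys=["df1", "df2"], join="inner")

print(combined)
print(inner)
```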
If the DataFrames have exactly the same index, they can be compared with np.where, which checks whether each value in a column of the first DataFrame exactly matches the value in the corresponding column of the second. The same approach works for numeric columns.
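A small sketch of such a comparison, assuming two illustrative frames with identical indexes (the frames and the "same"/"different" labels are made up for this example). If the indexes differed, the `==` comparison between the two Series would raise an error rather than align them.

```python
import numpy as np
import pandas as pd

idx = ["2019Q1", "2019Q2", "2019Q3"]
left = pd.DataFrame({"B": [1, 3, 5]}, index=idx)
right = pd.DataFrame({"B": [1.0, 3.3, 5.0]}, index=idx)

result = pd.DataFrame({"df1": left["B"], "df2": right["B"]})

# Element-wise check: label each row by whether the two values are equal.
result["match"] = np.where(result["df1"] == result["df2"], "same", "different")

print(result)
```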
Here is one way, similar to @ALollz's, but it keeps the sub-DataFrames together in a single DataFrame with MultiIndex columns:
s = pd.concat([df1, df2], keys=['df1', 'df2']).unstack(0)
s.loc[:,'A']
Out[390]:
        df1  df2
2019Q1    1    4
2019Q2    2    5
2019Q3    3    6
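To get every per-column frame out of `s`, one option is to loop over the top level of its column MultiIndex, roughly as sketched below with the question's data. One thing to be aware of: because this `concat` stacks the frames row-wise first, a column that is integer in one frame and float in the other (B and C here) should come out as float in both halves.

```python
import pandas as pd

df1 = pd.DataFrame({"A": {"2019Q1": 1, "2019Q2": 2, "2019Q3": 3},
                    "B": {"2019Q1": 1, "2019Q2": 3, "2019Q3": 5},
                    "C": {"2019Q1": 2, "2019Q2": 4, "2019Q3": 6}})
df2 = pd.DataFrame({"A": {"2019Q1": 4, "2019Q2": 5, "2019Q3": 6},
                    "B": {"2019Q1": 1.5, "2019Q2": 3.3, "2019Q3": 5.6},
                    "C": {"2019Q1": 2.3, "2019Q2": 4.8, "2019Q3": 6.7}})

s = pd.concat([df1, df2], keys=["df1", "df2"]).unstack(0)

# Top level of the column MultiIndex holds the original column names.
for name in s.columns.get_level_values(0).unique():
    print(name)
    print(s[name])  # a frame with columns df1 and df2
```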
Use concat with keys + groupby. Store the results in a dict, with the column names as keys.
# Note: groupby(..., axis=1) is deprecated as of pandas 2.1.
d = {idx: gp.droplevel(1, axis=1) for idx, gp in
     pd.concat([df1, df2], keys=['df1', 'df2'], axis=1).groupby(level=1, axis=1)}
d['A']
#        df1  df2
#2019Q1    1    4
#2019Q2    2    5
#2019Q3    3    6
d['B']
#        df1  df2
#2019Q1    1  1.5
#2019Q2    3  3.3
#2019Q3    5  5.6
The above will create frames for every column, whether or not it appears in both DataFrames. If that's not what you want, restrict both frames to the shared columns first by changing the concat to:
cols = df1.columns.intersection(df2.columns)
pd.concat([df1[cols], df2[cols]], axis=1, keys=['df1', 'df2'])
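Since the question asked for a loop, here is a sketch of the same idea written as a plain for loop over the shared column names. It does not rely on grouping along axis=1, so it should behave the same across pandas versions. The extra column D in df2 is an assumption added only to show that unshared columns are skipped; `frames` is a made-up name.

```python
import pandas as pd

df1 = pd.DataFrame({"A": {"2019Q1": 1, "2019Q2": 2, "2019Q3": 3},
                    "B": {"2019Q1": 1, "2019Q2": 3, "2019Q3": 5},
                    "C": {"2019Q1": 2, "2019Q2": 4, "2019Q3": 6}})
df2 = pd.DataFrame({"A": {"2019Q1": 4, "2019Q2": 5, "2019Q3": 6},
                    "B": {"2019Q1": 1.5, "2019Q2": 3.3, "2019Q3": 5.6},
                    "C": {"2019Q1": 2.3, "2019Q2": 4.8, "2019Q3": 6.7},
                    "D": {"2019Q1": 0, "2019Q2": 0, "2019Q3": 0}})  # illustrative extra column

# Column names present in both frames.
shared = df1.columns.intersection(df2.columns)

# One two-column frame per shared name, keyed by that name.
frames = {}
for col in shared:
    frames[col] = pd.concat([df1[col], df2[col]], axis=1, keys=["df1", "df2"])

print(frames["B"])
```

Because each column is concatenated side by side, the integer column from df1 keeps its integer dtype next to the float column from df2.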