How to look up identical column names in two dataframes and combine the matched columns

Tags:

python

loops

pandas

I want to loop through the column names of two data frames, find the columns that share a name, and combine each matching pair into a new data frame.

I tried to write an if-else condition in a for loop, but it doesn't work.

To be specific, I have two data frames like this:

import pandas as pd

df1 = pd.DataFrame({'A': {'2019Q1': 1, '2019Q2': 2, '2019Q3': 3},
                    'B': {'2019Q1': 1, '2019Q2': 3, '2019Q3': 5},
                    'C': {'2019Q1': 2, '2019Q2': 4, '2019Q3': 6}})

df2 = pd.DataFrame({'A': {'2019Q1': 4, '2019Q2': 5, '2019Q3': 6},
                    'B': {'2019Q1': 1.5, '2019Q2': 3.3, '2019Q3': 5.6},
                    'C': {'2019Q1': 2.3, '2019Q2': 4.8, '2019Q3': 6.7}})

I want outputs like the ones below.

For A, output =

pd.DataFrame({'df1': {'2019Q1': 1, '2019Q2': 2, '2019Q3': 3},
              'df2': {'2019Q1': 4, '2019Q2': 5, '2019Q3': 6}})

For B, output =

pd.DataFrame({'df1': {'2019Q1': 1, '2019Q2': 3, '2019Q3': 5},
              'df2': {'2019Q1': 1.5, '2019Q2': 3.3, '2019Q3': 5.6}})

For C, output =

pd.DataFrame({'df1': {'2019Q1': 2, '2019Q2': 4, '2019Q3': 6},
              'df2': {'2019Q1': 2.3, '2019Q2': 4.8, '2019Q3': 6.7}})

Thank you very much for your help!


asked Aug 20 '19 21:08

Kaaaaaake


2 Answers

Here is one way, similar to @ALollz's answer, that keeps all the sub-frames in a single DataFrame with MultiIndex columns:

s = pd.concat([df1, df2], keys=['df1', 'df2']).unstack(0)
s.loc[:,'A']
Out[390]: 
        df1  df2
2019Q1    1    4
2019Q2    2    5
2019Q3    3    6
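
For a self-contained run, the snippet above can be sketched as follows. It assumes only that pandas is imported as pd and reuses the frames from the question; the loop at the end is an illustrative addition, not part of the original answer. The unstack(0) call moves the 'df1'/'df2' key level from the rows into the columns, so each original column name selects a two-column sub-frame.

```python
import pandas as pd

df1 = pd.DataFrame({'A': {'2019Q1': 1, '2019Q2': 2, '2019Q3': 3},
                    'B': {'2019Q1': 1, '2019Q2': 3, '2019Q3': 5},
                    'C': {'2019Q1': 2, '2019Q2': 4, '2019Q3': 6}})
df2 = pd.DataFrame({'A': {'2019Q1': 4, '2019Q2': 5, '2019Q3': 6},
                    'B': {'2019Q1': 1.5, '2019Q2': 3.3, '2019Q3': 5.6},
                    'C': {'2019Q1': 2.3, '2019Q2': 4.8, '2019Q3': 6.7}})

# Stack the frames row-wise under the keys 'df1'/'df2', then move that
# key level into the columns. Columns become (original name, source frame).
s = pd.concat([df1, df2], keys=['df1', 'df2']).unstack(0)

# Selecting on the outer column level returns one sub-frame per name.
a = s['A']

# Illustrative: walk over every original column name.
for name in s.columns.get_level_values(0).unique():
    print(name)
    print(s[name])
```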


answered Sep 16 '22 14:09

BENY

Use concat with keys, then groupby on the column-name level. Store the results in a dict keyed by column name.

d = {idx: gp.droplevel(1, axis=1) for idx, gp in
     pd.concat([df1, df2], keys=['df1', 'df2'], axis=1).groupby(level=1, axis=1)}

d['A']
#        df1  df2
#2019Q1    1    4
#2019Q2    2    5
#2019Q3    3    6

d['B']
#        df1  df2
#2019Q1    1  1.5
#2019Q2    3  3.3
#2019Q3    5  5.6
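
As far as I know, newer pandas releases (2.1 and later) deprecate axis=1 in groupby, so the comprehension above may warn or fail there. A minimal sketch of the same dict built with DataFrame.xs instead, assuming the frames from the question; the names wide and d are just illustrative:

```python
import pandas as pd

df1 = pd.DataFrame({'A': {'2019Q1': 1, '2019Q2': 2, '2019Q3': 3},
                    'B': {'2019Q1': 1, '2019Q2': 3, '2019Q3': 5},
                    'C': {'2019Q1': 2, '2019Q2': 4, '2019Q3': 6}})
df2 = pd.DataFrame({'A': {'2019Q1': 4, '2019Q2': 5, '2019Q3': 6},
                    'B': {'2019Q1': 1.5, '2019Q2': 3.3, '2019Q3': 5.6},
                    'C': {'2019Q1': 2.3, '2019Q2': 4.8, '2019Q3': 6.7}})

# Side-by-side concat: columns are (source frame, original name).
wide = pd.concat([df1, df2], keys=['df1', 'df2'], axis=1)

# xs takes a cross-section on column level 1 (the original name) and
# drops that level, leaving 'df1' and 'df2' as the remaining columns.
d = {col: wide.xs(col, axis=1, level=1)
     for col in wide.columns.get_level_values(1).unique()}

print(d['B'])
```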

The above will create frames for all columns, regardless of whether they are found in both DataFrames. If that's not useful, restrict both frames to their shared columns before the concat:

cols = df1.columns.intersection(df2.columns)
pd.concat([df1[cols], df2[cols]], axis=1, keys=['df1', 'df2'])
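
Since the question asked for a loop over column names, here is a minimal sketch of that form. The frames left and right are hypothetical (they are not from the question) and were chosen so that their columns only partly overlap; Index.intersection keeps just the names present in both.

```python
import pandas as pd

# Hypothetical frames: 'C' exists only in left, 'D' only in right.
quarters = ['2019Q1', '2019Q2']
left = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'C': [5, 6]}, index=quarters)
right = pd.DataFrame({'A': [7, 8], 'B': [9.5, 10.5], 'D': [0, 0]}, index=quarters)

# Column names found in both frames.
common = left.columns.intersection(right.columns)

# One two-column frame per shared name; keys become the column labels.
combined = {}
for col in common:
    combined[col] = pd.concat([left[col], right[col]], axis=1, keys=['df1', 'df2'])

print(combined['B'])
```

Columns that appear in only one frame ('C' and 'D' here) are skipped, so no KeyError is raised when indexing.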

answered Sep 19 '22 14:09

ALollz
