How to append selected columns to pandas dataframe from df with different columns

I want to append df1, df2 and df3 into one DataFrame, df_All, but each of them has different columns. How can I do this in a for loop? (I have other things to do inside the loop as well.)

import pandas as pd
import numpy as np

# DataFrame.from_items was removed in pandas 1.0; a dict keeps the same column order
df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'B': [5, 6, 7], 'A': [8, 9, 10]})
df3 = pd.DataFrame({'C': [5, 6, 7], 'D': [8, 9, 10], 'A': [1, 2, 3], 'B': [4, 5, 7]})
list = ['df1','df2','df3']
df_All = pd.DataFrame()
for i in list:
    # doing something else as well
    df_All = df_All.append(i)


I want df_All to contain only columns A and B. Is there a way to do this in the loop above, something like appending only these two columns?

JPC asked Mar 29 '15 22:03



2 Answers

If I understand correctly, you need to select just columns 'A' and 'B' from df3 and then use pd.concat:

In [35]:

# DataFrame.from_items was removed in pandas 1.0; a dict keeps the same column order
df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'B': [5, 6, 7], 'A': [8, 9, 10]})
df3 = pd.DataFrame({'C': [5, 6, 7], 'D': [8, 9, 10], 'A': [1, 2, 3], 'B': [4, 5, 7]})
df_list = [df1,df2,df3[['A','B']]]
pd.concat(df_list, ignore_index=True)
Out[35]:
    A  B
0   1  4
1   2  5
2   3  6
3   8  5
4   9  6
5  10  7
6   1  4
7   2  5
8   3  7

Note that in your original code this is poor practice:

list = ['df1','df2','df3']

This shadows the built-in type list. Also, even with a valid variable name such as df_list, you've created a list of strings, not a list of DataFrames.
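A minimal sketch of the loop the question describes, assuming a recent pandas where DataFrame.append is no longer available (it was removed in pandas 2.0): iterate over the DataFrame objects themselves, keep only the wanted columns, collect the pieces in a plain Python list, and concatenate once after the loop. The names wanted_cols and pieces are illustrative and do not come from the original code.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'B': [5, 6, 7], 'A': [8, 9, 10]})
df3 = pd.DataFrame({'C': [5, 6, 7], 'D': [8, 9, 10], 'A': [1, 2, 3], 'B': [4, 5, 7]})

df_list = [df1, df2, df3]   # the DataFrames themselves, not their names as strings
wanted_cols = ['A', 'B']    # illustrative name: the columns to keep

pieces = []
for df in df_list:
    # ... any other per-frame work goes here ...
    pieces.append(df[wanted_cols])   # keep only columns A and B

# A single concat after the loop avoids re-copying the result on every iteration
df_All = pd.concat(pieces, ignore_index=True)
print(df_All)
```

Collecting into a list and concatenating once is usually preferable to growing a DataFrame inside the loop, since each intermediate concat copies all rows gathered so far.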

If you want to determine the common columns, you can use the intersection method of the columns Index (this is a pandas Index method, not a NumPy function):

In [39]:

common_cols = df1.columns.intersection(df2.columns).intersection(df3.columns)
common_cols
Out[39]:
Index(['A', 'B'], dtype='object')
EdChum answered Sep 23 '22 05:09


You can also use a set intersection to find the columns common to an arbitrary list of DataFrames, then select them from each one:

df_list = [df1, df2, df3]
# sorted() gives a deterministic column order; a bare set has none
common_cols = sorted(set.intersection(*(set(df.columns) for df in df_list)))
df_new = pd.concat([df[common_cols] for df in df_list], ignore_index=True)
>>> df_new 
    A  B
0   1  4
1   2  5
2   3  6
3   8  5
4   9  6
5  10  7
6   1  4
7   2  5
8   3  7
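As a possible shortcut, pd.concat can drop the non-shared columns itself: with join='inner' it keeps only the columns present in every input frame. A minimal sketch, assuming the same three example frames built with the dict constructor:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'B': [5, 6, 7], 'A': [8, 9, 10]})
df3 = pd.DataFrame({'C': [5, 6, 7], 'D': [8, 9, 10], 'A': [1, 2, 3], 'B': [4, 5, 7]})

# join='inner' keeps only columns shared by all frames, so C and D are dropped
df_new = pd.concat([df1, df2, df3], join='inner', ignore_index=True)
print(df_new)
```

This only fits when the columns you want are exactly the shared ones; to keep a specific subset, select the columns explicitly before concatenating.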
Alexander answered Sep 20 '22 05:09