I want to be able to append df1 df2, df3 into one df_All , but since each of the dataframe has different column. How could I do this in for loop ( I have others stuff that i have to do in the for loop ) ?
import pandas as pd
import numpy as np
df1 = pd.DataFrame.from_items([('A', [1, 2, 3]), ('B', [4, 5, 6])])
df2 = pd.DataFrame.from_items([('B', [5, 6, 7]), ('A', [8, 9, 10])])
df3 = pd.DataFrame.from_items([('C', [5, 6, 7]), ('D', [8, 9, 10]), ('A',[1,2,3]), ('B',[4,5,7])])
list = ['df1','df2','df3']
df_All = pd.DataFrame()
for i in list:
# doing something else as well ---
df_All = df_All.append(i)
I want my df_All to only have ( A & B ) only, is there a way to this in loop above ? something like append only this two columns ?
Concatenating DataFrames. We can use the concat function in pandas to append either columns or rows from one DataFrame to another.
It is possible to join the different columns is using concat() method. DataFrame: It is dataframe name. axis: 0 refers to the row axis and1 refers the column axis. join: Type of join.
Add multiple columns to a data frame using Dataframe. assign() method. Using DataFrame. assign() method, we can set column names as parameters and pass values as list to replace/create the columns.
Here are two commands which can be used: Use Dataframe join command to append the columns. Use Pandas concat command to append the columns. Both methods can be used to join multiple columns from different data frames and create one data frame.
If I understand what you want then you need to select just columns 'A' and 'B' from df3
and then use pd.concat
:
In [35]:
df1 = pd.DataFrame.from_items([('A', [1, 2, 3]), ('B', [4, 5, 6])])
df2 = pd.DataFrame.from_items([('B', [5, 6, 7]), ('A', [8, 9, 10])])
df3 = pd.DataFrame.from_items([('C', [5, 6, 7]), ('D', [8, 9, 10]), ('A',[1,2,3]), ('B',[4,5,7])])
df_list = [df1,df2,df3[['A','B']]]
pd.concat(df_list, ignore_index=True)
Out[35]:
A B
0 1 4
1 2 5
2 3 6
3 8 5
4 9 6
5 10 7
6 1 4
7 2 5
8 3 7
Note that in your original code this is poor practice:
list = ['df1','df2','df3']
This shadows the built in type list
plus even if it was actually a valid var name like df_list
you've created a list of strings and not a list of dfs.
If you want to determine the common columns then you can determine this using the np.intersection
method on the columns:
In [39]:
common_cols = df1.columns.intersection(df2.columns).intersection(df3.columns)
common_cols
Out[39]:
Index(['A', 'B'], dtype='object')
You can also use set comprehension to join all common columns from an arbitrary list of DataFrames:
df_list = [df1, df2, df3]
common_cols = list(set.intersection(*(set(c) for c in df_list)))
df_new = pd.concat([df[common_cols] for df in df_list], ignore_index=True)
>>> df_new
A B
0 1 4
1 2 5
2 3 6
3 8 5
4 9 6
5 10 7
6 1 4
7 2 5
8 3 7
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With