I have multiple pandas data frame objects cost1, cost2, cost3, ...
How can I append rows from all of these data frames into one single data frame while retaining elements from only the common column names?
As of now I have
frames=[cost1,cost2,cost3]
new_combined = pd.concat(frames, ignore_index=True)
This obviously contains columns which are not common across all data frames.
To merge two pandas DataFrames on a common column, use the merge() function and set the on parameter to the column name.
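As a rough illustration of merge() with on, here is a minimal sketch. The frames left and right and their column names are made up for the example. Note that merge() matches rows side by side on the key, so it does not append rows the way the question asks.

```python
import pandas as pd

# Hypothetical sample data; the column names are assumptions for illustration.
left = pd.DataFrame({"id": [1, 2, 3], "cost": [10.0, 20.0, 30.0]})
right = pd.DataFrame({"id": [2, 3, 4], "region": ["east", "west", "north"]})

# Join the two frames on the shared "id" column.
# how="inner" is the default, so only ids present in both frames survive.
merged = left.merge(right, on="id")
print(merged)
```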
The concat() function in pandas is used to append either columns or rows from one DataFrame to another. The concat() function does all the heavy lifting of performing concatenation operations along an axis while performing optional set logic (union or intersection) of the indexes (if any) on the other axes.
concat() can also join frames along the column axis. Its main parameters are: objs, the sequence of DataFrames to concatenate; axis, where 0 refers to the row axis and 1 refers to the column axis; and join, the type of join ('outer' or 'inner').
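To make the axis parameter concrete, the sketch below concatenates two small frames both ways. The frames a and b are invented for the example and are not from the question.

```python
import pandas as pd

# Hypothetical frames sharing only the "x" column.
a = pd.DataFrame({"x": [1, 2], "y": [3, 4]})
b = pd.DataFrame({"x": [5, 6], "z": [7, 8]})

# axis=0 stacks rows. With the default join="outer", the result has the
# union of the columns, and missing cells are filled with NaN.
rows = pd.concat([a, b], axis=0, ignore_index=True)

# axis=1 places the frames side by side, aligned on the row index.
cols = pd.concat([a, b], axis=1)

print(rows)
print(cols)
```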
For future readers: pandas can do this itself. pd.concat keeps only the common columns if you pass the join='inner' argument, e.g.
pd.concat(frames, join='inner', ignore_index=True)
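Here is a small self-contained sketch of the join='inner' approach. The contents of cost1, cost2 and cost3 are assumptions, since the question does not show the actual data.

```python
import pandas as pd

# Hypothetical stand-ins for the frames in the question.
cost1 = pd.DataFrame({"item": ["a"], "price": [1.0], "tax": [0.1]})
cost2 = pd.DataFrame({"item": ["b"], "price": [2.0], "qty": [5]})
cost3 = pd.DataFrame({"price": [3.0], "item": ["c"], "note": ["x"]})

frames = [cost1, cost2, cost3]

# join="inner" keeps only the columns present in every frame;
# ignore_index=True renumbers the rows 0..n-1.
combined = pd.concat(frames, join="inner", ignore_index=True)
print(combined)
```

Only item and price appear in all three frames, so tax, qty and note are dropped.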
You can find the common columns with Python's set.intersection:
common_cols = list(set.intersection(*(set(df.columns) for df in frames)))
To concatenate using only the common columns, you can use
pd.concat([df[common_cols] for df in frames], ignore_index=True)
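One caveat: Python sets are unordered, so the list built from set.intersection may not keep the original column order. If order matters, a possible variant (a sketch, again with made-up sample frames) is to walk the first frame's columns and keep those found in every other frame.

```python
import pandas as pd

# Hypothetical stand-ins for the frames in the question.
cost1 = pd.DataFrame({"item": ["a"], "price": [1.0], "tax": [0.1]})
cost2 = pd.DataFrame({"item": ["b"], "price": [2.0], "qty": [5]})
cost3 = pd.DataFrame({"price": [3.0], "item": ["c"], "note": ["x"]})

frames = [cost1, cost2, cost3]

# Keep the column order of the first frame, retaining only the
# names that are present in every remaining frame.
common_cols = [
    col for col in frames[0].columns
    if all(col in df.columns for df in frames[1:])
]

# Select the common columns from each frame, then stack the rows.
combined = pd.concat([df[common_cols] for df in frames], ignore_index=True)
print(combined)
```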