pandas.concat of multiple data frames using only common columns

I have multiple pandas DataFrame objects: cost1, cost2, cost3, ...

  1. They have different column names (and different numbers of columns), but some columns are common to all of them.
  2. The number of columns in each data frame is fairly large, so picking out the common columns by hand would be painful.

How can I append rows from all of these data frames into one single data frame while retaining elements from only the common column names?

As of now I have

frames = [cost1, cost2, cost3]

new_combined = pd.concat(frames, ignore_index=True)

This obviously contains columns which are not common across all data frames.

VM1 asked Oct 04 '16 22:10



2 Answers

For future readers: pandas can do this by itself. pd.concat keeps only the columns common to all of the data frames if you pass the join='inner' argument, e.g.

pd.concat(frames, join='inner', ignore_index=True)
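
To make the effect concrete, here is a minimal sketch. The three small frames and their column names (id, price, region, tax, note) are made up for illustration and only stand in for cost1, cost2, cost3 from the question.

```python
import pandas as pd

# Hypothetical toy frames standing in for cost1, cost2, cost3.
# Only "id" and "price" appear in all three.
cost1 = pd.DataFrame({"id": [1, 2], "price": [10.0, 20.0], "region": ["N", "S"]})
cost2 = pd.DataFrame({"id": [3], "price": [30.0], "tax": [1.5]})
cost3 = pd.DataFrame({"price": [40.0], "id": [4], "note": ["x"]})

frames = [cost1, cost2, cost3]

# join="inner" keeps only the columns present in every frame;
# ignore_index=True renumbers the rows 0..n-1.
combined = pd.concat(frames, join="inner", ignore_index=True)
print(combined)
```

The result should have four rows and only the id and price columns. With the default join='outer', the region, tax and note columns would also be kept and padded with NaN.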
Alok Nayak answered Sep 22 '22 18:09


You can find the common columns with Python's set.intersection:

common_cols = list(set.intersection(*(set(df.columns) for df in frames)))

To concatenate using only the common columns, you can use:

pd.concat([df[common_cols] for df in frames], ignore_index=True)
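
One caveat: a Python set has no defined order, so list(set.intersection(...)) may return the common columns in an arbitrary order. If the order matters, a possible variant is to filter the first frame's columns against the intersection. This is only a sketch; the toy frames and column names are made up for illustration, and it assumes frames is non-empty.

```python
import pandas as pd

# Hypothetical toy frames standing in for cost1, cost2, cost3.
cost1 = pd.DataFrame({"id": [1, 2], "price": [10.0, 20.0], "region": ["N", "S"]})
cost2 = pd.DataFrame({"id": [3], "price": [30.0], "tax": [1.5]})
cost3 = pd.DataFrame({"price": [40.0], "id": [4], "note": ["x"]})

frames = [cost1, cost2, cost3]

# Intersect the column sets of all frames (frames must not be empty).
common = set.intersection(*(set(df.columns) for df in frames))

# Assumption: we want the column order of the first frame,
# so walk its columns and keep those found in the intersection.
common_cols = [col for col in frames[0].columns if col in common]

# Select the common columns from each frame, then stack the rows.
combined = pd.concat([df[common_cols] for df in frames], ignore_index=True)
print(combined)
```

Here common_cols comes out as ['id', 'price'], following cost1, even though cost3 lists price first.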
Ami Tavory answered Sep 24 '22 18:09