Pandas merge on variable columns

Tags: python, pandas

I have a table of sites with a land cover class and a state. I have another table with values linked to class and state. In the second table, however, some of the rows are linked only to class:

import pandas as pd

sites = pd.DataFrame({'id': ['a', 'b', 'c'],
                      'class': [1, 2, 23],
                      'state': ['al', 'ar', 'wy']})

values = pd.DataFrame({'class': [1, 1, 2, 2, 23],
                       'state': ['al', 'ar', 'al', 'ar', None],
                       'val': [10, 11, 12, 13, 16]})

I'd like to link the tables by class and state, except for those rows in the value table for which state is None, in which case they would be linked only by class.

A merge has the following result:

combined = sites.merge(values, how='left', on=['class', 'state'])

  id  class state   val
0  a      1    al  10.0
1  b      2    ar  13.0
2  c     23    wy   NaN

But I'd like val in the last row to be 16. Is there an inexpensive way to do this short of breaking up both tables, performing separate merges, and then concatenating the result?

triphook asked Dec 18 '19 15:12


2 Answers

How about merging them separately and concatenating the results:

pd.concat([sites.merge(values, on=['class','state']),
           sites.merge(values[values['state'].isna()].drop('state',axis=1),
                       on=['class'])
          ])

Output:

  id  class state  val
0  a      1    al   10
1  b      2    ar   13
0  c     23    wy   16

Quang Hoang answered Oct 03 '22 06:10


We can use combine_first here:

(sites.set_index(['class','state'])
  .combine_first(values.set_index(['class','state']))
  .dropna().reset_index())

   class state id   val
0      1    al  a  10.0
1      2    ar  b  13.0
2     23    wy  c  16.0

anky answered Oct 03 '22 06:10
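
One more option, offered as a sketch rather than a tested recipe: keep the two-column left merge from the question, then fill the remaining gaps from a class-only lookup. It assumes that the rows in values with a missing state are unique per class (Series.map needs a unique index). The names fallback and combined are only illustrative.

```python
import pandas as pd

sites = pd.DataFrame({'id': ['a', 'b', 'c'],
                      'class': [1, 2, 23],
                      'state': ['al', 'ar', 'wy']})

values = pd.DataFrame({'class': [1, 1, 2, 2, 23],
                       'state': ['al', 'ar', 'al', 'ar', None],
                       'val': [10, 11, 12, 13, 16]})

# Step 1: exact match on both keys; sites without a match get NaN in 'val'.
combined = sites.merge(values, how='left', on=['class', 'state'])

# Step 2: build a class -> val lookup from the rows that carry no state.
# Assumption: at most one such row per class, so the index is unique.
fallback = values.loc[values['state'].isna()].set_index('class')['val']

# Step 3: fill only the gaps left by step 1 with the class-level value.
combined['val'] = combined['val'].fillna(combined['class'].map(fallback))

print(combined)
```

This keeps the row order and index of sites and avoids a second merge, and a state-specific value always takes priority over the class-level one. If a class could have several state-less rows, the lookup would need to be deduplicated or aggregated first.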