I have a table of sites with a land cover class and a state. I have another table with values linked to class and state. In the second table, however, some of the rows are linked only to class:
sites = pd.DataFrame({'id': ['a', 'b', 'c'],
'class': [1, 2, 23],
'state': ['al', 'ar', 'wy']})
values = pd.DataFrame({'class': [1, 1, 2, 2, 23],
'state': ['al', 'ar', 'al', 'ar', None],
'val': [10, 11, 12, 13, 16]})
I'd like to link the tables by class and state, except for those rows in the value table for which state is None, in which case they would be linked only by class.
A merge has the following result:
combined = sites.merge(values, how='left', on=['class', 'state'])
id class state val
0 a 1 al 10.0
1 b 2 ar 13.0
2 c 23 wy NaN
But I'd like val in the last row to be 16. Is there an inexpensive way to do this short of breaking up both tables, performing separate merges, and then concatenating the result?
By use + operator simply you can combine/merge two or multiple text/string columns in pandas DataFrame. Note that when you apply + operator on numeric columns it actually does addition instead of concatenation.
It is possible to join the different columns is using concat() method. DataFrame: It is dataframe name. axis: 0 refers to the row axis and1 refers the column axis. join: Type of join.
Dataframes in Pandas can be merged using pandas. merge() method. Returns : A DataFrame of the two merged objects. While working on datasets there may be a need to merge two data frames with some complex conditions, below are some examples of merging two data frames with some complex conditions.
Merging Dataframes by index of both the dataframes As both the dataframe contains similar IDs on the index. So, to merge the dataframe on indices pass the left_index & right_index arguments as True i.e. Both the dataframes are merged on index using default Inner Join.
Merge two Pandas DataFrames on certain columns. We can merge two Pandas DataFrames on certain columns using the merge function by simply specifying the certain columns for merge. Syntax: DataFrame.merge (right, how=’inner’, on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, copy=True, indicator=False, ...
You have now learned the three most important techniques for combining data in Pandas: merge () for combining data on common columns or indices. .join () for combining data on a key column or an index. concat () for combining DataFrames across rows or columns.
If you want to join on columns like you would with merge(), then you’ll need to set the columns as indices. Like merge(), .join() has a few parameters that give you more flexibility in your joins. However, with .join(), the list of parameters is relatively short: other: This is the only required parameter. It defines the other DataFrame to join.
If True, adds a column to the output DataFrame called “_merge” with information on the source of each row. The column can be given a different name by providing a string argument.
How about merge them separately:
pd.concat([sites.merge(values, on=['class','state']),
sites.merge(values[values['state'].isna()].drop('state',axis=1),
on=['class'])
])
Output:
id class state val
0 a 1 al 10
1 b 2 ar 13
0 c 23 wy 16
We can use combine_first
here:
(sites.set_index(['class','state'])
.combine_first(values.set_index(['class','state']))
.dropna().reset_index())
class state id val
0 1 al a 10.0
1 2 ar b 13.0
2 23 wy c 16.0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With