I have two dataframes below, df_purchase(1) and df_login(2)
+--------+-----+--------+------------+--------------------+-------------+--------------------------+
| | age | gender | ttp | count | sum(amount) | region |
+--------+-----+--------+------------+--------------------+-------------+--------------------------+
| 49427 | 63 | M | 824.731412 | 2 | 25.00 | Omaha, Nebraska |
| 28433 | 49 | M | 1.166250 | 2 | 41.94 | Catasauqua, Pennsylvania |
| 4162 | 29 | M | 5.620949 | 2 | 51.78 | Eagle Center, Iowa |
| 18747 | 43 | M | 153.502072 | 2 | 23.84 | Pacific, Washington |
| 45173 | 59 | M | 0.027257 | 2 | 13.98 | De Soto, Missouri |
+--------+-----+--------+------------+--------------------+-------------+--------------------------+
+--------+-----+--------+------------+--------------------+-------------+--------------------------+
| | age | gender | count | region | | |
| 671766 | 84 | M | 13900 | New York, New York | | |
| 671166 | 84 | F | 7619 | New York, New York | | |
| 672209 | 85 | F | 6483 | New York, New York | | |
| 672671 | 85 | M | 5808 | New York, New York | | |
| 195201 | 34 | M | 3817 | New York, New York | | |
+--------+-----+--------+------------+--------------------+-------------+--------------------------+
I am trying to join df_logins to df_purchase on age, gender and region with the following pandas code:
df = pd.merge(df_purchase, df_login[['count']],
how='left', on=['age', 'gender', 'region'])
However, I keep getting this error: KeyError: 'age'
Any thoughts?
A left join, or left merge, keeps every row from the left dataframe. Result from left-join or left-merge of two dataframes in Pandas. Rows in the left dataframe that have no corresponding join value in the right dataframe are left with NaN values.
Both join and merge can be used to combines two dataframes but the join method combines two dataframes on the basis of their indexes whereas the merge method is more versatile and allows us to specify columns beside the index to join on for both dataframes.
The concat() function can be used to concatenate two Dataframes by adding the rows of one to the other. The merge() function is equivalent to the SQL JOIN clause. 'left', 'right' and 'inner' joins are all possible.
How to Fix the KeyError? We can simply fix the error by correcting the spelling of the key. If we are not sure about the spelling we can simply print the list of all column names and crosscheck.
The KeyError arises from this:
df = pd.merge(df_purchase, df_login[['count']] <- this selects just count column,
how='left', on=['age', 'gender', 'region'])
You've specifically selected just a single column from df_login
, you need this:
df = pd.merge(df_purchase, df_login,
how='left', on=['age', 'gender', 'region'])
I'm assuming that this is not your complete data as you have no common values in the age and region column in df_login
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With