Case insensitive pandas dataframe.merge

Tags: python, pandas, csv

I am struggling to find the easiest way to do a case-insensitive merge in pandas. Is there a way to do it directly in the merge call? Do I need to use (?i) or a regex with ignorecase? In the code snippet below I am joining on country names, where a country may appear as "United States" in one file and "UNITED STATES" in another, and I just want to take case out of the equation. Thank you!

import pandas as pd
import csv
import sys

env_path = sys.argv[1]
map_path = sys.argv[2]


df_address = pd.read_csv(env_path + "\\address.csv")
df_CountryMapping = pd.read_csv(map_path + "\\CountryMapping.csv")

df_merged = df_address.merge(df_CountryMapping, left_on="Country", right_on="NAME", how="left")

....


asked Apr 21 '15 02:04

EMC

4 Answers

Lowercase the values in the two columns that will be used to merge, and then merge on the lowercased columns:

df_address['country_lower'] = df_address['Country'].str.lower()
df_CountryMapping['name_lower'] = df_CountryMapping['NAME'].str.lower()
df_merged = df_address.merge(df_CountryMapping, left_on="country_lower", right_on="name_lower", how="left")
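For anyone who wants to try this without the CSV files, here is a minimal sketch of the same approach. The sample rows and the `CODE` column are made up for illustration and are not from the question. It also drops the helper columns after the merge, which is optional.

```python
import pandas as pd

# Hypothetical sample data standing in for address.csv and CountryMapping.csv
df_address = pd.DataFrame({"Country": ["United States", "canada", "Atlantis"]})
df_CountryMapping = pd.DataFrame(
    {"NAME": ["UNITED STATES", "Canada"], "CODE": ["US", "CA"]}
)

# Lowercased helper columns; the original columns keep their casing
df_address["country_lower"] = df_address["Country"].str.lower()
df_CountryMapping["name_lower"] = df_CountryMapping["NAME"].str.lower()

df_merged = df_address.merge(
    df_CountryMapping, left_on="country_lower", right_on="name_lower", how="left"
)

# Remove the helper columns once the merge is done
df_merged = df_merged.drop(columns=["country_lower", "name_lower"])
print(df_merged)
```

Because the merge runs on separate helper columns, `Country` and `NAME` keep their original spelling in the result, and the row with no match ("Atlantis" here) is kept with missing values since `how="left"`.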


answered Oct 15 '22 05:10

Shashank Agarwal

df_merged = pd.merge(df_address, df_CountryMapping, left_on=df_address["Country"].str.lower(), right_on=df_CountryMapping["NAME"].str.lower(), how="left")
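The one-liner above passes lowercased Series as the merge keys instead of column names, so nothing is added to either frame. A small sketch with made-up sample data (the rows and the `CODE` column are assumptions) shows how it behaves. In the pandas versions I have used, the result gains an extra `key_0` column holding the lowercased key; the sketch drops it only if it is present.

```python
import pandas as pd

# Hypothetical sample data standing in for the two CSV files
df_address = pd.DataFrame({"Country": ["United States", "canada"]})
df_CountryMapping = pd.DataFrame(
    {"NAME": ["UNITED STATES", "Canada"], "CODE": ["US", "CA"]}
)

# The keys are array-like (lowercased Series), not column names
df_merged = pd.merge(
    df_address,
    df_CountryMapping,
    left_on=df_address["Country"].str.lower(),
    right_on=df_CountryMapping["NAME"].str.lower(),
    how="left",
)

# The unnamed key may show up as a "key_0" column; drop it if it exists
df_merged = df_merged.drop(columns=["key_0"], errors="ignore")
print(df_merged)
```

The original `Country` and `NAME` values are left untouched, which makes this a compact alternative to creating temporary columns.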

answered Oct 15 '22 05:10

dattatreya moganti

I suggest lowercasing the column names after reading the files:

df_address.columns = [c.lower() for c in df_address.columns]
df_CountryMapping.columns = [c.lower() for c in df_CountryMapping.columns]

Then lowercase the values in the key columns:

df_address['country'] = df_address['country'].str.lower()
df_CountryMapping['name'] = df_CountryMapping['name'].str.lower()

Only then do the merge:

df_merged = df_address.merge(df_CountryMapping, left_on="country", right_on="name", how="left")
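Put together on made-up sample data (the rows and the `CODE` column are assumptions for illustration), the three steps look roughly like this. Note that this variant overwrites the key values, so the original casing is not available in the merged result.

```python
import pandas as pd

# Hypothetical sample data standing in for the two CSV files
df_address = pd.DataFrame({"Country": ["United States", "canada"]})
df_CountryMapping = pd.DataFrame(
    {"NAME": ["UNITED STATES", "Canada"], "CODE": ["US", "CA"]}
)

# Step 1: lowercase the column names
df_address.columns = [c.lower() for c in df_address.columns]
df_CountryMapping.columns = [c.lower() for c in df_CountryMapping.columns]

# Step 2: lowercase the key values in place (original casing is overwritten)
df_address["country"] = df_address["country"].str.lower()
df_CountryMapping["name"] = df_CountryMapping["name"].str.lower()

# Step 3: merge on the normalized columns
df_merged = df_address.merge(
    df_CountryMapping, left_on="country", right_on="name", how="left"
)
print(df_merged)
```

If the original spelling needs to survive the merge, keeping the lowercased values in separate columns is probably the safer choice.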

answered Oct 15 '22 05:10

Uri Goren

One solution would be to convert the column names of both data frames to lowercase, something like this:

df_address = pd.read_csv(env_path + "\\address.csv")
df_CountryMapping = pd.read_csv(map_path + "\\CountryMapping.csv")

df_address.rename(columns=lambda x: x.lower(), inplace=True)
df_CountryMapping.rename(columns=lambda x: x.lower(), inplace=True)

df_merged = df_address.merge(df_CountryMapping, left_on="country", right_on="name", how="left")
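One caveat worth checking: renaming only changes the column labels, not the cell values, so "United States" and "UNITED STATES" would still fail to match. The sketch below, on made-up sample data (the rows and the `CODE` column are assumptions), compares the merge before and after the key values are lowercased as well.

```python
import pandas as pd

# Hypothetical sample data standing in for the two CSV files
df_address = pd.DataFrame({"Country": ["United States"]})
df_CountryMapping = pd.DataFrame({"NAME": ["UNITED STATES"], "CODE": ["US"]})

df_address.rename(columns=lambda x: x.lower(), inplace=True)
df_CountryMapping.rename(columns=lambda x: x.lower(), inplace=True)

# Column names are lowercase now, but the values still differ in case
names_only = df_address.merge(
    df_CountryMapping, left_on="country", right_on="name", how="left"
)
unmatched = int(names_only["code"].isna().sum())

# Lowercasing the key values as well lets the rows line up
df_address["country"] = df_address["country"].str.lower()
df_CountryMapping["name"] = df_CountryMapping["name"].str.lower()
both = df_address.merge(
    df_CountryMapping, left_on="country", right_on="name", how="left"
)
print(unmatched, both["code"].tolist())
```

So the rename is useful for making the `left_on` and `right_on` names predictable, but it likely needs to be combined with `.str.lower()` on the key columns to get a case-insensitive match.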

answered Oct 15 '22 04:10

mway
