I have 2m lines of UK postcode data, but some muppet has used double spaces in some cases and single spaces in others. I need to merge data based on the postcode, so it needs to be consistent.
I can't find a simple way to do this in pandas, but it feels like there should be. Any advice?
Use the re.sub() method to replace multiple spaces with a single space, e.g. result = re.sub(' +', ' ', my_str).
Pandas provides a predefined method, pandas.Series.str.strip(), to remove whitespace from the start and end of each string.
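As a minimal sketch of the re.sub() approach on a plain string (the helper name collapse_spaces and the sample postcode are just for illustration):

```python
import re


def collapse_spaces(text: str) -> str:
    """Replace every run of one or more spaces with a single space."""
    return re.sub(r" +", " ", text)


# Hypothetical sample value with a double space in the middle.
print(collapse_spaces("SW1A  1AA"))
```

Note that this only touches space characters; tabs or other whitespace would need a pattern such as r"\s+" instead.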
The DataFrame.replace() function replaces values in a DataFrame (one value with another, across all columns). It takes to_replace, value, inplace, limit, regex and method as parameters and returns a new DataFrame. When inplace=True is used, it modifies the existing DataFrame object and returns None.
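A rough sketch of DataFrame.replace() with regex=True, assuming a small made-up frame (the column names and values are placeholders, not the asker's data):

```python
import pandas as pd

# Hypothetical sample data; only the postcode column has stray double spaces.
df = pd.DataFrame(
    {
        "postcode": ["SW1A  1AA", "M1 1AE"],
        "city": ["London", "Manchester"],
    }
)

# With regex=True, to_replace is treated as a regular expression and is
# applied to string values in every column. A new DataFrame is returned.
cleaned = df.replace(to_replace=r" {2,}", value=" ", regex=True)

print(cleaned)
```

Because inplace is left at its default, df itself is unchanged and only cleaned holds the normalised values.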
You might be looking for pd.Series.str.replace:
df.postcode = df.postcode.str.replace('  ', ' ')
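This should replace every run of multiple spaces with a single space (regex=True is needed because, since pandas 2.0, str.replace treats the pattern as a literal string by default):
df.postcode = df.postcode.str.replace(' +', ' ', regex=True)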
To remove all whitespace from the start and end:
df.postcode = df.postcode.str.strip()
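Putting the pieces together, here is one possible sketch of normalising both sides before the merge. The frames, column names and the normalise_postcodes helper are all made up for illustration:

```python
import pandas as pd


def normalise_postcodes(postcodes: pd.Series) -> pd.Series:
    """Trim the ends, then collapse internal whitespace runs to one space."""
    return postcodes.str.strip().str.replace(r"\s+", " ", regex=True)


# Hypothetical frames with inconsistent spacing in the join key.
left = pd.DataFrame({"postcode": ["SW1A  1AA", " M1 1AE "], "value": [1, 2]})
right = pd.DataFrame(
    {"postcode": ["SW1A 1AA", "M1  1AE"], "region": ["London", "North West"]}
)

left["postcode"] = normalise_postcodes(left["postcode"])
right["postcode"] = normalise_postcodes(right["postcode"])

# Both keys now use single spaces, so the rows line up.
merged = left.merge(right, on="postcode", how="inner")
print(merged)
```

The vectorised .str methods avoid a Python-level loop, which matters at 2m rows. If the data also mixes upper and lower case, adding .str.upper() to the helper would be a natural extension.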