I'm working with a pandas DataFrame and trying to sanitize a column, turning values like $12,342
into 12342
so that the column can be converted to int or float. I found one row with 736[4], though,
so I also have to remove everything within the square brackets, brackets included.
Code so far
df2['Average Monthly Wage $'] = df2['Average Monthly Wage $'].str.replace('$','')
df2['Average Monthly Wage $'] = df2['Average Monthly Wage $'].str.replace(',','')
df2['Average Monthly Wage $'] = df2['Average Monthly Wage $'].str.replace(' ','')
The line below is supposed to remove the square brackets along with their contents:
df2['Average Monthly Wage $'] = df2['Average Monthly Wage $'].str.replace(r'[[^]]*\)','')
To some devs this is trivial, but I haven't used regular expressions often enough to know this. I've checked around and formulated the line above from a Stack Overflow example.
I think you need:
df2 = pd.DataFrame({'Average Monthly Wage $': ['736[4]','7336[445]', '[4]345[5]']})
print (df2)
  Average Monthly Wage $
0                 736[4]
1              7336[445]
2              [4]345[5]
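# regex=True is needed on pandas 2.0+, where str.replace treats the pattern as a literal string by default
df2['Average Monthly Wage $'] = df2['Average Monthly Wage $'].str.replace(r'\[.*?\]', '', regex=True)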
print (df2)
  Average Monthly Wage $
0                    736
1                   7336
2                    345
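For what it's worth, the pattern in the question most likely fails because `[[^]` is read as a character class matching `[` or `^`, and `\)` then demands a literal closing parenthesis, which values like 736[4] never contain. Below is a minimal sketch of the whole clean-up in one pass, ending with the numeric conversion the question asks for. The sample values are made up for illustration, and passing `regex=True` explicitly assumes a pandas version that accepts that keyword on `str.replace`.

```python
import pandas as pd

# Hypothetical sample data mixing the formats described in the question.
df2 = pd.DataFrame({'Average Monthly Wage $': ['$12,342', '736[4]', ' $1,005 [12]']})

cleaned = (
    df2['Average Monthly Wage $']
    # Drop bracketed footnotes, brackets included (non-greedy match).
    .str.replace(r'\[.*?\]', '', regex=True)
    # Drop dollar signs, commas and any whitespace in a single pass.
    .str.replace(r'[$,\s]', '', regex=True)
)

# Convert to a numeric dtype; by default to_numeric raises on anything
# that still is not a number, which surfaces rows that need more cleaning.
df2['Average Monthly Wage $'] = pd.to_numeric(cleaned)
print(df2)
```

Removing the brackets first matters if a footnote could itself contain a comma or a space; after that, the character class handles the remaining symbols together.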