I have a dataframe where I want to remove all parentheses and the text inside them.
I checked out "How can I remove text within parentheses with a regex?", where the answer for removing that text was:
re.sub(r'\([^)]*\)', '', filename)
I tried this as well as
re.sub(r'\(.*?\)', '', filename)
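For reference, both patterns work when they are given an ordinary Python string. A minimal sketch, using a made-up filename value (hypothetical sample data, not from the question):

```python
import re

# Hypothetical sample string with two parenthesized parts
filename = "report (draft) v2 (final).txt"

# Negated class: "(" then any run of non-")" characters, then ")"
print(re.sub(r'\([^)]*\)', '', filename))  # report  v2 .txt

# Lazy quantifier: "(" then as few characters as possible, then ")"
print(re.sub(r'\(.*?\)', '', filename))    # report  v2 .txt
```

The problem in the question is therefore not the pattern itself but what gets passed as the third argument.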
However, I got an error: expected a string or buffer
When I tried passing the column df['Column Name'] instead, I got: no item named 'Column Name'
I checked the dataframe using df.head() and it showed up as a clean table with the column names I wanted. However, when I use the re expression to remove the (stuff), it doesn't recognize the column name.
I normally use
df['name'].str.replace(" ()","")
However, I want to remove the parentheses and what is inside them. How can I do this using either regex or pandas?
Thanks!
Here is the solution I used (thanks for the help!):
All['Manufacturer Standard Name'] = All['Manufacturer Standard Name'].str.replace(r"\(.*\)", "", regex=True)
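A small sketch of that solution on hypothetical sample data (the values are made up; only the column name comes from the question). It passes regex=True because pandas 2.0 changed the default of str.replace to literal matching. It also shows that the greedy \(.*\) removes everything from the first ( to the last ), including text between two groups:

```python
import pandas as pd

# Hypothetical sample data; only the column name mirrors the question
All = pd.DataFrame({"Manufacturer Standard Name": ["Acme (USA)", "Globex", "Initech (old) Corp (new)"]})

# regex=True is needed on pandas >= 2.0, where str.replace defaults to literal matching
All["Manufacturer Standard Name"] = All["Manufacturer Standard Name"].str.replace(r"\(.*\)", "", regex=True)

# Greedy match: " Corp" between the two groups is removed as well
print(All["Manufacturer Standard Name"].tolist())  # ['Acme ', 'Globex', 'Initech ']
```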
To remove parentheses from a string with a regex in Python, you can use re.sub() on a single string or pandas' Series.str.replace() on a whole column.
Using strip() to remove parentheses from the beginning and end of strings in Python: if your parentheses are at the beginning and end of your string, you can also use the strip() method. Python's str.strip() removes the specified characters from the beginning and end of a string; note that it removes only the parenthesis characters themselves, not the text between them.
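A quick illustration with made-up strings (hypothetical values) of what strip() does and does not do:

```python
text = "(Acme Corp)"
# Both outer parentheses are removed; the inner text is kept
print(text.strip("()"))   # Acme Corp

text2 = "Acme (USA)"
# Only the trailing ")" is at an end of the string, so "(USA" survives
print(text2.strip("()"))  # Acme (USA
```

So strip() is only a fit when the whole value is wrapped in parentheses; for removing a parenthesized part from the middle, a regex is still needed.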
df['name'].str.replace(r"\(.*\)", "", regex=True)
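You can't run re functions directly on pandas objects: re.sub() expects a single string, so you have to apply it to each element inside the object. Series.str.replace(r"\(.*\)", "", regex=True) is essentially a vectorized shorthand for Series.apply(lambda x: re.sub(r"\(.*\)", "", x)), except that the .str accessor passes missing values (NaN) through instead of raising an error.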
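A sketch of that point on a hypothetical Series: calling re.sub() on the Series itself raises a TypeError (the Python 3 form of the "expected a string or buffer" error in the question), while the .str accessor and an explicit apply() give the same result:

```python
import re
import pandas as pd

# Hypothetical sample data
s = pd.Series(["Acme (USA)", "Globex (UK)", "Initech"])

# re.sub() wants a str, not a Series
raised = False
try:
    re.sub(r"\(.*\)", "", s)
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# Vectorized accessor vs. an explicit per-element loop
vectorized = s.str.replace(r"\(.*\)", "", regex=True)
looped = s.apply(lambda x: re.sub(r"\(.*\)", "", x))

print(vectorized.tolist())  # ['Acme ', 'Globex ', 'Initech']
print(looped.tolist())      # ['Acme ', 'Globex ', 'Initech']
```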
If you have multiple (...) substrings in the data, you should consider using either
All['Manufacturer Standard Name'] = All['Manufacturer Standard Name'].str.replace(r"\(.*?\)", "", regex=True)
or
All['Manufacturer Standard Name'] = All['Manufacturer Standard Name'].str.replace(r"\([^()]*\)", "", regex=True)
The difference is that .*? is generally slower and does not match line breaks, while [^()]* matches any character except ( and ), is quite efficient, and does match line breaks. With nested parentheses, the first pattern will match (...(...) but the second will only match the innermost (...).
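To see these differences side by side, here is a sketch that runs the greedy, lazy and negated-class patterns over three hypothetical values: one with two separate groups, one with nested parentheses and one with a line break inside the parentheses:

```python
import pandas as pd

# Hypothetical sample values: two groups, nested groups, a group spanning a newline
s = pd.Series(["Acme (USA) Corp (new)", "Widget (a(b)c)", "Foo (multi\nline)"])

patterns = {
    "greedy": r"\(.*\)",
    "lazy": r"\(.*?\)",
    "negated class": r"\([^()]*\)",
}

results = {}
for name, pat in patterns.items():
    results[name] = s.str.replace(pat, "", regex=True).tolist()
    print(name, results[name])
```

The greedy pattern also removes " Corp"; the lazy pattern leaves "Widget c)" because it stops at the first ")" and cannot cross the newline; the negated class removes only the innermost "(b)" but does handle the newline.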
If you also want to clean up the whitespace left behind after removing these substrings, you may consider
All['Manufacturer Standard Name'] = All['Manufacturer Standard Name'].str.replace(r"\s*\([^()]*\)", "", regex=True).str.strip()
The \s*\([^()]*\) regex matches zero or more whitespace characters followed by a parenthesized substring, and str.strip() then removes any remaining leading or trailing whitespace.
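A short sketch of that combination on hypothetical values, including one where the parenthesized part comes first, so that str.strip() is what removes the leftover leading space:

```python
import pandas as pd

# Hypothetical sample data
s = pd.Series(["Acme (USA)", "Initech (old) Corp (new)", "(Draft) Report"])

# \s* also consumes the space in front of each group; strip() handles what remains at the ends
cleaned = s.str.replace(r"\s*\([^()]*\)", "", regex=True).str.strip()

print(cleaned.tolist())  # ['Acme', 'Initech Corp', 'Report']
```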