Applying Regex across entire column of a Dataframe

Tags: python, python-3.x, pandas

I have a DataFrame with 3 columns:

id,name,team 
101,kevin, marketing
102,scott,admin\n
103,peter,finance\n

I am trying to apply a regex function that removes the unnecessary whitespace. I have code that removes it; however, I am unable to loop it through the entire DataFrame.

This is what I have tried thus far:

df['team'] = re.sub(r'[\n\r]*','',df['team'])

But this throws an error AttributeError: 'Series' object has no attribute 're'

Could anyone advise how I could loop this regex through the entire df['team'] column of the DataFrame?

asked Dec 28 '18 18:12

hello kee

2 Answers

You are almost there. There are two simple ways of doing this:

import re

# option 1 - faster way
df['team'] = [re.sub(r'[\n\r]*', '', str(x)) for x in df['team']]

# option 2
df['team'] = df['team'].apply(lambda x: re.sub(r'[\n\r]*', '', str(x)))
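To try both options end to end, here is a minimal sketch. It assumes the sample rows from the question are held as real strings with trailing newline characters. The `pattern` variable and the `via_*` names are only for illustration, and the last variant, `Series.str.replace` with `regex=True`, is an alternative that is not part of the answer above.

```python
import re

import pandas as pd

# Sample data modelled on the question (assumption: "\n" is a real newline).
df = pd.DataFrame({
    "id": [101, 102, 103],
    "name": ["kevin", "scott", "peter"],
    "team": [" marketing", "admin\n", "finance\n"],
})

pattern = r"[\n\r]+"

# Option 1: list comprehension over the column values
via_list = [re.sub(pattern, "", str(x)) for x in df["team"]]

# Option 2: Series.apply with the same substitution
via_apply = df["team"].apply(lambda x: re.sub(pattern, "", str(x)))

# Alternative (not from the answer): pandas' vectorized string method
via_str = df["team"].str.replace(pattern, "", regex=True)

print(via_list)
```

All three give the same values here. Note that the leading space in " marketing" survives, because the pattern only targets newlines and carriage returns.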

answered Sep 22 '22 07:09

YOLO

As long as it's a DataFrame, have a look at replace: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.replace.html

df['team'] = df['team'].replace({r"[\n\r]+": ''}, regex=True)
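The same idea should also work on the whole DataFrame at once rather than on one column. The sketch below assumes the sample rows from the question; the name `cleaned` is just for illustration, and the extra `str.strip()` step is an addition not taken from the answer, included only because the sample also has a leading space.

```python
import pandas as pd

# Sample data modelled on the question (assumption: "\n" is a real newline).
df = pd.DataFrame({
    "id": [101, 102, 103],
    "name": ["kevin", "scott", "peter"],
    "team": [" marketing", "admin\n", "finance\n"],
})

# Regex replace across every column; this returns a new DataFrame
# and leaves df itself untouched.
cleaned = df.replace(to_replace=r"[\n\r]+", value="", regex=True)

# Optional extra step: drop leading/trailing whitespace in one column.
cleaned["team"] = cleaned["team"].str.strip()

print(cleaned)
```

Assigning the result back, instead of using inplace=True on a selected column, avoids depending on whether the selection is a view or a copy.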

Regarding the regex: '*' means 0 or more, whereas '+' means 1 or more, which is what you need here.
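A quick way to see the difference is to substitute a visible marker instead of an empty string. This is only an illustrative sketch with made-up inputs ("abc" and the "x" pattern are not from the question):

```python
import re

text = "abc"

# "x*" also matches the empty string, so it matches at every position.
star = re.sub(r"x*", "-", text)

# "x+" needs at least one "x", so nothing in "abc" matches.
plus = re.sub(r"x+", "-", text)

# With an empty replacement the two quantifiers give the same result,
# which is why the "*" version still produces the right output.
same_star = re.sub(r"[\n\r]*", "", "admin\n")
same_plus = re.sub(r"[\n\r]+", "", "admin\n")

print(repr(star), repr(plus), repr(same_star), repr(same_plus))
```

So '*' is not wrong when replacing with an empty string, but '+' states the intent more precisely and avoids matching empty strings between every character.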

answered Sep 26 '22 07:09

josem8f
