Chaining string operations on Pandas Series

Tags: python, pandas

I recently found out about the .str accessor for pandas Series and it's great! However, if I want to chain operations (say, a couple of replace calls and a strip), I need to call .str again after every operation, which does not make for the most elegant code.

For example, let's say my column names contain spaces and periods and I want to replace them with underscores. I might also want to strip any leftover leading or trailing underscores. If I want to do this using .str methods, is there any way to avoid having to write:

df.columns.str.replace(' ', '_').str.replace('.', '_').str.strip('_')

Thanks!

asked Nov 16 '17 15:11

tomasn4a

2 Answers

I think you need to repeat .str for each string method; that is by design.

However, you can get by with a single replace if you pass a regex character class:

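import pandas as pd

df = pd.DataFrame(columns=['aa dd', 'dd.d_', 'd._'])

print(df)
Empty DataFrame
Columns: [aa dd, dd.d_, d._]
Index: []

# regex=True is needed on pandas 2.0+, where str.replace matches literally by default
print(df.columns.str.replace(r'[\s.]', '_', regex=True).str.strip('_'))
Index(['aa_dd', 'dd_d', 'd'], dtype='object')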
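To illustrate why .str has to be repeated, here is a minimal sketch (the variable names `cols`, `step`, `chained` and `single` are just for illustration). Each .str method returns a new Index rather than a plain string, so the next string method has to go through .str again. The sketch also checks that the single regex replace gives the same result as the chained literal replaces for these sample names.

```python
import pandas as pd

cols = pd.Index(['aa dd', 'dd.d_', 'd._'])

# Each .str method hands back a new Index, not a str,
# so the following string method must be reached through .str again.
step = cols.str.replace(' ', '_', regex=False)
print(type(step).__name__)

# Chained literal replacements, as in the question
chained = (
    cols.str.replace(' ', '_', regex=False)
        .str.replace('.', '_', regex=False)
        .str.strip('_')
)

# One regex replacement covering whitespace and periods
single = cols.str.replace(r'[\s.]', '_', regex=True).str.strip('_')

print(list(chained))
print(list(single))
```

Passing regex explicitly keeps the behaviour the same across pandas versions, since the default changed in pandas 2.0.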

answered Oct 16 '22 12:10

jezrael

Why not use a list comprehension?

import re
df.columns = [re.sub(r'[\s.]', '_', x).strip('_') for x in df.columns]

In a list comprehension, you work with each string object directly, so there is no need to call .str each time.
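A variation on the same idea, as a rough sketch: put the per-name logic in a small function and pass it to `DataFrame.rename`, which accepts a callable for `columns`. The helper name `clean_name` is hypothetical and not part of pandas.

```python
import re
import pandas as pd

def clean_name(name):
    """Hypothetical helper: swap whitespace and periods for '_', then trim underscores."""
    return re.sub(r'[\s.]', '_', name).strip('_')

df = pd.DataFrame(columns=['aa dd', 'dd.d_', 'd._'])

# rename applies the callable to every column label and returns a new DataFrame
df = df.rename(columns=clean_name)
print(list(df.columns))
```

Because `clean_name` works on plain Python strings, any number of string methods can be chained inside it without going through .str.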

answered Oct 16 '22 13:10

cs95
