What is the difference between Series.replace and Series.str.replace?

Tags: python, replace, pandas

Oftentimes I am tasked with performing some sort of replace or substitution operation on data in a Series or in one or more DataFrame columns.

For example, given a Series of strings,

import pandas as pd

s = pd.Series(['foo', 'another foo bar', 'baz'])

0                foo
1    another foo bar
2                baz
dtype: object

The goal would be to replace all occurrences of "foo" with "bar", to get

0                bar
1    another bar bar
2                baz
dtype: object

At this point I am usually confused as there are two options I can use to solve this: replace, and str.replace. The confusion arises from the fact that I am unsure as to which is the right method to use, or what the difference (if any) between them is.

What are the main differences between replace and str.replace, and what are the benefits/caveats of using either?

296

asked Jun 17 '19 05:06

cs95

2 Answers

Skip to the TLDR; at the bottom of this answer for a brief summary of the differences.

It is easy to understand the difference if you think of these two methods in terms of their utility.

.str.replace is a method with a very specific purpose—to perform string or regex substitution on string data.

OTOH, .replace is more of an all-purpose Swiss Army knife which can replace anything with anything else (and yes, this includes string and regex).

Consider the simple DataFrame below, which will form the basis of the discussion that follows.

# Setup
import pandas as pd

df = pd.DataFrame({
    'A': ['foo', 'another foo bar', 'baz'],
    'B': [0, 1, 0]
})
df

                 A  B
0              foo  0
1  another foo bar  1
2              baz  0

The main differences between the two functions can be summarised in terms of

Purpose
Usage
Default behavior

Use str.replace for substring replacements on a single string column, and replace for any general replacement on one or more columns.

The docs market str.replace as a method for "simple string replacement", so this should be your first choice when performing string/regex substitution on a pandas Series or column. Think of it as a "vectorised" equivalent of Python's str.replace() method (or re.sub(), to be more accurate).

# simple substring replacement
df['A'].str.replace('foo', 'bar', regex=False)

0                bar
1    another bar bar
2                baz
Name: A, dtype: object

# simple regex replacement
df['A'].str.replace('ba.', 'xyz', regex=True)

0                foo
1    another foo xyz
2                xyz
Name: A, dtype: object

replace works for string as well as non-string replacement. What's more, it is also meant to work for multiple columns at a time: you can access replace as a DataFrame method, df.replace(), if you need to replace values across the entire DataFrame.

# DataFrame-wide replacement
df.replace({'foo': 'bar', 1: -1})

                 A  B
0              bar  0
1  another foo bar -1
2              baz  0
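
When the same value should be treated differently per column, replace also accepts a nested dict. The following is a minimal sketch reusing the setup frame from above; the variable name per_column is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['foo', 'another foo bar', 'baz'],
    'B': [0, 1, 0],
})

# Nested dict: outer keys are column names, inner dicts map old -> new values.
# Matching is still on whole values, so 'another foo bar' is left untouched.
per_column = df.replace({'A': {'foo': 'bar'}, 'B': {1: -1}})
print(per_column)
```

Only column A is searched for 'foo' and only column B for 1, which avoids accidental replacements in unrelated columns.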

str.replace can replace one thing at a time. replace lets you perform multiple independent replacements, i.e., replace many things at once.

You can only specify a single substring or regex pattern to str.replace. repl can be a callable (see the docs), so there's room to get creative with regex to somewhat simulate multiple substring replacements, but these solutions are hacky at best.
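
For illustration, here is a rough sketch of the callable-repl workaround mentioned above. The mapping dict and variable names are hypothetical; the idea is to build one alternation pattern and look each match up in the dict.

```python
import re
import pandas as pd

s = pd.Series(['foo', 'another foo bar', 'baz'])

# Hypothetical mapping of substrings to their replacements.
mapping = {'foo': 'text1', 'bar': 'text2'}

# One alternation pattern covering every key; re.escape guards against
# keys that contain regex metacharacters.
pattern = '|'.join(re.escape(key) for key in mapping)

# A callable repl receives the match object; it requires regex=True.
result = s.str.replace(pattern, lambda m: mapping[m.group(0)], regex=True)
print(result)
```

This works, but a dict passed to replace with regex=True usually says the same thing more directly.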

A common pandaic (pandorable, pandonic) pattern is to use str.replace to remove multiple unwanted substrings by joining them with the regex OR operator | and using '' (the empty string) as the replacement.
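
A small sketch of that removal pattern, using the example strings from the question. The trailing str.strip() is an addition here to tidy up leftover whitespace, not part of the pattern itself.

```python
import pandas as pd

s = pd.Series(['foo', 'another foo bar', 'baz'])

# 'foo|bar' matches either substring; replacing with '' deletes each match.
# regex=True is passed explicitly so the pipe is treated as alternation.
cleaned = s.str.replace('foo|bar', '', regex=True).str.strip()
print(cleaned)
```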

replace should be preferred when you have multiple independent replacements of the form {'pat1': 'repl1', 'pat2': 'repl2', ...}. There are various ways of specifying independent replacements (lists, Series, dicts, etc.). See the documentation.
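
As a quick sketch of two of those forms, the list-based and dict-based calls below should be equivalent. The replacement values are made up for the example, and matching is on whole values because regex is left at its default.

```python
import pandas as pd

s = pd.Series(['foo', 'another foo bar', 'baz'])

# Two parallel lists: the i-th value in the first list is replaced
# by the i-th value in the second.
from_lists = s.replace(['foo', 'baz'], ['text1', 'text2'])

# The same replacements written as a dict.
from_dict = s.replace({'foo': 'text1', 'baz': 'text2'})

print(from_lists)
```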

To illustrate the difference,

df['A'].str.replace('foo', 'text1').str.replace('bar', 'text2')

0                  text1
1    another text1 text2
2                    baz
Name: A, dtype: object

This would be better expressed as

df['A'].replace({'foo': 'text1', 'bar': 'text2'}, regex=True)

0                  text1
1    another text1 text2
2                    baz
Name: A, dtype: object

In the context of string operations, str.replace enabled regex replacement by default in pandas versions before 2.0; from 2.0 onward its default is regex=False. replace only performs a full match unless the regex=True switch is used.

Everything you do with str.replace, you can do with replace as well. However, it is important to note the following differences in the default behaviour of both methods.

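substring replacements - str.replace will replace every occurrence of the substring, whereas replace will only match whole values (the entire cell contents) by default.
regex replacement - before pandas 2.0, str.replace interpreted the first argument as a regular expression unless you specified regex=False; from 2.0 onward it treats the pattern as a literal string unless you pass regex=True. replace has always defaulted to regex=False.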

Contrast the difference between

df['A'].replace('foo', 'bar')

0                bar
1    another foo bar
2                baz
Name: A, dtype: object

And

df['A'].replace('foo', 'bar', regex=True)

0                bar
1    another bar bar
2                baz
Name: A, dtype: object

It is also worth mentioning that you can only perform string replacement when regex=True. So, for example, df.replace({'foo': 'bar', 1: -1}, regex=True) would be invalid.

TLDR;

To summarise, the main differences are,

Purpose. Use str.replace for substring replacements on a single string column, and replace for any general replacement on one or more columns.

Usage. str.replace can replace one thing at a time. replace lets you perform multiple independent replacements, i.e., replace many things at once.

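Default behavior. str.replace enabled regex replacement by default before pandas 2.0 and defaults to literal matching (regex=False) from 2.0 onward; in both cases it replaces substrings. replace only performs a full match unless the regex=True switch is used.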

115

answered Oct 18 '22 19:10

cs95

If you are comparing str.replace with replace, I would assume that you are thinking of replacing strings only.

The two rules of thumb that help (especially when using .apply() and lambda) are:

If you want to replace many things at once, use df.replace({dict}). Remember the defaults as mentioned by cs95 or in the docs.
If you want to use regex AND case sensitivity options, use str.replace(): lambda x: x.str.replace('^default$', '', regex=True, case=False).

One final thing to note is that the inplace parameter is available only in replace and not in str.replace, which may be a deciding factor in your code, especially if you are chaining.
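
A short sketch of both points, using made-up data: a case-insensitive, anchored regex with str.replace, and an in-place whole-value replacement with replace. The names blanked and t are illustrative only.

```python
import pandas as pd

s = pd.Series(['Default', 'default', 'not default'])

# str.replace: anchored regex with case-insensitive matching.
# It always returns a new Series, so it chains naturally.
blanked = s.str.replace('^default$', '', case=False, regex=True)

# replace: whole-value, case-sensitive match, modifying a copy in place.
t = s.copy()
t.replace('default', '', inplace=True)

print(blanked.tolist())
print(t.tolist())
```

Note that the in-place call modifies t rather than producing a new Series to continue a chain from, so it does not combine with method chaining.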

answered Oct 18 '22 19:10

Bish
