Replace all but last occurrences of a character in a string with pandas

Question

I am using pandas to remove all but the last period in a string, like so:

import pandas as pd

s = pd.Series(['1.234.5','123.5','2.345.6','678.9'])
counts = s.str.count(r'\.')
target = counts == 2
target
0     True
1    False
2     True
3    False
dtype: bool

s = s[target].str.replace(r'\.', '', n=1, regex=True)
s
0    1234.5
2    2345.6
dtype: object

My desired output, however, is:

0    1234.5
1     123.5
2    2345.6
3     678.9
dtype: object

The replace call combined with the mask target seems to drop the values that were not replaced, and I can't see how to remedy this.

cs95 · Accepted Answer

Regex-based with `str.replace`

This regex pattern with str.replace should do nicely.

s.str.replace(r'\.(?=.*?\.)', '', regex=True)

0    1234.5
1     123.5
2    2345.6
3     678.9
dtype: object

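The idea is that a period is removed only if another period appears somewhere after it, so the last period in each string always survives. Here's a breakdown of the regular expression used.

\.     # match a literal '.'
(?=    # positive lookahead: must match next, but is not consumed
.*?    # any characters, as few as possible
\.     # another literal '.'
)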
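To see the lookahead at work outside pandas, here is a minimal sketch using Python's built-in re module alongside the pandas call. The extra test value '1.2.3.4' and the variable names are illustrative assumptions, not part of the original question.

```python
import re

import pandas as pd

# A period that is followed, somewhere later in the string, by another period.
pattern = r'\.(?=.*?\.)'

# '1.2.3.4' is an extra, made-up value with three periods.
values = ['1.234.5', '123.5', '2.345.6', '678.9', '1.2.3.4']

# Plain Python: every period except the last one is removed.
plain = [re.sub(pattern, '', v) for v in values]
print(plain)  # ['1234.5', '123.5', '2345.6', '678.9', '123.4']

# pandas: same pattern; regex=True is passed explicitly because newer
# pandas versions treat the pattern as a literal string by default.
result = pd.Series(values).str.replace(pattern, '', regex=True)
print(result.tolist())
```

Because the lookahead does not consume characters, each period is tested independently, which is why strings with any number of periods are handled.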

Fun with `np.vectorize`

If you want to do this using count, it isn't impossible, but it is a challenge. You can make this easier with np.vectorize. First, define a function,

def foo(r, c):
    return r.replace('.', '', c)

Vectorize it,

import numpy as np

v = np.vectorize(foo)

Now, call the function v, passing s and the counts to replace.

pd.Series(v(s, s.str.count(r'\.') - 1))

0    1234.5
1     123.5
2    2345.6
3     678.9
dtype: object

Keep in mind that np.vectorize is basically a glorified loop: it is a convenience, not a performance optimization.
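One detail worth noting: the vectorized function returns a plain NumPy array, so the original index is not carried over when the result is wrapped in a new Series. The sketch below assumes a Series with a non-default index (the labels 'a' and 'b' and the function name are made up for illustration) and passes index=s.index to restore it.

```python
import numpy as np
import pandas as pd


def drop_first_n_periods(text, n):
    # Hypothetical name; same logic as foo above: remove the first n periods.
    return text.replace('.', '', n)


v = np.vectorize(drop_first_n_periods)

# Made-up labels to show that the index survives.
s = pd.Series(['1.234.5', '123.5'], index=['a', 'b'])
counts = s.str.count(r'\.') - 1  # number of periods to remove per string

# v(...) returns an ndarray, so the index is re-attached explicitly.
out = pd.Series(v(s, counts), index=s.index)
print(out)
```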

Loopy/List Comprehension

The plain-Python equivalent of the vectorize approach would be:

r = []
for x, y in zip(s, s.str.count(r'\.') - 1):
    r.append(x.replace('.', '', y))

pd.Series(r)

0    1234.5
1     123.5
2    2345.6
3     678.9
dtype: object

Or, using a list comprehension:

pd.Series([x.replace('.', '', y) for x, y in zip(s, s.str.count(r'\.') - 1)])

0    1234.5
1     123.5
2    2345.6
3     678.9
dtype: object
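For completeness, the mask from the question can also be made to work. Indexing with s[target] keeps only the masked rows, which is why the other values disappeared. A minimal sketch, assuming (as in the question) that only strings with exactly two periods need fixing, uses Series.where to fall back to the original value wherever the mask is False:

```python
import pandas as pd

s = pd.Series(['1.234.5', '123.5', '2.345.6', '678.9'])
target = s.str.count(r'\.') == 2

# Remove the first period in every string (literal match, at most once) ...
first_removed = s.str.replace('.', '', n=1, regex=False)

# ... but keep the original value wherever the mask is False.
fixed = s.where(~target, first_removed)
print(fixed.tolist())  # ['1234.5', '123.5', '2345.6', '678.9']
```

Unlike the regex approach, this only handles the two-period case; strings with three or more periods would be left unchanged.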

Tags: python, string, regex, pandas

seanysull
