 

Replace all but the last occurrence of a character in a string with pandas

I'm using pandas to remove all but the last period in each string, like so:

import pandas as pd

s = pd.Series(['1.234.5','123.5','2.345.6','678.9'])
counts = s.str.count(r'\.')
target = counts==2
target
0     True
1    False
2     True
3    False
dtype: bool

s = s[target].str.replace(r'\.', '', n=1, regex=True)
s
0    1234.5
2    2345.6
dtype: object

My desired output, however, is:

0    1234.5
1     123.5
2    2345.6
3     678.9
dtype: object

The replace call, combined with the mask target, seems to drop the unreplaced values, and I can't see how to remedy this.

asked Dec 14 '17 at 12:12 by seanysull

1 Answer

Regex-based with str.replace

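This regex pattern with str.replace should do nicely. Pass regex=True explicitly: since pandas 2.0, str.replace treats the pattern as a literal string by default.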

s.str.replace(r'\.(?=.*?\.)', '', regex=True)

0    1234.5
1     123.5
2    2345.6
3     678.9
dtype: object

The idea is to remove every period that is followed, somewhere later in the string, by another period. Only the last period has no period after it, so it is the one that survives. Here's a breakdown of the regular expression used.

\.     # '.'
(?=    # positive lookahead
.*?    # any characters, matched lazily (non-greedy)
\.     # look for '.'
)
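To see how the lookahead behaves on edge cases (more than two periods, or none at all), here is a small sketch that applies the same pattern with Python's built-in re module, outside pandas. The helper name keep_last_period is hypothetical, used only for illustration.

```python
import re

# Same pattern as above: a '.' that has another '.' somewhere after it.
PATTERN = re.compile(r'\.(?=.*?\.)')

def keep_last_period(text):
    """Hypothetical helper: remove every '.' except the last one."""
    return PATTERN.sub('', text)

samples = ['1.234.5', '123.5', '1.2.3.4', 'abc', '...']
for sample in samples:
    print(repr(sample), '->', repr(keep_last_period(sample)))
```

Strings without any period come back unchanged, and with several periods only the final one is kept.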

Fun with np.vectorize

If you want to do this using count, it isn't impossible, but it is a challenge: str.replace accepts a single n for the whole Series, not a different replacement count per row. You can make this easier with np.vectorize. First, define a function:

import numpy as np

def foo(r, c):
    # Remove the first c periods from the string r
    return r.replace('.', '', c)

Vectorize it,

v = np.vectorize(foo)

Now call v, passing s and, for each row, the number of periods to remove (the period count minus one):

pd.Series(v(s, s.str.count(r'\.') - 1))

0    1234.5
1     123.5
2    2345.6
3     678.9
dtype: object

Keep in mind that np.vectorize is a convenience wrapper around a Python-level loop, so it offers no real speedup over the plain-Python versions below.
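Staying closer to the question's own count-and-mask idea, the unreplaced rows are lost because s is reassigned to s[target], which contains only the masked rows. A minimal sketch (not part of the original answer) that avoids this writes the replaced values back into a copy of the full Series with .loc. It assumes, as in the sample data, that no string has more than two periods; for more, one of the count-based approaches here is needed.

```python
import pandas as pd

s = pd.Series(['1.234.5', '123.5', '2.345.6', '678.9'])
counts = s.str.count(r'\.')
target = counts == 2

# Start from a copy of the full Series so rows outside the mask are kept.
result = s.copy()
# Remove only the first '.' in the masked rows and write them back by label.
result.loc[target] = s[target].str.replace('.', '', n=1, regex=False)
print(result)
```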


Loopy/List Comprehension

The plain-Python equivalent of the vectorized call is a loop:

r = []
for x, y in zip(s, s.str.count(r'\.') - 1):
    r.append(x.replace('.', '', y))

pd.Series(r)

0    1234.5
1     123.5
2    2345.6
3     678.9
dtype: object

Or, using a list comprehension:

pd.Series([x.replace('.', '', y) for x, y in zip(s, s.str.count(r'\.') - 1)])

0    1234.5
1     123.5
2    2345.6
3     678.9
dtype: object
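A per-row variant that needs no counting at all is also possible with str.rpartition, which splits on the last occurrence of a separator. This is a sketch under that swapped-in technique, not the answer's method; drop_all_but_last is a hypothetical helper name.

```python
import pandas as pd

def drop_all_but_last(text, ch='.'):
    """Hypothetical helper: remove every `ch` except the last one."""
    # rpartition splits on the LAST occurrence of ch; if ch is absent,
    # it returns ('', '', text), so the string comes back unchanged.
    head, sep, tail = text.rpartition(ch)
    return head.replace(ch, '') + sep + tail

s = pd.Series(['1.234.5', '123.5', '2.345.6', '678.9'])
result = s.map(drop_all_but_last)
print(result)
```

Like the loop above, this is still Python-level iteration, just without computing the number of periods first.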
answered Nov 19 '22 at 11:11 by cs95