
Pandas Python Regex : error: nothing to repeat

I have a dataframe with a couple of strange characters, "*" and "-".

import pandas as pd
import numpy as np

data = {'year': [2010, 2011, 2012, 2011, 2012, 2010, 2011, 2012],
        'team': ['Bears', 'Bears', 'Bears', 'Packers', 'Packers', 'Lions', 'Lions', 'Lions'],
        'wins': [11, '*', 10, '-', 11, 6, 10, 4],
        'losses': [5, 8, 6, 1, 5, 10, 6, 12]}
football = pd.DataFrame(data, columns=['year', 'team', 'wins', 'losses'])

I would like to replace the strange characters with '0.00', but I get an error:

error: nothing to repeat

I understand this is linked to regex, but I still don't know how to overcome the issue.

The code I use to replace the characters:

football.replace(['*','-'], ['0.00','0.00'], regex=True).astype(np.float64)
Boosted_d16 asked Dec 24 '22 22:12



2 Answers

Use

football.replace(['*','-'], ['0.00','0.00'], regex=False)

There is no need for a regular expression when you are simply matching one literal character or another.

If you do want to use a regular expression, note that * is a special character: it is a quantifier meaning "zero or more of the preceding item", so a pattern that begins with it has nothing to repeat. To match values that are exactly '*' or '-', use

football.replace('^[*-]$', '0.00', regex=True)
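
For illustration, here is a minimal sketch of both approaches applied to the wins column only. Working on a single Series is my assumption, not part of the question: the team column holds names that cannot be cast to float, so the conversion is limited to wins.

```python
import pandas as pd

# Same values as the 'wins' column in the question (assumed standalone here).
wins = pd.Series([11, '*', 10, '-', 11, 6, 10, 4], name='wins')

# Literal matching: '*' and '-' are compared as plain values, no regex involved.
literal = wins.replace(['*', '-'], '0.00', regex=False)

# Regex matching: inside a character class '*' is literal, and the ^...$
# anchors restrict the match to cells made of exactly one such character.
anchored = wins.replace(r'^[*-]$', '0.00', regex=True)

# Only now convert to float; the string '0.00' becomes 0.0.
wins_as_float = literal.astype(float)
print(wins_as_float.tolist())
```

Both replacements should give the same column; the literal form is the simpler choice when the placeholders are known in advance.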

* is a special character in regex, so you have to escape it:

football.replace([r'\*','-'], ['0.00','0.00'], regex=True).astype(np.float64)

or use a character class:

football.replace('[*-]', '0.00', regex=True).astype(np.float64)
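
To see where the message comes from, this sketch reproduces the error with the standard re module alone and shows re.escape as one way to build a safe pattern from literal characters. The variable names are illustrative only.

```python
import re

# A bare '*' is a quantifier with nothing before it to repeat.
try:
    re.compile('*')
    error_message = None
except re.error as exc:
    error_message = str(exc)

# re.escape backslash-escapes regex metacharacters in a literal string.
escaped_star = re.escape('*')

# Inside a character class '*' is literal; '-' is literal when placed last.
char_class = re.compile(r'[*-]')
matches = [bool(char_class.fullmatch(s)) for s in ['*', '-', '11']]

print(error_message)
print(escaped_star, matches)
```

A string produced by re.escape can be passed as the pattern to replace(..., regex=True) when the characters to match are not known ahead of time.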
Toto answered Jan 02 '23 05:01
