
pandas ValueError: pattern contains no capture groups

Tags: python, pandas

With Python's re module, the following works and returns the part of the URL up to "com":

import re
string = r'http://www.example.com/abc.html'
result = re.search('^.*com', string).group()

In pandas, I wrote the following (with pandas imported as pd):

df = pd.DataFrame(columns = ['index', 'url'])
df.loc[len(df), :] = [1, 'http://www.example.com/abc.html']
df.loc[len(df), :] = [2, 'http://www.hello.com/def.html']
df.str.extract('^.*com')

ValueError: pattern contains no capture groups

How can I solve this problem?

Thanks.

asked Jan 24 '19 09:01 by Chan


2 Answers

According to the docs, you need to specify a capture group (i.e., parentheses) for str.extract to, well, extract.

Series.str.extract(pat, flags=0, expand=True)
For each subject string in the Series, extract groups from the first match of regular expression pat.

Each capture group constitutes its own column in the output.
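
To illustrate that last sentence from the docs, here is a minimal sketch with two named capture groups. The group names scheme and host and the sample Series are my own choices for illustration, not something from the question.

```python
import pandas as pd

# Sample data assumed for illustration (same URLs as in the question).
urls = pd.Series([
    "http://www.example.com/abc.html",
    "http://www.hello.com/def.html",
])

# Two capture groups produce two output columns.
# (?P<name>...) names the column; unnamed groups would be numbered 0, 1, ...
parts = urls.str.extract(r"^(?P<scheme>\w+)://(?P<host>[^/]+)")

print(parts)
```

Each row of parts holds the text captured by each group from the first match in that string.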

# Escape the dot so it matches a literal "." rather than any character
df.url.str.extract(r'(.*\.com)')

                        0
0  http://www.example.com
1    http://www.hello.com

# If you need named capture groups,
df.url.str.extract(r'(?P<URL>.*\.com)')

                      URL
0  http://www.example.com
1    http://www.hello.com

Or, if you need a Series, pass expand=False:

df.url.str.extract(r'(.*\.com)', expand=False)

0    http://www.example.com
1      http://www.hello.com
Name: url, dtype: object
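
For completeness, here is a small sketch that contrasts re.search with str.extract on the question's data. It assumes the frame is built with a dict instead of the row-by-row df.loc assignments, and the variable names are only illustrative.

```python
import re
import pandas as pd

df = pd.DataFrame({
    "index": [1, 2],
    "url": [
        "http://www.example.com/abc.html",
        "http://www.hello.com/def.html",
    ],
})

pattern = r"^.*\.com"  # no parentheses, so no capture group

# re.search(...).group() returns the whole match, so no group is needed here.
whole_match = re.search(pattern, df["url"][0]).group()

# str.extract only returns capture groups, so the same pattern raises.
try:
    df["url"].str.extract(pattern)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Wrapping the pattern in parentheses turns the whole match into group 1.
fixed = df["url"].str.extract(f"({pattern})", expand=False)

print(whole_match)
print(error_message)
print(fixed)
```

The point is that .group() with no arguments means "the entire match", whereas str.extract has no such default and needs at least one explicit group.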
answered Nov 17 '22 19:11 by cs95


You need to select the url column before calling .str, and wrap the pattern in parentheses () so that it contains a capture group:

df['new'] = df['url'].str.extract(r'(^.*com)')
print(df)
  index                              url                     new
0     1  http://www.example.com/abc.html  http://www.example.com
1     2    http://www.hello.com/def.html    http://www.hello.com
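
One thing to keep in mind when assigning the result to a new column: rows where the pattern does not match get NaN. A minimal sketch, assuming a second URL ending in .org that I made up for illustration, and using expand=False so a Series is assigned:

```python
import pandas as pd

# The .org URL is a hypothetical non-matching row added for illustration.
df = pd.DataFrame({
    "url": [
        "http://www.example.com/abc.html",
        "http://www.example.org/x.html",
    ],
})

# expand=False returns a Series for a single capture group.
df["new"] = df["url"].str.extract(r"(^.*\.com)", expand=False)

print(df)
```

If missing values are a problem downstream, they can be handled afterwards, for example with fillna or dropna.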
answered Nov 17 '22 21:11 by jezrael