 

Pandas: select rows from columns using Regex

Tags: regex, pandas

I want to extract the rows whose feccandid value has an H or S as its first character:

    cid     amount  date    catcode     feccandid
0   N00031317   1000    2010    B2000   H0FL19080
1   N00027464   5000    2009    B1000   H6IA01098
2   N00024875   1000    2009    A5200   S2IL08088
3   N00030957   2000    2010    J2200   S0TN04195
4   N00026591   1000    2009    F3300   S4KY06072
5   N00031317   1000    2010    B2000   P0FL19080
6   N00027464   5000    2009    B1000   P6IA01098
7   N00024875   1000    2009    A5200   S2IL08088
8   N00030957   2000    2010    J2200   H0TN04195
9   N00026591   1000    2009    F3300   H4KY06072

I am using this code:

campaign_contributions.loc[campaign_contributions['feccandid'].astype(str).str.extractall(r'^(?:S|H)')]

It raises this error: ValueError: pattern contains no capture groups
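For anyone who wants to reproduce this, here is a self-contained sketch that rebuilds the table above as a DataFrame and runs the same call. The column values are copied from the question. The try/except is only there to capture and show the message.

```python
import pandas as pd

# Sample data copied from the table in the question
campaign_contributions = pd.DataFrame({
    "cid": ["N00031317", "N00027464", "N00024875", "N00030957", "N00026591",
            "N00031317", "N00027464", "N00024875", "N00030957", "N00026591"],
    "amount": [1000, 5000, 1000, 2000, 1000, 1000, 5000, 1000, 2000, 1000],
    "date": [2010, 2009, 2009, 2010, 2009, 2010, 2009, 2009, 2010, 2009],
    "catcode": ["B2000", "B1000", "A5200", "J2200", "F3300",
                "B2000", "B1000", "A5200", "J2200", "F3300"],
    "feccandid": ["H0FL19080", "H6IA01098", "S2IL08088", "S0TN04195", "S4KY06072",
                  "P0FL19080", "P6IA01098", "S2IL08088", "H0TN04195", "H4KY06072"],
})

# (?:S|H) is a NON-capturing group, so the pattern has zero capture groups
# and str.extractall rejects it before .loc is ever evaluated.
try:
    campaign_contributions["feccandid"].astype(str).str.extractall(r"^(?:S|H)")
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)  # pattern contains no capture groups
```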

Does anyone with experience using Regex know what I am doing wrong?

asked Mar 11 '23 14:03 by Collective Action



2 Answers

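Why not just use str.match instead of extract plus a null check?

i.e. df[df['col'].str.match(r'^(S|H)')]

str.match tests each value against the pattern from the start of the string and returns a boolean Series, so no capture group is needed and the ^ is optional.

(I came here looking for the same answer, but the use of extract seemed odd, so I checked the pandas docs for the .str methods.)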
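A runnable sketch of the str.match approach on the feccandid values from the question (the one-column frame is a trimmed-down copy of the question's data). The na=False argument is an addition here: it makes missing values count as non-matches, so the mask stays purely boolean.

```python
import pandas as pd

# Only the column that matters, copied from the question's table
df = pd.DataFrame({"feccandid": [
    "H0FL19080", "H6IA01098", "S2IL08088", "S0TN04195", "S4KY06072",
    "P0FL19080", "P6IA01098", "S2IL08088", "H0TN04195", "H4KY06072",
]})

# str.match anchors at the start of each string (like re.match) and yields
# True/False per row; na=False treats missing values as non-matches.
mask = df["feccandid"].str.match(r"^(S|H)", na=False)
result = df[mask]

print(result.index.tolist())  # [0, 1, 2, 3, 4, 7, 8, 9]
```

Only rows 5 and 6 (the P... IDs) are dropped. A character class such as r'[SH]' would select the same rows.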


answered Mar 20 '23 04:03 by W D



For something this simple, you can bypass the regex:

relevant = campaign_contributions.feccandid.str.startswith('H') | \
    campaign_contributions.feccandid.str.startswith('S')
campaign_contributions[relevant]
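The same selection with the two startswith checks, sketched on a trimmed-down copy of the question's feccandid column:

```python
import pandas as pd

# feccandid values copied from the question's table
campaign_contributions = pd.DataFrame({"feccandid": [
    "H0FL19080", "H6IA01098", "S2IL08088", "S0TN04195", "S4KY06072",
    "P0FL19080", "P6IA01098", "S2IL08088", "H0TN04195", "H4KY06072",
]})

ids = campaign_contributions["feccandid"]
# Two plain prefix checks combined with element-wise OR; no regex involved.
relevant = ids.str.startswith("H") | ids.str.startswith("S")
result = campaign_contributions[relevant]

print(result.index.tolist())  # [0, 1, 2, 3, 4, 7, 8, 9]
```

Each startswith call returns a boolean Series, and | combines them element-wise, so the result can index the frame directly.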

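However, if you want to use a regex, you can change this to:

relevant = campaign_contributions['feccandid'].str.extract(r'^(S|H)', expand=False).notna()

Passing expand=False makes extract return a Series (the matched letter, or NaN where there is no match) rather than a one-column DataFrame, so it works directly as a row mask. Note that the astype(str) is redundant when the column already holds strings, and that extract is enough; extractall is not needed. The original error comes from the pattern itself: (?:S|H) is a non-capturing group, so the pattern contains no capture groups, and both extract and extractall require at least one.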
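To see why the original pattern fails and why the return type of extract matters, here is a small sketch on a hypothetical three-row sample (values borrowed from the question):

```python
import re

import pandas as pd

# extract/extractall count capture groups; (?:...) is non-capturing and counts as zero.
non_capturing_groups = re.compile(r"^(?:S|H)").groups
capturing_groups = re.compile(r"^(S|H)").groups

campaign_contributions = pd.DataFrame({"feccandid": ["H0FL19080", "P0FL19080", "S2IL08088"]})
col = campaign_contributions["feccandid"]

# Default expand=True: a one-column DataFrame, which is awkward as a row mask.
as_frame = col.str.extract(r"^(S|H)")

# expand=False with a single group: a Series holding the matched letter, or NaN.
first_letter = col.str.extract(r"^(S|H)", expand=False)

relevant = first_letter.notna()
result = campaign_contributions[relevant]

print(non_capturing_groups, capturing_groups)  # 0 1
print(type(as_frame).__name__, type(first_letter).__name__)  # DataFrame Series
print(result.index.tolist())  # [0, 2]
```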

answered Mar 20 '23 03:03 by Ami Tavory
