Pandas: select rows from columns using Regex

Question

I want to select the rows whose feccandid value starts with H or S:

         cid  amount  date  catcode  feccandid
0  N00031317    1000  2010    B2000  H0FL19080
1  N00027464    5000  2009    B1000  H6IA01098
2  N00024875    1000  2009    A5200  S2IL08088
3  N00030957    2000  2010    J2200  S0TN04195
4  N00026591    1000  2009    F3300  S4KY06072
5  N00031317    1000  2010    B2000  P0FL19080
6  N00027464    5000  2009    B1000  P6IA01098
7  N00024875    1000  2009    A5200  S2IL08088
8  N00030957    2000  2010    J2200  H0TN04195
9  N00026591    1000  2009    F3300  H4KY06072

I am using this code:

campaign_contributions.loc[campaign_contributions['feccandid'].astype(str).str.extractall(r'^(?:S|H)')]

It raises this error: ValueError: pattern contains no capture groups

Does anyone with experience using Regex know what I am doing wrong?

W D · Accepted Answer

Why not just use str.match instead of extract and negate?

i.e. df[df['col'].str.match(r'^(S|H)')]

(I came here looking for the same answer, but the use of extract seemed odd, so I looked through the docs for the .str methods.)
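
To make the str.match suggestion concrete, here is a minimal sketch on a cut-down copy of the question's data. The frame name df and the five sample rows are assumptions for illustration only. Since str.match already anchors at the start of each string, the leading ^ is optional, and a character class [HS] is equivalent to (S|H) here.

```python
import pandas as pd

# Cut-down sample modelled on the question's data (illustrative only).
df = pd.DataFrame({
    "feccandid": ["H0FL19080", "S2IL08088", "P0FL19080", "P6IA01098", "H4KY06072"],
})

# str.match tests the pattern at the start of each string and returns booleans.
# na=False treats missing values as non-matches, so the mask stays boolean.
mask = df["feccandid"].str.match(r"[HS]", na=False)

# Boolean indexing keeps only the rows where the mask is True.
selected = df[mask]
print(selected)
```

If the column can contain missing values, passing na=False as above is probably the safer choice, since a mask containing NaN cannot be used for boolean indexing.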

Ami Tavory · Answer

For something this simple, you can bypass the regex:

relevant = campaign_contributions.feccandid.str.startswith('H') | \
    campaign_contributions.feccandid.str.startswith('S')
campaign_contributions[relevant]

However, if you want to use a regex, you can change this to

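relevant = ~campaign_contributions['feccandid'].str.extract(r'^(S|H)', expand=False).isnull()

Note that extract needs a capturing group, (S|H); the non-capturing (?:S|H) in the question is what raises the ValueError. The astype(str) is redundant, and extract is enough, so extractall is not needed. In recent pandas versions, expand=False makes extract return a Series, so relevant can be used directly as a row mask: campaign_contributions[relevant].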
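
The following sketch checks the points above on a small sample: the non-capturing pattern raises the error from the question, the capturing version yields a usable mask, and that mask agrees with the startswith version. The frame name df and its five rows are assumptions for illustration, not the asker's real data.

```python
import pandas as pd

# Cut-down sample modelled on the question's data (illustrative only).
df = pd.DataFrame({
    "feccandid": ["H0FL19080", "S2IL08088", "P0FL19080", "P6IA01098", "H4KY06072"],
})
col = df["feccandid"]

# A non-capturing group (?:...) gives extract nothing to return, so it raises.
try:
    col.str.extract(r"^(?:S|H)", expand=False)
except ValueError as exc:
    error_message = str(exc)

# With a capturing group, extract returns the matched letter, or NaN if no match.
# expand=False returns a Series rather than a one-column DataFrame.
regex_mask = col.str.extract(r"^(S|H)", expand=False).notna()

# The regex-free version from the answer.
plain_mask = col.str.startswith("H") | col.str.startswith("S")

print(df[regex_mask])
```

Both masks select the same rows here, so the choice between them is mostly a matter of readability.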

Tags: regex, pandas
