I want to extract the rows whose feccandid value starts with an H or an S:
         cid  amount  date catcode  feccandid
0  N00031317    1000  2010   B2000  H0FL19080
1  N00027464    5000  2009   B1000  H6IA01098
2  N00024875    1000  2009   A5200  S2IL08088
3  N00030957    2000  2010   J2200  S0TN04195
4  N00026591    1000  2009   F3300  S4KY06072
5  N00031317    1000  2010   B2000  P0FL19080
6  N00027464    5000  2009   B1000  P6IA01098
7  N00024875    1000  2009   A5200  S2IL08088
8  N00030957    2000  2010   J2200  H0TN04195
9  N00026591    1000  2009   F3300  H4KY06072
I am using this code:
campaign_contributions.loc[campaign_contributions['feccandid'].astype(str).str.extractall(r'^(?:S|H)')]
It returns this error:
ValueError: pattern contains no capture groups
Does anyone with experience using Regex know what I am doing wrong?
Why not just use str.match instead of extract and negate? I.e.
df[df['col'].str.match(r'^(S|H)')]
(I came here looking for the same answer, but the use of extract seemed odd, so I found the docs for str.ops.)
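A minimal sketch of the str.match suggestion, run against the feccandid values from the question. The frame name df is only an illustration. Because str.match already anchors at the start of each string, the leading ^ is optional, and a character class such as [SH] is enough here.

```python
import pandas as pd

# feccandid values copied from the question; "df" is an illustrative name.
df = pd.DataFrame({
    "feccandid": ["H0FL19080", "H6IA01098", "S2IL08088", "S0TN04195", "S4KY06072",
                  "P0FL19080", "P6IA01098", "S2IL08088", "H0TN04195", "H4KY06072"],
})

# str.match tests the pattern at the start of each string and returns booleans.
mask = df["feccandid"].str.match(r"[SH]")

# Boolean indexing keeps only the rows where the mask is True.
matched = df[mask]
print(matched)
```

The rows with index 5 and 6 (the P... values) are dropped, and every other row is kept.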
For something this simple, you can bypass the regex:
relevant = campaign_contributions.feccandid.str.startswith('H') | \
campaign_contributions.feccandid.str.startswith('S')
campaign_contributions[relevant]
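For reference, here is the regex-free filter as a self-contained sketch, with a small frame rebuilt from the question's data (only two columns are kept for brevity). The isin variant at the end is an alternative I am adding for comparison, not part of the answer above; it checks the first character against a list instead of combining two startswith calls.

```python
import pandas as pd

# Subset of the question's data, enough to exercise the filter.
campaign_contributions = pd.DataFrame({
    "amount": [1000, 5000, 1000, 2000, 1000, 1000, 5000, 1000, 2000, 1000],
    "feccandid": ["H0FL19080", "H6IA01098", "S2IL08088", "S0TN04195", "S4KY06072",
                  "P0FL19080", "P6IA01098", "S2IL08088", "H0TN04195", "H4KY06072"],
})

# Approach from the answer: OR together two startswith checks.
relevant = (
    campaign_contributions["feccandid"].str.startswith("H")
    | campaign_contributions["feccandid"].str.startswith("S")
)

# Alternative (an addition, not from the answer): test the first character directly.
relevant_isin = campaign_contributions["feccandid"].str[0].isin(["H", "S"])

result = campaign_contributions[relevant]
print(result)
```

Both masks select the same rows on this data, so the choice is mostly a matter of readability.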
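However, if you want to use a regex, note that extract needs at least one capture group, which is what the error is complaining about: (?:S|H) is a non-capturing group. You can change this to
relevant = campaign_contributions['feccandid'].str.extract(r'^(S|H)', expand=False).notnull()
Here expand=False makes extract return a Series rather than a DataFrame, so the result can be used directly as a row mask.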
Note that the astype is redundant, and that extract is enough.
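The sketch below illustrates why the original pattern fails and how a capturing group fixes it. It uses a three-value Series as a stand-in for the feccandid column, and passes expand=False so that extract returns a Series that can serve as a row mask; both are assumptions made for the example.

```python
import pandas as pd

# Small stand-in for the feccandid column.
s = pd.Series(["H0FL19080", "P0FL19080", "S2IL08088"], name="feccandid")

# A non-capturing group gives extract nothing to return, so it raises ValueError.
error_message = None
try:
    s.str.extract(r"^(?:S|H)")
except ValueError as err:
    error_message = str(err)

# With a capturing group, extract returns the matched letter, or NaN when
# the pattern does not match. expand=False yields a Series, not a DataFrame.
first_letter = s.str.extract(r"^(S|H)", expand=False)

# Rows with a non-null match are the ones that start with S or H.
relevant = first_letter.notnull()
print(s[relevant])
```

Since the extracted letter itself is never used, str.match or str.startswith expresses the same filter more directly; extract is mainly worth it when the captured text is needed afterwards.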