I have a DataFrame like the one below:
name genre
satya |ACTION|DRAMA|IC|
satya |COMEDY|BIOPIC|SOCIAL|
abc |CLASSICAL|
xyz |ROMANCE|ACTION|DARMA|
def |DISCOVERY|SPORT|COMEDY|IC|
ghj |IC|
Now I want to query the DataFrame so that I get rows 1, 5 and 6, i.e. I want to find |IC| either on its own or in any combination with other genres.
So far I am able to do either an exact search using
df[df['genre'] == '|ACTION|DRAMA|IC|'] ######exact value yields row 1
or a string-contains search with
df[df['genre'].str.contains('IC')] ####yields row 1,2,3,5,6
# because BIOPIC contains IC, and so does CLASSICAL
But I don't want either of these.
#df[df['genre'].str.contains('|IC|')] #### row 6
# This does not satisfy my need either, as I am missing rows 1 and 5
So my requirement is to find the genres that contain |IC|. (My string search fails because '|' is treated as the OR operator in a regular expression.)
Could somebody suggest a regex or any other method to do that? Thanks in advance.
I think you can add \ to the regex to escape it, because | without \ is interpreted as OR:
'|'
A|B, where A and B can be arbitrary REs, creates a regular expression that will match either A or B. An arbitrary number of REs can be separated by the '|' in this way. This can be used inside groups (see below) as well. As the target string is scanned, REs separated by '|' are tried from left to right. When one pattern completely matches, that branch is accepted. This means that once A matches, B will not be tested further, even if it would produce a longer overall match. In other words, the '|' operator is never greedy. To match a literal '|', use \|, or enclose it inside a character class, as in [|].
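To make the quoted rule concrete, here is a minimal sketch using only the standard `re` module. It assumes the genre strings from the question are held in a plain list (the `genres` and `results` names are just for illustration) and compares the two spellings the documentation mentions, plus `re.escape`, which builds the escaped pattern for you.

```python
import re

# Genre strings copied from the question, in the same order.
genres = [
    "|ACTION|DRAMA|IC|",
    "|COMEDY|BIOPIC|SOCIAL|",
    "|CLASSICAL|",
    "|ROMANCE|ACTION|DARMA|",
    "|DISCOVERY|SPORT|COMEDY|IC|",
    "|IC|",
]

# Three ways to write a pattern that matches the literal text |IC|.
patterns = [
    r"\|IC\|",          # backslash-escaped pipes
    r"[|]IC[|]",        # pipes inside a character class
    re.escape("|IC|"),  # let the re module do the escaping
]

# For each pattern, collect the 0-based positions of the matching strings.
results = {}
for pattern in patterns:
    results[pattern] = [i for i, g in enumerate(genres) if re.search(pattern, g)]
    print(pattern, results[pattern])
```

All three patterns should select the same strings: positions 0, 4 and 5, which are rows 1, 5 and 6 in the question's 1-based numbering.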
# Boolean mask: True where the genre contains the literal text |IC|
print(df['genre'].str.contains(r'\|IC\|'))
0 True
1 False
2 False
3 False
4 True
5 True
Name: genre, dtype: bool
# Use the mask to keep only the matching rows
print(df[df['genre'].str.contains(r'\|IC\|')])
name genre
0 satya |ACTION|DRAMA|IC|
4 def |DISCOVERY|SPORT|COMEDY|IC|
5 ghj |IC|
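If you would rather avoid regex escaping altogether, the sketch below shows two alternatives. It assumes the DataFrame is rebuilt from the question's data as shown; `mask_literal`, `mask_tokens` and `result` are illustrative names. The first alternative passes `regex=False` so that `str.contains` does a plain substring test, and the second splits each genre string into tokens and checks for an exact `IC` token.

```python
import pandas as pd

# Rebuild the DataFrame from the question.
df = pd.DataFrame({
    "name": ["satya", "satya", "abc", "xyz", "def", "ghj"],
    "genre": [
        "|ACTION|DRAMA|IC|",
        "|COMEDY|BIOPIC|SOCIAL|",
        "|CLASSICAL|",
        "|ROMANCE|ACTION|DARMA|",
        "|DISCOVERY|SPORT|COMEDY|IC|",
        "|IC|",
    ],
})

# Option 1: regex=False treats the pattern as literal text, so | needs no escaping.
mask_literal = df["genre"].str.contains("|IC|", regex=False)

# Option 2: drop the outer pipes, split on |, and look for an exact "IC" token.
mask_tokens = (
    df["genre"].str.strip("|").str.split("|").apply(lambda tokens: "IC" in tokens)
)

result = df[mask_literal]
print(result)
```

Both masks should pick the same rows as the escaped regex above. The index labels are 0-based, so rows 1, 5 and 6 from the question show up as 0, 4 and 5.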