Below is my DataFrame:
import pandas as pd
df = pd.DataFrame({'col1': ['[7]', '[30]', '[0]', '[7]'], 'col2': ['[0%, 7%]', '[30%]', '[30%, 7%]', '[7%]']})
col1 col2
[7] [0%, 7%]
[30] [30%]
[0] [30%, 7%]
[7] [7%]
The aim is to check whether the col1 value is contained in col2. Below is what I've tried:
df['test'] = df.apply(lambda x: str(x.col1) in str(x.col2), axis=1)
Below is the expected output:
col1 col2 col3
[7] [0%, 7%] True
[30] [30%] True
[0] [30%, 7%] False
[7] [7%] True
You can check whether a pandas DataFrame column contains a particular value (string or int), or any of several values, by using the in operator, Series.isin(), or the Series.str methods.
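A rough sketch of those column-level checks follows. The Series s is a made-up stand-in for df['col1'] and is not part of the original question. One thing to keep in mind is that the in operator applied directly to a Series tests the index labels, so the values are checked here through to_numpy().

```python
import pandas as pd

# Hypothetical stand-in for df['col1'] from the question
s = pd.Series(['[7]', '[30]', '[0]', '[7]'])

# `in` on a Series looks at the index labels, not the values
label_hit = 0 in s                   # 0 is an index label
value_hit = '[30]' in s.to_numpy()   # '[30]' is one of the values

# Series.isin: element-wise membership in a list of candidates
isin_mask = s.isin(['[7]', '[0]'])

# Series.str.contains with regex=False: literal substring test per element
contains_mask = s.str.contains('3', regex=False)

# The plain substring test from the question, applied to the first row
naive = '[7]' in '[0%, 7%]'

print(label_hit, value_hit, naive)
print(isin_mask.tolist())
print(contains_mask.tolist())
```

These checks compare one column against a fixed set of values. The question needs a row-by-row comparison between two columns, and the plain substring test fails there because the brackets and percent signs in col2 keep '[7]' from appearing literally in '[0%, 7%]'.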
As you work with values captured in pandas Series and DataFrames, you can use if-else statements and their logical structure to categorize and manipulate your data to reveal new insights.
You can also replace the square brackets with word boundaries (\b) and use re.search, like this:
import re
#...
df.apply(lambda x: bool(re.search(x['col1'].replace("[",r"\b").replace("]",r"\b"), x['col2'])), axis=1)
# => 0     True
#    1     True
#    2    False
#    3     True
#    dtype: bool
This works because \b7\b finds a match in [0%, 7%]: the 7 there is neither preceded nor followed by a letter, digit, or underscore. No match is found in [30%, 7%], because \b0\b does not match a zero that comes right after a digit (here, 3).
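The word-boundary behaviour can be checked in isolation with a small helper. This is only a sketch: the function name contains_whole_number is invented for illustration, and it assumes the values are plain integers (a decimal such as 7.5% would still match \b7\b, since the dot is not a word character).

```python
import re

def contains_whole_number(col1_value: str, col2_value: str) -> bool:
    """Return True if the number inside col1_value occurs as a whole number in col2_value."""
    number = col1_value.strip("[]")           # '[7]' -> '7'
    pattern = rf"\b{re.escape(number)}\b"     # '7'   -> r'\b7\b'
    return re.search(pattern, col2_value) is not None

rows = [
    ("[7]", "[0%, 7%]"),
    ("[30]", "[30%]"),
    ("[0]", "[30%, 7%]"),
    ("[7]", "[7%]"),
]
results = [contains_whole_number(a, b) for a, b in rows]
print(results)

# A 7 embedded in a longer number is not a whole-number match
embedded = contains_whole_number("[7]", "[17%, 70%]")
print(embedded)
```

The helper can be passed to df.apply with axis=1 in the same way as the lambda above.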
You can extract the numbers from both columns and join them, then check whether there is at least one match per row using eval + groupby + any:
(df['col2'].str.extractall(r'(?P<col2>\d+)').droplevel(1)
   .join(df['col1'].str[1:-1])
   .eval('col2 == col1')
   .groupby(level=0).any()
)
output:
0 True
1 True
2 False
3 True
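To see what each step of that chain produces, the same pipeline can be unrolled into named intermediates. This is a sketch under the question's sample data; the variable names are made up, and the eval call is swapped for a direct column comparison, which should give the same result here.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['[7]', '[30]', '[0]', '[7]'],
    'col2': ['[0%, 7%]', '[30%]', '[30%, 7%]', '[7%]'],
})

# One row per number found in col2; the index is (original row, match number)
matches = df['col2'].str.extractall(r'(?P<col2>\d+)')

# Drop the match-number level so the index is the original row label again
flat = matches.droplevel(1)

# Attach the col1 number (brackets sliced off) to every extracted col2 number
joined = flat.join(df['col1'].str[1:-1])

# Compare per extracted number, then reduce to one boolean per original row
per_match = joined['col2'] == joined['col1']
result = per_match.groupby(level=0).any()

print(joined)
print(result)
```

Both sides are compared as strings, so no integer conversion is needed for an equality test.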
One approach:
import ast
# strip the percent signs and parse each string into a list of integers
col2_lst = df["col2"].str.replace("%", "", regex=False).apply(ast.literal_eval)
# check that every value in col1 appears in the matching col2 list
df["col3"] = [all(bi in a for bi in b) for a, b in zip(col2_lst, df["col1"].apply(ast.literal_eval))]
print(df)
Output
col1 col2 col3
0 [7] [0%, 7%] True
1 [30] [30%] True
2 [0] [30%, 7%] False
3 [7] [7%] True
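Because the parsed values are real lists, the all(...) test is a subset check: every number in col1 has to appear in col2. A minimal sketch of the same idea with sets, using a hypothetical helper parse_int_list and two extra multi-value rows that are not in the original data:

```python
import ast

def parse_int_list(text: str) -> list:
    """Parse a string such as '[30%, 7%]' into [30, 7] (hypothetical helper)."""
    return ast.literal_eval(text.replace("%", ""))

col1 = ['[7]', '[30]', '[0]', '[7]']
col2 = ['[0%, 7%]', '[30%]', '[30%, 7%]', '[7%]']

# set(a) <= set(b) is True when every element of a is also in b
result = [set(parse_int_list(a)) <= set(parse_int_list(b)) for a, b in zip(col1, col2)]
print(result)

# Made-up rows with more than one value in col1
all_present = set(parse_int_list('[0, 7]')) <= set(parse_int_list('[0%, 7%, 30%]'))
one_missing = set(parse_int_list('[0, 5]')) <= set(parse_int_list('[0%, 7%, 30%]'))
print(all_present, one_missing)
```

If a match on any one value should count instead, the subset test can be replaced by a non-empty intersection, set(a) & set(b).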
Use Series.str.extractall to get the numbers and reshape them with Series.unstack, so that they can be compared with DataFrame.isin and DataFrame.any:
df['test'] = (df['col2'].str.extractall(r'(\d+)')[0].unstack()
                        .isin(df['col1'].str.strip('[]'))
                        .any(axis=1))
print(df)
col1 col2 test
0 [7] [0%, 7%] True
1 [30] [30%] True
2 [0] [30%, 7%] False
3 [7] [7%] True
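This relies on how DataFrame.isin treats a Series argument: the Series is aligned on the index, so each row of the reshaped frame is compared only with the col1 value of the same row. A sketch with the intermediate frames named (the names wide, targets and mask are illustrative only):

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['[7]', '[30]', '[0]', '[7]'],
    'col2': ['[0%, 7%]', '[30%]', '[30%, 7%]', '[7%]'],
})

# One column per match position; rows with a single number get NaN in the second column
wide = df['col2'].str.extractall(r'(\d+)')[0].unstack()

# col1 with the brackets stripped, still indexed by the original row labels
targets = df['col1'].str.strip('[]')

# With a Series argument, isin compares row by row after aligning on the index
mask = wide.isin(targets)
result = mask.any(axis=1)

print(wide)
print(mask)
print(result)
```

Passing targets.tolist() instead would change the meaning: every cell would then be tested against all col1 values at once, regardless of row.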