Below is my DF
df= pd.DataFrame({'col1': ['[7]', '[30]', '[0]', '[7]'], 'col2': ['[0%, 7%]', '[30%]', '[30%, 7%]', '[7%]']})
col1    col2    
[7]     [0%, 7%]
[30]    [30%]
[0]     [30%, 7%]
[7]     [7%]
The aim is to check if col1 value is contained in col2 below is what I've tried
df['test'] = df.apply(lambda x: str(x.col1) in str(x.col2), axis=1)
Below is the expected output
col1    col2       col3
[7]     [0%, 7%]   True
[30]    [30%]      True
[0]     [30%, 7%]  False
[7]     [7%]       True
                You can check if a column contains/exists a particular value (string/int), list of multiple values in pandas DataFrame by using pd. series() , in operator, pandas. series. isin() , str.
As you work with values captured in pandas Series and DataFrames, you can use if-else statements and their logical structure to categorize and manipulate your data to reveal new insights.
You can also replace the square brackets with word boundaries \b and use re.search like in
import re
#...
df.apply(lambda x: bool(re.search(x['col1'].replace("[",r"\b").replace("]",r"\b"), x['col2'])), axis=1)
# => 0     True
#    1     True
#    2    False
#    3     True
#    dtype: bool
This will work because \b7\b will find a match in [0%, 7%] as 7 is neither preceded nor followed with letters, digits or underscores. There won't be any match found in [30%, 7%] as \b0\b does not match a zero after a digit (here, 3).
You can extract the numbers on both columns and join, then check if there is at least one match per id using eval+groupby+any:
(df['col2'].str.extractall('(?P<col2>\d+)').droplevel(1)
   .join(df['col1'].str[1:-1])
   .eval('col2 == col1')
   .groupby(level=0).any()
)
output:
0     True
1     True
2    False
3     True
                        One approach:
import ast
# convert to integer list
col2_lst = df["col2"].str.replace("%", "").apply(ast.literal_eval)
# check list containment
df["col3"] = [all(bi in a for bi in b)  for a, b in zip(col2_lst, df["col1"].apply( ast.literal_eval)) ]
print(df)
Output
   col1       col2   col3
0   [7]   [0%, 7%]   True
1  [30]      [30%]   True
2   [0]  [30%, 7%]  False
3   [7]       [7%]   True
                        Use Series.str.extractall for get numbers, reshape by Series.unstack, so possible compare by DataFrame.isin with DataFrame.any:
df['test'] = (df['col2'].str.extractall('(\d+)')[0].unstack()
                        .isin(df['col1'].str.strip('[]'))
                        .any(axis=1))
print (df)
   col1       col2   test
0   [7]   [0%, 7%]   True
1  [30]      [30%]   True
2   [0]  [30%, 7%]  False
3   [7]       [7%]   True
                        If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With