I want to create a column df['score'] that returns the count of values in common between a cell and a list.
Input:
correct_list = ['cats','dogs']
  answer       
0 cats, dogs, pigs
1 cats, dogs        
2 dogs, pigs        
3 cats              
4 pigs     
def animal_count(dataframe):
    count = 0
    for term in df['answer']:
        if term in symptom_list:
            df['score'] = count + 1
animal_count(df)         
Intended Output:
correct_list = ['cats','dogs']
  answer            score
0 cats, dogs, pigs  2
1 cats, dogs        2
2 dogs, pigs        1
3 cats              1
4 pigs              0
Any ideas? Thanks!
You can extract a column of pandas DataFrame based on another value by using the DataFrame. query() method. The query() is used to query the columns of a DataFrame with a boolean expression. The blow example returns a Courses column where the Fee column value matches with 25000.
tolist() you can convert pandas DataFrame Column to List. df['Courses'] returns the DataFrame column as a Series and then use values. tolist() to convert the column values to list.
We can count by using the value_counts() method. This function is used to count the values present in the entire dataframe and also count values in a particular column.
Another solution using Series.str.count:
df['score'] = df['answer'].str.count('|'.join(correct_list))
[out]
             answer  score
0  cats, dogs, pigs      2
1        cats, dogs      2
2        dogs, pigs      1
3              cats      1
4              pigs      0
As pointed out by @PrinceFrancis, if catsdogs shouldn't be counted as 2, then you can change your regex pattern to suit:
df = pd.DataFrame({'answer': ['cats, dogs, pigs', 'cats, dogs', 'dogs, pigs', 'cats', 'pigs', 'catsdogs']})
pat = '|'.join([fr'\b{x}\b' for x in correct_list])
df['score'] = df['answer'].str.count(pat)
[out]
             answer  score
0  cats, dogs, pigs      2
1        cats, dogs      2
2        dogs, pigs      1
3              cats      1
4              pigs      0
5          catsdogs      0
                        We can also use Series.explode:
df['score']=df['answer'].str.split(', ').explode().isin(correct_list).groupby(level=0).sum()
print(df)
             answer  score
0  cats, dogs, pigs    2.0
1        cats, dogs    2.0
2        dogs, pigs    1.0
3              cats    1.0
4              pigs    0.0
                        If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With