Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I create a column with counts between a column and a list in Pandas?

Tags:

python

pandas

I want to create a column df['score'] that returns the count of values in common between a cell and a list.

Input:

correct_list = ['cats','dogs']
  answer       
0 cats, dogs, pigs
1 cats, dogs        
2 dogs, pigs        
3 cats              
4 pigs     

def animal_count(dataframe):
    count = 0
    for term in df['answer']:
        if term in symptom_list:
            df['score'] = count + 1

animal_count(df)         

Intended Output:

correct_list = ['cats','dogs']

  answer            score
0 cats, dogs, pigs  2
1 cats, dogs        2
2 dogs, pigs        1
3 cats              1
4 pigs              0

Any ideas? Thanks!

like image 706
Mike Avatar asked Dec 16 '19 17:12

Mike


People also ask

How do you get the value of a column based on another column pandas?

You can extract a column of pandas DataFrame based on another value by using the DataFrame. query() method. The query() is used to query the columns of a DataFrame with a boolean expression. The blow example returns a Courses column where the Fee column value matches with 25000.

How do I create a column from a list in pandas?

tolist() you can convert pandas DataFrame Column to List. df['Courses'] returns the DataFrame column as a Series and then use values. tolist() to convert the column values to list.

How do you count occurrences of values in a column in pandas?

We can count by using the value_counts() method. This function is used to count the values present in the entire dataframe and also count values in a particular column.


2 Answers

Another solution using Series.str.count:

df['score'] = df['answer'].str.count('|'.join(correct_list))

[out]

             answer  score
0  cats, dogs, pigs      2
1        cats, dogs      2
2        dogs, pigs      1
3              cats      1
4              pigs      0

Update

As pointed out by @PrinceFrancis, if catsdogs shouldn't be counted as 2, then you can change your regex pattern to suit:

df = pd.DataFrame({'answer': ['cats, dogs, pigs', 'cats, dogs', 'dogs, pigs', 'cats', 'pigs', 'catsdogs']})

pat = '|'.join([fr'\b{x}\b' for x in correct_list])
df['score'] = df['answer'].str.count(pat)

[out]

             answer  score
0  cats, dogs, pigs      2
1        cats, dogs      2
2        dogs, pigs      1
3              cats      1
4              pigs      0
5          catsdogs      0
like image 157
Chris Adams Avatar answered Oct 11 '22 12:10

Chris Adams


We can also use Series.explode:

df['score']=df['answer'].str.split(', ').explode().isin(correct_list).groupby(level=0).sum()
print(df)
             answer  score
0  cats, dogs, pigs    2.0
1        cats, dogs    2.0
2        dogs, pigs    1.0
3              cats    1.0
4              pigs    0.0
like image 38
ansev Avatar answered Oct 11 '22 10:10

ansev