I have a pandas dataframe that contains queries and counts for a given time period, and I'm hoping to turn it into a count of unique words. For example, if the dataframe contained the following:
query          count
foo bar        10
super          8
foo            4
super foo bar  2
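For reproducibility, something like this should build that frame (the values are taken from the table above; the construction itself is my own):

import pandas as pd

df = pd.DataFrame({
    'query': ['foo bar', 'super', 'foo', 'super foo bar'],
    'count': [10, 8, 4, 2],
})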
I'm looking to receive the dataframe below, e.g. the word 'foo' appears a total of 16 times across the table (10 + 4 + 2).
word   count
foo    16
bar    12
super  10
I'm working with the function below, but it hardly seems like the optimal way to do this, and it ignores the count for each row.
from collections import Counter
import re

def _words(df):
    return Counter(re.findall(r'\w+', ' '.join(df['query'])))
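On the example frame it returns something like the following, i.e. it counts how many queries each word appears in but never weights by the count column:

_words(df)
Counter({'foo': 3, 'bar': 2, 'super': 2})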
Any help will be greatly appreciated.
Thanks in advance!
Option 1
df['query'].str.get_dummies(sep=' ').T.dot(df['count'])
bar 12
foo 16
super 10
dtype: int64
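To see why this works: get_dummies builds a word-indicator matrix (one column per word, one row per query), and the dot product with the count column sums each row's count into every word that row contains. On the example frame the intermediate looks roughly like this:

df['query'].str.get_dummies(sep=' ')

   bar  foo  super
0    1    1      0
1    0    0      1
2    0    1      0
3    1    1      1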
Option 2
df['query'].str.get_dummies(sep=' ').mul(df['count'], axis=0).sum()
bar 12
foo 16
super 10
dtype: int64
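Same indicator matrix, but here it is scaled row-wise by the count column first and then summed column-wise; the intermediate looks roughly like this:

df['query'].str.get_dummies(sep=' ').mul(df['count'], axis=0)

   bar  foo  super
0   10   10      0
1    0    0      8
2    0    4      0
3    2    2      2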
Option 3
numpy.bincount + pd.factorize
This also highlights cytoolz.mapcat, which maps a function over each element and concatenates the results into a single iterator. That's cool!
import pandas as pd, numpy as np, cytoolz

q = df['query'].values
c = df['count'].values

# flatten every query into one long list of words, then factorize it
f, u = pd.factorize(list(cytoolz.mapcat(str.split, q.tolist())))
# number of words in each query (spaces + 1)
l = np.core.defchararray.count(q.astype(str), ' ') + 1
# repeat each row's count once per word, then sum the weights per word code
pd.Series(np.bincount(f, c.repeat(l)).astype(int), u)
foo 16
bar 12
super 10
dtype: int64
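For the example frame the intermediate pieces come out roughly as follows, which is what lines the bincount weights up with each word occurrence:

list(cytoolz.mapcat(str.split, q.tolist()))  # ['foo', 'bar', 'super', 'foo', 'super', 'foo', 'bar']
f            # array([0, 1, 2, 0, 2, 0, 1])  word code per occurrence
u            # array(['foo', 'bar', 'super'], dtype=object)  unique words
l            # array([2, 1, 1, 3])  words per query
c.repeat(l)  # array([10, 10,  8,  4,  2,  2,  2])  one weight per occurrence

bincount then sums the weights per code: 10 + 4 + 2 = 16 for 'foo', 10 + 2 = 12 for 'bar', 8 + 2 = 10 for 'super'.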
Option 4
Absurd use of stuff... just use option 1.
pd.DataFrame(dict(
query=' '.join(df['query']).split(),
count=df['count'].repeat(df['query'].str.count(' ') + 1)
)).groupby('query')['count'].sum()
query
bar 12
foo 16
super 10
Name: count, dtype: int64
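The intermediate here is one row per word, each carrying its parent query's count; dropping the final groupby gives roughly the following (column order may vary with pandas version), and the groupby-sum then collapses it per word:

   query  count
0    foo     10
0    bar     10
1  super      8
2    foo      4
3  super      2
3    foo      2
3    bar      2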