Maybe I'm missing the obvious.
I have a pandas dataframe that looks like this :
id product categories
0 Silmarillion ['Book', 'Fantasy']
1 Headphones ['Electronic', 'Material']
2 Dune ['Book', 'Sci-Fi']
I'd like to use the groupby function to count the number of appearances of each element in the categories column, so here the result would be
Book 2
Fantasy 1
Electronic 1
Material 1
Sci-Fi 1
However when I try using a groupby function, pandas counts the occurrences of the entire list instead of separating its elements. I have tried multiple different ways of handling this, using tuples or splits, but this far I've been unsuccessful.
You can also call pd.value_counts
directly on a list.
You can generate the appropriate list via numpy.concatenate
, itertools.chain
, or cytoolz.concat
from cytoolz import concat
from itertools import chain
cytoolz.concat
pd.value_counts(list(concat(df.categories.values.tolist())))
itertools.chain
pd.value_counts(list(chain(*df.categories.values.tolist())))
numpy.unique
+ numpy.concatenate
u, c = np.unique(np.concatenate(df.categories.values), return_counts=True)
pd.Series(c, u)
All yield
Book 2
Electronic 1
Fantasy 1
Material 1
Sci-Fi 1
dtype: int64
time testing
You can normalize the records by stacking them then call value_counts()
:
pd.DataFrame(df['categories'].tolist()).stack().value_counts()
Out:
Book 2
Fantasy 1
Material 1
Sci-Fi 1
Electronic 1
dtype: int64
try this:
In [58]: df['categories'].apply(pd.Series).stack().value_counts()
Out[58]:
Book 2
Fantasy 1
Electronic 1
Sci-Fi 1
Material 1
dtype: int64
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With