I am new to Python, so it's probable that I am just not wording this properly to find the answer.
Using Pandas, I was able to find the N most frequent words across all records in the Description field of my data. However, I have two columns: a categorical column (Property) and the Description field. How do I find the most common words per category?
Example data:
- Property | Description
- House | Blue, Two stories, pool
- Car | Green, Dented, Manual, New
- Car | Blue, Automatic, Heated Seat
- House | New, Furnished, HOA
- Car | Blue, Old, Multiple Owners
My current code returns Blue=3, New=2, etc. But what I need to know is that Blue appeared twice for Car and once for House.
Current relevant code:
from collections import Counter
import pandas as pd
words = data.Description.str.lower().str.cat(sep=' ').split()
keywords = pd.DataFrame(Counter(words).most_common(10), columns=['Words', 'Frequency'])
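Staying with the Counter approach from the question, one option is to keep one Counter per category instead of a single global one. This is a minimal sketch, assuming comma-separated descriptions like the example data; the name `counters` is illustrative. Note that it splits on commas rather than on whitespace, so multi-word terms such as "heated seat" stay together and no trailing commas end up in the tokens.

```python
from collections import Counter, defaultdict

import pandas as pd

# Example data from the question (descriptions use ", " as the separator)
data = pd.DataFrame({
    'Property': ['House', 'Car', 'Car', 'House', 'Car'],
    'Description': ['Blue, Two stories, pool', 'Green, Dented, Manual, New',
                    'Blue, Automatic, Heated Seat', 'New, Furnished, HOA',
                    'Blue, Old, Multiple Owners'],
})

# One Counter per category, keyed by the Property value
counters = defaultdict(Counter)
for prop, desc in zip(data['Property'], data['Description']):
    # Split on commas and strip surrounding spaces so "Blue" and " Blue" match
    terms = [t.strip().lower() for t in desc.split(',')]
    counters[prop].update(terms)

# Print the three most common terms for each category
for prop, counter in counters.items():
    print(prop, counter.most_common(3))
```

With this data, `counters['Car']['blue']` is 2 and `counters['House']['blue']` is 1, which is the per-category breakdown the question asks for.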
Data:
import pandas as pd
df = pd.DataFrame({'Property': ['House', 'Car', 'Car', 'House', 'Car'], 'Description': ['Blue,Two stories,pool', 'Green,Dented,Manual,New', 'Blue,Automatic,Heated Seat', 'Blue,Furnished,HOA', 'Blue,Old,Multiple Owners']})
Chained solution:
df.assign(words=df.Description.str.lower().str.split(',')).explode('words').groupby('Property')['words'].value_counts()
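The chained solution splits on ',' only, which works because the sample data in this answer has no space after each comma. With the question's original data ("Blue, Two stories, pool"), every term after the first would keep a leading space, so " blue" and "blue" would be counted separately. A minimal sketch that strips that whitespace after exploding, assuming the question's sample rows:

```python
import pandas as pd

# The question's data, with a space after each comma
df = pd.DataFrame({
    'Property': ['House', 'Car', 'Car', 'House', 'Car'],
    'Description': ['Blue, Two stories, pool', 'Green, Dented, Manual, New',
                    'Blue, Automatic, Heated Seat', 'New, Furnished, HOA',
                    'Blue, Old, Multiple Owners'],
})

counts = (
    df.assign(words=df['Description'].str.lower().str.split(','))
      .explode('words')
      # Strip the space left after each comma so " blue" and "blue" are one term
      .assign(words=lambda d: d['words'].str.strip())
      .groupby('Property')['words']
      .value_counts()
)
print(counts)
```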
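Explanation, step by step:
# Split each description on commas into a list of lowercase words/phrases
df['words'] = df.Description.str.lower().str.split(',')
# Explode the lists into one row per word, then count each word within each Property
counts = df.explode('words').groupby('Property')['words'].value_counts()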
Property  words
Car       blue               2
          automatic          1
          dented             1
          green              1
          heated seat        1
          manual             1
          multiple owners    1
          new                1
          old                1
House     blue               2
          furnished          1
          hoa                1
          pool               1
          two stories        1
Name: words, dtype: int64
(In pandas 2.0 and later, the last line reads Name: count, dtype: int64.)
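The counts above list every word per category; to answer "most common word per category" directly, the result can be trimmed to the top N rows of each group. A minimal sketch using the answer's sample data; `top_n` and `top` are illustrative names. Sorting explicitly (with a stable sort) avoids relying on the order `value_counts` happens to return.

```python
import pandas as pd

df = pd.DataFrame({
    'Property': ['House', 'Car', 'Car', 'House', 'Car'],
    'Description': ['Blue,Two stories,pool', 'Green,Dented,Manual,New',
                    'Blue,Automatic,Heated Seat', 'Blue,Furnished,HOA',
                    'Blue,Old,Multiple Owners'],
})

# Per-category word counts, as in the chained solution
counts = (df.assign(words=df['Description'].str.lower().str.split(','))
            .explode('words')
            .groupby('Property')['words']
            .value_counts())

# Sort by count (highest first), then keep the first N rows of each Property
top_n = 1
top = (counts.sort_values(ascending=False, kind='mergesort')
             .groupby(level='Property')
             .head(top_n))
print(top)
```

Raising `top_n` returns more words per category; ties at the cutoff are kept in the order the sort leaves them.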