
Count individual words in Pandas data frame

I'm trying to count the individual words in a column of my data frame. It looks like this. In reality the texts are Tweets.

text
this is some text that I want to count
That's all I wan't
It is unicode text

From other Stack Overflow questions, I found that I could use the approach described in the following:

Count most frequent 100 words from sentences in Dataframe Pandas

Count distinct words from a Pandas Data Frame

My df is called result and this is my code:

from collections import Counter
result2 = Counter(" ".join(result['text'].values.tolist()).split(" ")).items()
result2

I get the following error:

TypeError                                 Traceback (most recent call last)
<ipython-input-6-2f018a9f912d> in <module>()
      1 from collections import Counter
----> 2 result2 = Counter(" ".join(result['text'].values.tolist()).split(" ")).items()
      3 result2
TypeError: sequence item 25831: expected str instance, float found

The dtype of text is object, which from what I understand is correct for unicode text data.

Lam asked Oct 20 '15 16:10


1 Answer

The error occurs because some of the values in your series (result['text']) are of type float; in a text column these are most likely NaN (missing) values. If you want to include them in ' '.join() as well, you need to convert the floats to strings before passing them to str.join().
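To confirm which values are causing the problem, something like the following sketch may help. The small `result` frame here is a hypothetical stand-in for your data, with one NaN placed in an object column on purpose; the variable names `type_counts` and `non_str` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real `result` frame: an object column
# holding two strings and one missing value (NaN is a float).
result = pd.DataFrame(
    {"text": pd.Series(["this is some text", np.nan, "It is unicode text"], dtype=object)}
)

# Count how many values of each Python type the column holds.
type_counts = result["text"].map(lambda v: type(v).__name__).value_counts()
print(type_counts)

# Show only the rows whose value is not a str.
non_str = result[~result["text"].map(lambda v: isinstance(v, str))]
print(non_str)
```

If the only non-string rows turn out to be NaN, you can decide whether to convert them or drop them before joining.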

You can use Series.astype() to convert all the values to strings. You also do not need .tolist(); you can pass the series directly to str.join(). Example:

result2 = Counter(" ".join(result['text'].astype(str)).split(" ")).items()
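One thing to keep in mind with astype(str) is that a missing value typically ends up in the joined text as the literal string 'nan' (the exact behaviour may depend on the pandas version), so it can show up as a "word" in the counts. If you would rather ignore missing rows, a possible variation is to drop them first. This is only a sketch: the sample frame is hypothetical, and `word_counts` is a name chosen for illustration. It also uses split() without an argument, which splits on any run of whitespace instead of a single space.

```python
from collections import Counter

import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the real `result` frame.
result = pd.DataFrame(
    {"text": pd.Series(["this is some text", np.nan, "more text  here", "text"], dtype=object)}
)

# Drop missing rows, join the remaining strings, and split on whitespace.
word_counts = Counter(" ".join(result["text"].dropna()).split())

# Counter.most_common(n) returns the n most frequent (word, count) pairs.
print(word_counts.most_common(1))
```

Using split() instead of split(" ") also avoids counting empty strings when a text contains consecutive spaces.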

Demo -

In [60]: df = pd.DataFrame([['blah'],['asd'],[10.1]],columns=['A'])

In [61]: df
Out[61]:
      A
0  blah
1   asd
2  10.1

In [62]: ' '.join(df['A'])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-62-77e78c2ee142> in <module>()
----> 1 ' '.join(df['A'])

TypeError: sequence item 2: expected str instance, float found

In [63]: ' '.join(df['A'].astype(str))
Out[63]: 'blah asd 10.1'
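As an alternative to Counter, the counting can also be kept inside pandas. This is a different technique from the one above, shown only as a sketch: it assumes a pandas version that has Series.explode() (0.25 or later), and the sample frame and the name `counts` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the real `result` frame.
result = pd.DataFrame(
    {"text": pd.Series(["this is some text", np.nan, "more text here", "text"], dtype=object)}
)

counts = (
    result["text"]
    .dropna()        # ignore missing rows
    .str.split()     # each row becomes a list of words
    .explode()       # one word per row
    .value_counts()  # frequency of each word, most frequent first
)
print(counts.head(3))
```

The result is a Series indexed by word, so counts.head(100) would give the 100 most frequent words.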
Anand S Kumar answered Sep 18 '22 17:09