Count number of words per row

Tags: python, string, python-3.x, pandas, dataframe

I'm trying to create a new column in a DataFrame that contains the word count for the respective row. I'm looking for the total number of words, not frequencies of each distinct word. I assumed there would be a simple/quick way to do this common task, but after googling around and reading a handful of SO posts (1, 2, 3, 4) I'm stuck. I've tried the solutions put forward in the linked SO posts, but got lots of attribute errors back.

words = df['col'].split()
df['totalwords'] = len(words)

results in

AttributeError: 'Series' object has no attribute 'split'

and

f = lambda x: len(x["col"].split()) -1
df['totalwords'] = df.apply(f, axis=1)

results in

AttributeError: ("'list' object has no attribute 'split'", 'occurred at index 0')


asked Apr 23 '18 15:04

LMGagne

2 Answers

`str.split` + `str.len`

`str.len` works nicely for any non-numeric column, including the lists that `str.split` produces.

df['totalwords'] = df['col'].str.split().str.len()
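As a minimal sketch of the one-liner above, the snippet below runs it on a small made-up frame. The sample data, the `totalwords_filled` column, and the `tokens` series are assumptions added for illustration; only `col` and `totalwords` come from the question. It also shows that missing values come back as NaN, and that `str.len` can be applied directly when a column already holds lists, which may be the situation behind the `'list' object has no attribute 'split'` error.

```python
import pandas as pd

# Hypothetical sample data; 'col' mirrors the column name used in the question.
df = pd.DataFrame({"col": ["This is one sentence", "and another", None]})

# .str.split() with no arguments splits on runs of whitespace, giving one list per row.
# .str.len() then returns the length of each list; missing values stay NaN.
df["totalwords"] = df["col"].str.split().str.len()

# Assumption: missing rows should count as zero words, so fill and cast to int.
df["totalwords_filled"] = df["totalwords"].fillna(0).astype(int)

# .str.len() also works when the values are already lists (no split needed).
tokens = pd.Series([["This", "is", "one", "sentence"], ["and", "another"]])
token_counts = tokens.str.len()

print(df)
print(token_counts)
```

Whether to fill missing rows with zero or leave them as NaN depends on how the counts are used downstream.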

`str.count`

If your words are single-space separated, you may simply count the spaces plus 1.

df['totalwords'] = df['col'].str.count(' ') + 1
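The single-space assumption matters. The sketch below uses made-up data with a double space and a trailing space to show how counting spaces can over-count. The `\S+` pattern is one possible alternative that is not part of the original answer; it counts runs of non-whitespace characters instead.

```python
import pandas as pd

# Hypothetical data: the second row has a double space and a trailing space.
df = pd.DataFrame({"col": ["This is one sentence", "and  another "]})

# Counting literal spaces over-counts when separators are irregular.
by_spaces = df["col"].str.count(" ") + 1

# Alternative (an assumption, not from the answer): count runs of non-whitespace.
by_tokens = df["col"].str.count(r"\S+")

print(by_spaces.tolist())
print(by_tokens.tolist())
```

If the text is known to be clean, the space-counting version is fine; otherwise a whitespace-aware approach is safer.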

List Comprehension

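This is faster than you might think! A plain Python loop over the values is often competitive with the `.str` methods for string data.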

df['totalwords'] = [len(x.split()) for x in df['col'].tolist()]
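The list comprehension above raises an error if any value is not a string (for example a missing value). A minimal sketch of one way to guard against that follows; the `count_words` helper and the sample data are hypothetical, and treating non-strings as zero words is an assumption you may want to change.

```python
import pandas as pd

# Hypothetical sample data with one missing value.
df = pd.DataFrame({"col": ["This is one sentence", "and another", None]})


def count_words(value):
    # Hypothetical helper: anything that is not a string (None, NaN, numbers)
    # is treated as having zero words.
    return len(value.split()) if isinstance(value, str) else 0


df["totalwords"] = [count_words(x) for x in df["col"].tolist()]

print(df)
```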

answered Sep 18 '22 19:09

cs95

Here is a way using .apply():

df['number_of_words'] = df.col.apply(lambda x: len(x.split()))

Example

Given this df:

>>> df
                    col
0  This is one sentence
1           and another

After applying `.apply()`:

df['number_of_words'] = df.col.apply(lambda x: len(x.split()))

>>> df
                    col  number_of_words
0  This is one sentence                4
1           and another                2

Note: As pointed out in the comments and in this answer, `.apply` is not necessarily the fastest method. If speed is important, it is better to go with one of @cᴏʟᴅsᴘᴇᴇᴅ's methods.

answered Sep 19 '22 19:09

sacuL
