
How do I count the total number of words in a Pandas dataframe cell and add those to a new column?

A common task in sentiment analysis is to obtain the count of words within a Pandas data frame cell and create a new column based on that count. How do I do this?

muninn asked Sep 26 '17 14:09



1 Answer

Assuming that words are separated by single spaces, so that a sentence with n words has n-1 spaces in it, there's another solution:

df['new_column'] = df['count_column'].str.count(' ') + 1

This solution is probably faster than a split-based approach, because it does not split each string into a list.
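
For comparison, here is a minimal sketch of both approaches on a small made-up frame. The column names match the ones used above; the sample sentences and the `split_count` column are assumptions added purely for illustration.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the answer above.
df = pd.DataFrame({"count_column": ["I like this", "not good", "great"]})

# Count spaces and add one (assumes words are separated by single spaces).
df["new_column"] = df["count_column"].str.count(" ") + 1

# Split-based alternative: split on whitespace, then take the length of each list.
df["split_count"] = df["count_column"].str.split().str.len()

print(df)
```

The two columns agree on this data. They can differ on strings with leading, trailing, or repeated spaces, where counting spaces overcounts.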

If count_column contains empty strings, each of them is counted as one word, so the result needs to be adjusted:

import numpy as np
df['new_column'] = np.where(df['count_column'] == '', 0, df['new_column'])
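
Putting the two steps together, a self-contained sketch might look like the following. The sample data is made up. Note that only exactly-empty strings are reset to 0, so a cell containing only whitespace would still get a nonzero count.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, including one empty string.
df = pd.DataFrame({"count_column": ["very good product", "", "bad"]})

# Spaces + 1 gives the word count for single-spaced text ...
df["new_column"] = df["count_column"].str.count(" ") + 1

# ... but an empty string has 0 spaces and would count as 1 word, so reset it to 0.
df["new_column"] = np.where(df["count_column"] == "", 0, df["new_column"])

print(df["new_column"].tolist())
```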

altabq answered Nov 14 '22 23:11