How to count the number of upper case words more than 1 character long in series

Tags: python-3.x, pandas

I have a series of strings and I'm trying to create a new column that counts the number of upper case words in each string, with the constraint that each word is longer than one character. For example, the series

import pandas as pd
s = pd.Series(['I AM MAD!', 'Today is a nice day', 'This restaurant SUCKS'])

would return a series with values of 2, 0, 1.

A few other helpful questions on here have shown me one way to do this for a single string:

sum(map(str.isupper, [word for word in s[0].split() if len(word) > 1]))

which correctly returns 2.

How can I apply this to the entire series without looping over each element?


asked Apr 09 '20 15:04

kcm2174

2 Answers

You can use regex to extract the words, and then count:

(s.str.extractall(r'(\b[A-Z]{2,}\b)')  # extract all-uppercase words of length at least 2
  .groupby(level=0).size()             # count by each index
  .reindex(s.index, fill_value=0)      # fill the missing with 0
)

Output:

0    2
1    0
2    1
dtype: int64
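
To see why the groupby and reindex steps are needed, here is the same chain split into named steps. This is only a sketch of the chain above; the variable names matches, counts and result are mine, not part of the answer.

```python
import pandas as pd

s = pd.Series(['I AM MAD!', 'Today is a nice day', 'This restaurant SUCKS'])

# One row per regex match, indexed by (original row label, match number).
matches = s.str.extractall(r'(\b[A-Z]{2,}\b)')

# Count matches per original row label. Rows with no match at all
# (row 1 here) do not appear in this intermediate result.
counts = matches.groupby(level=0).size()

# Bring the missing rows back and give them a count of 0.
result = counts.reindex(s.index, fill_value=0)

print(matches)
print(result)
```

Row 1 has no match, so it is missing from counts until reindex restores it with 0.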


answered Nov 14 '22 20:11

Quang Hoang

Borrowing Quang's regex, you can count the matches directly with str.count:

s.str.count(r'(\b[A-Z]{2,}\b)')

Output:

0    2
1    0
2    1
dtype: int64
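
One thing worth checking before relying on the regex: it does not count exactly the same things as the split/isupper expression from the question. The sketch below compares the two on a few strings I made up for illustration (they are not from the question); count_upper_tokens is a hypothetical helper that just wraps the question's expression.

```python
import pandas as pd

# Illustrative inputs (assumed, not from the question).
s = pd.Series(['I AM MAD!', 'USA-MADE goods', 'ÉCOLE'])

# Regex approach: runs of two or more ASCII capitals between word boundaries.
by_regex = s.str.count(r'\b[A-Z]{2,}\b')

def count_upper_tokens(text):
    # The question's logic: whitespace-separated tokens longer than one
    # character for which str.isupper() is true.
    return sum(word.isupper() for word in text.split() if len(word) > 1)

# Element-wise application, so this still loops in Python under the hood.
by_split = s.apply(count_upper_tokens)

print(pd.DataFrame({'text': s, 'by_regex': by_regex, 'by_split': by_split}))
```

On the question's own examples both give the same counts. They diverge on hyphenated tokens (the regex sees USA and MADE as two words, split sees one token) and on non-ASCII capitals, which [A-Z] does not match.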

answered Nov 14 '22 18:11

BENY

