I have a series of strings and I'm trying to create a new column that counts the number of upper case words in each string, with the constraint that the word is longer than one character. For example, the series
s = pd.Series(['I AM MAD!', 'Today is a nice day', 'This restaurant SUCKS'])
would return a series with values of 2, 0, 1.
A few other helpful questions on here have shown me one way to do this for a single string:
sum(map(str.isupper, [word for word in s[0].split() if len(word) > 1]))
which correctly returns 2.
But I'm wondering how to apply this to the entire series without looping over each element?
You can use a regex to extract the words, and then count them:
(s.str.extractall(r'(\b[A-Z]{2,}\b)') # extract all upper case words of length at least 2
.groupby(level=0).size() # count by each index
.reindex(s.index, fill_value=0) # fill the missing with 0
)
Output:
0 2
1 0
2 1
dtype: int64
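To see why the reindex step is needed, it may help to look at the intermediate results. The sketch below only breaks the chain above into separate steps; the names `matches`, `counts` and `result` are mine and not part of the answer.

```python
import pandas as pd

s = pd.Series(['I AM MAD!', 'Today is a nice day', 'This restaurant SUCKS'])

# One row per match, indexed by (original row label, match number).
matches = s.str.extractall(r'(\b[A-Z]{2,}\b)')

# Count matches per original row. Rows with no match at all
# (row 1 here) are simply absent from this result.
counts = matches.groupby(level=0).size()

# Reindex against the original index so the missing rows come back as 0.
result = counts.reindex(s.index, fill_value=0)

print(list(counts.index))  # [0, 2]
print(result.tolist())     # [2, 0, 1]
```

Without the reindex, the result would have only two rows and would not line up with the original series when assigned as a new column.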
Borrowing Quang's regex, you can count the matches directly with str.count:
s.str.count(r'(\b[A-Z]{2,}\b)')
0 2
1 0
2 1
dtype: int64
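One caveat: the regex is not exactly equivalent to the split/isupper expression in the question. The regex splits on word boundaries, while the original splits on whitespace, so something like "I'M" counts as one word with isupper but as zero with the regex. If you need the original behaviour, here is a rough sketch using Series.apply. Note that apply still calls a Python function per element, so it is a loop under the hood, and `count_upper_words` is just a name I made up for illustration.

```python
import pandas as pd

def count_upper_words(text):
    # Same logic as the one-liner in the question: split on whitespace,
    # keep words longer than one character, count the upper case ones.
    return sum(word.isupper() for word in text.split() if len(word) > 1)

# The last string is an extra example added to show where the two differ.
s = pd.Series(['I AM MAD!', 'Today is a nice day',
               'This restaurant SUCKS', "I'M FINE"])

by_split = s.apply(count_upper_words)
by_regex = s.str.count(r'\b[A-Z]{2,}\b')

print(by_split.tolist())  # [2, 0, 1, 2]
print(by_regex.tolist())  # [2, 0, 1, 1]
```

For the three strings in the question both give 2, 0, 1, so which one to use depends on how you want punctuation inside words to be treated.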