 

Python - Remove capitalized words from a long string

Tags:

python

string

I have a long string (28 MB) of normal sentences. I want to remove all words that are written entirely in capital letters (like TNT, USA, OMG).

So from the sentence:

Jump over TNT in There.

I would like to get:

Jump over  in There.

Is there a way to do this without splitting the text into a list and iterating over it? Is it possible to use a regex for this?

matousc asked Oct 24 '25 15:10



2 Answers

You can match runs of capital letters with the character class [A-Z]+ and wrap it in word boundaries \b so that only whole words are matched:

import re

line = 'Jump over TNT in There NOW'

new_line = re.sub(r'\b[A-Z]+\b', '', line)
# new_line is now 'Jump over  in There '
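
Note that this pattern also removes one-letter words such as "I" or "A", and it leaves a double space where each word used to be. If that matters, a minimal sketch along the following lines might help. The {2,} quantifier, the space-squeezing step, and the names CAPS_WORD and remove_caps_words are assumptions added for illustration, not part of the question:

```python
import re

# Whole words made of two or more capitals. The {2,} is an assumption made
# here so that one-letter words such as "I" or "A" are left alone.
CAPS_WORD = re.compile(r'\b[A-Z]{2,}\b')

def remove_caps_words(text):
    """Drop all-caps words, then squeeze the runs of spaces left behind."""
    without_caps = CAPS_WORD.sub('', text)
    return re.sub(r' {2,}', ' ', without_caps)

print(repr(remove_caps_words('I Jump over TNT in There NOW')))
# 'I Jump over in There '
```

Both substitutions run over the whole string in one call each, so there is no need to split the text into a list first.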
Moses Koledoye answered Oct 27 '25 03:10



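Use the re module with a negative lookahead, so that a run of capital letters is removed only when it is not followed by a lowercase letter: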

import re

line = 'Jump over TNT in There.'
new_line = re.sub(r'[A-Z]+(?![a-z])', '', line)

print(new_line)
# Output: Jump over  in There.
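
One caveat: the lookahead has no word boundary, so on a mixed-case token it can backtrack and strip only part of the word. The comparison below is a small sketch of that behaviour against the \b pattern from the other answer; the sample string and the variable names are made up for illustration:

```python
import re

lookahead = re.compile(r'[A-Z]+(?![a-z])')  # pattern from this answer
bounded = re.compile(r'\b[A-Z]+\b')         # word-boundary pattern

sample = 'USAid sent TNT'  # hypothetical input with a mixed-case token

# The lookahead pattern backtracks from "USA" to "US" and removes that part.
print(repr(lookahead.sub('', sample)))  # 'Aid sent '

# The word-boundary pattern leaves "USAid" untouched.
print(repr(bounded.sub('', sample)))    # 'USAid sent '
```

If the text may contain tokens like this, the word-boundary version is probably the safer choice.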
SparkAndShine answered Oct 27 '25 04:10
