I would like to apply the function .lower() to a string for all of the words that are predefined in a list, but not for any other words.
For instance, take the string provided below.
string1 = "ThE QuIcK BroWn foX jUmpEd oVer thE LaZY dOg."
Now say I have a list as seen below:
lower_list = ['quick', 'jumped', 'dog']
Ideally, the function would behave like applying .lower() to the entire string:
string1.lower()
except that it only lowercases the instances in string1 that are in lower_list, giving this output:
> ThE quick BroWn foX jumped oVer thE LaZY dog.
Can this be done simply? My idea was to use a for loop, but I need to retain the formatting of the string: for example, the string may have multiple lines, with indents on some lines and not others.
EDIT: I am getting the following error:
parts[1::2] = (word.lower() for word in parts[1::2])
AttributeError: 'NoneType' object has no attribute 'lower'
I believe this might be due to having characters other than letters in the strings I use in lower_list. If lower_list contains a string like '(copy)', I get the above error. Is there a way around this? I was thinking of converting every split part into a string with str(...), but I'm not sure how to do that.
For this kind of problem you should be careful about cases like this one:
>>> phrase = 'the apothecary'
>>> phrase.replace('the', 'THE')
'THE apoTHEcary'
That is, you only want to replace whole-word matches. That is hard to do with plain string methods, because a word boundary can be at a space ' ', but also at a full stop '.', or at the start or end of the input string.
Fortunately, regexes make it easy to match whole words, because \b in a regex matches any word boundary. So we can solve the problem this way:
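We build a regex that matches any of the words in lower_list, case-insensitive, but only when they have a word boundary before and after them, and split the string on it with re.split. Because the words are wrapped in a capturing group, re.split keeps the matched words in the result at the odd indices, where they can be lowercased before everything is joined back together. Because we're splitting on words rather than spaces, the original whitespace is preserved exactly. Here's an implementation: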
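import re

def lowercase_words(string, words):
    # Match any of the words, but only as whole words (\b on both sides).
    # The capturing group makes re.split keep the matched words in the result.
    regex = r'\b(' + '|'.join(words) + r')\b'
    parts = re.split(regex, string, flags=re.IGNORECASE)
    # The matched words sit at the odd indices; lowercase only those.
    parts[1::2] = (word.lower() for word in parts[1::2])
    return ''.join(parts)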
Example:
>>> lowercase_words(string1, lower_list)
'ThE quick BroWn foX jumped oVer thE LaZY dog.'
>>> lowercase_words('ThE aPoThEcArY', ['the'])
'the aPoThEcArY'
>>> lowercase_words(' HELLO \n WORLD ', ['hello', 'world'])
' hello \n world '
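To see why the slice parts[1::2] picks out exactly the matched words, here is a small check of what re.split returns when the pattern has a single capturing group. The sample strings are made up for illustration; the second split also shows \b keeping 'the' from matching inside 'apothecary'.

```python
import re

# One capturing group around the alternation: re.split keeps each matched
# word in the result, so matches always land at indices 1, 3, 5, ...
parts = re.split(r'\b(quick|dog)\b', 'ThE QuIcK dOg.', flags=re.IGNORECASE)
print(parts)        # ['ThE ', 'QuIcK', ' ', 'dOg', '.']
print(parts[1::2])  # ['QuIcK', 'dOg']

# \b requires a word boundary, so the 'the' inside 'apothecary' is not split off
apo_parts = re.split(r'\b(the)\b', 'the apothecary', flags=re.IGNORECASE)
print(apo_parts)    # ['', 'the', ' apothecary']
```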
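The above assumes that the words in lower_list only contain letters. If they might contain other characters, then there are two more problems:
1. Characters with a special meaning in regexes, such as ( and ), need to be escaped with re.escape. Unescaped parentheses also add extra capturing groups to the regex, so re.split returns extra items, some of them None, which is what causes the AttributeError in the question's edit.
2. \b should only be added where the word starts and/or ends with a letter. Before a character like (, \b only matches if a letter, digit or underscore immediately precedes it, so '(copy)' at the start of the string or after a space would never match.
The following makes it work: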
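A short demonstration of both problems, using '(copy)' from the question's edit plus 'dog' as made-up inputs. It shows where the None items that trigger the AttributeError come from, and that \b never matches in front of (.

```python
import re

# Problem 1: the unescaped parentheses in '(copy)' create a second group.
# Each match now contributes two items, the second one None when 'dog' matched,
# so the odd indices no longer line up with the matched words.
unescaped_parts = re.split(r'\b((copy)|dog)\b', 'dog dog', flags=re.IGNORECASE)
print(unescaped_parts)       # ['', 'dog', None, ' ', 'dog', None, '']
print(unescaped_parts[1::2]) # ['dog', ' ', None]  -> None.lower() fails

# re.escape turns '(' and ')' into literal characters, leaving one group
pattern = '(' + re.escape('(copy)') + r'|\bdog\b)'
escaped_groups = re.compile(pattern).groups
print(escaped_groups)        # 1

# Problem 2: \b between the start of the string and '(' is not a boundary
with_b = re.search(r'\b' + re.escape('(copy)'), '(copy) here')
without_b = re.search(re.escape('(copy)'), '(copy) here')
print(with_b)                # None
print(without_b.group(0))    # (copy)
```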
import re

def lowercase_words(string, words):
    def make_regex_part(word):
        # Escape regex metacharacters such as ( and )
        word = re.escape(word)
        # Only require a word boundary on a side that is a letter
        if word[:1].isalpha():
            word = r'\b' + word
        if word[-1:].isalpha():
            word += r'\b'
        return word
    # A single capturing group, so matched words land at the odd indices
    regex = '(' + '|'.join(map(make_regex_part, words)) + ')'
    parts = re.split(regex, string, flags=re.IGNORECASE)
    parts[1::2] = (word.lower() for word in parts[1::2])
    return ''.join(parts)
Example:
>>> lowercase_words('(TrY) iT nOw WiTh bRaCkEtS', ['(try)', 'it'])
'(try) it nOw WiTh bRaCkEtS'
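As an alternative sketch (not the approach above), re.sub with a replacement function avoids the odd-index bookkeeping entirely: the callback lowercases the whole match, group 0, however many groups the pattern happens to contain. The name lowercase_words_sub is hypothetical; it reuses the same escaping and boundary rules as the function above.

```python
import re

def lowercase_words_sub(string, words):
    """Hypothetical re.sub-based variant of lowercase_words."""
    def make_regex_part(word):
        # Escape regex metacharacters such as ( and )
        word = re.escape(word)
        # Only require a word boundary on a side that is a letter
        if word[:1].isalpha():
            word = r'\b' + word
        if word[-1:].isalpha():
            word += r'\b'
        return word

    regex = '|'.join(map(make_regex_part, words))
    # Replace each match with its lowercased text; everything else is untouched
    return re.sub(regex, lambda m: m.group(0).lower(), string, flags=re.IGNORECASE)


print(lowercase_words_sub('(TrY) iT nOw WiTh bRaCkEtS', ['(try)', 'it']))
# (try) it nOw WiTh bRaCkEtS
print(lowercase_words_sub("ThE QuIcK BroWn foX jUmpEd oVer thE LaZY dOg.",
                          ['quick', 'jumped', 'dog']))
# ThE quick BroWn foX jumped oVer thE LaZY dog.
```

Which version to prefer is mostly a matter of taste: the split/join version makes the structure of the result explicit, while re.sub keeps the matching and the transformation in a single call.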