I have a small problem with punctuation.
My assignment was to check whether a text contains any duplicated words and, if it does, to highlight them using .upper().
Example text: I like apples, apples is the best thing i know.
So I took the original text, stripped it of punctuation, converted all words to lowercase and then split it into a list. With a for loop I compared each word in the list against the words before it and found all duplicated words; the result was placed in a new list.
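The preprocessing step described above (strip punctuation, lowercase, split) can be sketched as follows. This assumes "punctuation" means the ASCII characters in `string.punctuation`; the function name `to_word_list` is hypothetical. It also shows where the punctuation is lost: once `translate` removes it, nothing in the resulting list remembers where it was.

```python
import string

def to_word_list(text):
    # Build a table that deletes every character in string.punctuation,
    # then lowercase the text and split it on whitespace.
    table = str.maketrans('', '', string.punctuation)
    return text.translate(table).lower().split()

words = to_word_list("I like apples, apples is the best thing i know.")
print(words)
# ['i', 'like', 'apples', 'apples', 'is', 'the', 'best', 'thing', 'i', 'know']
```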
Example (after using the for-loop): i like apples APPLES is the best thing I know
So the new list is similar to the original text, with one major exception: it is missing the punctuation.
Is there a way to put the punctuation back into the new list where it is "supposed to be" (based on its position in the original text)? Is there a built-in Python method that can do this, or do I have to compare the two lists with another for loop and then add the punctuation to the new list?
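# text is the list of lowercase words, with punctuation already stripped
NewList = []  # words in their original order, repeats converted to upper case
for word in text:
    if word not in NewList:
        NewList.append(word)  # first occurrence: keep as-is
    else:
        NewList.append(word.upper())  # repeated word: highlight it
List2 = ' '.join(NewList)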
The code above works for longer texts, and it's the code I have been using to highlight duplicated words. The only problem is that the punctuation doesn't appear in the new output.
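For reference, the list-based route the question asks about can be done without a second comparison loop, by never throwing the punctuation away in the first place. A minimal sketch, assuming words are runs of ASCII letters and that duplicates are compared case-insensitively (as the lowercasing step implies); `highlight_duplicates` is a hypothetical name. `re.split` with a capturing group keeps the separators in the result list, so joining the pieces restores the original text exactly, and a set replaces the linear `word not in NewList` lookup.

```python
import re

def highlight_duplicates(text):
    # re.split with a capturing group keeps the separators: odd indices hold
    # the words, even indices hold the punctuation/whitespace between them.
    parts = re.split(r"([A-Za-z]+)", text)
    seen = set()  # lowercase forms of the words seen so far
    for i in range(1, len(parts), 2):
        key = parts[i].lower()
        if key in seen:
            parts[i] = parts[i].upper()  # repeated word: highlight it
        else:
            seen.add(key)  # first occurrence: keep its original casing
    # Joining the pieces reproduces the original text, punctuation included.
    return "".join(parts)

print(highlight_duplicates("I like apples, apples is the best thing i know."))
```

The first occurrence of each word keeps its original casing, so this prints `I like apples, APPLES is the best thing I know.`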
Here's an example that uses the sub function with a callback, from the built-in re module.
This solution preserves all the punctuation.
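import re

txt = "I like,, ,apples, apples! is the .best. thing *I* know!!1"

def repl(match, stack):
    # stack: set of upper-cased words seen so far
    word = match.group(0)
    word_upper = word.upper()
    if word_upper in stack:
        return word_upper  # repeated word: return it highlighted
    stack.add(word_upper)
    return word  # first occurrence: leave it unchanged

def highlight(s):
    stack = set()
    # Call repl for every whole word made of ASCII letters; everything else is left untouched
    return re.sub(r'\b([a-zA-Z]+)\b', lambda match: repl(match, stack), s)

print(txt)
print(highlight(txt))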
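One limitation of `[a-zA-Z]+` is that words containing non-ASCII letters are skipped: in "café" the `é` is a word character, so no `\b` follows "caf" and the word never matches. A sketch of a variant, assuming Python 3 `str` patterns (where `\w` is Unicode-aware): `[^\W\d_]` means "a word character that is not a digit or underscore", i.e. any letter. The closure over `seen` replaces the lambda plus extra `stack` argument; otherwise the logic is the same as above.

```python
import re

def highlight(s):
    seen = set()  # upper-cased words seen so far

    def repl(match):
        word = match.group(0)
        key = word.upper()
        if key in seen:
            return key  # repeated word: highlight it
        seen.add(key)
        return word  # first occurrence: leave unchanged

    # [^\W\d_] matches any Unicode letter, so accented words such as
    # "café" are matched as a whole word.
    return re.sub(r"\b[^\W\d_]+\b", repl, s)

print(highlight("Un café, un café!"))
```

This prints `Un café, UN CAFÉ!`.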