I have a string and rules/mappings for replacement and no-replacements.
E.g.
"This is an example sentence that needs to be processed into a new sentence."
"This is a second example sentence that shows how 'sentence' in 'sentencepiece' should not be replaced."
Replacement rules:
replace_dictionary = {'sentence': 'processed_sentence'}
no_replace_set = {'example sentence'}
Result:
"This is an example sentence that needs to be processed into a new processed_sentence."
"This is a second example sentence that shows how 'processed_sentence' in 'sentencepiece' should not be replaced."
Additional criteria:
I was thinking what would the cleanest way to solve this problem in Python 3.x be?
Based on the answer of demongolem.
UPDATE
I am sorry, I missed the fact, that only whole words should be replaced. I updated my code and even generalized it for usage in a function.
def replace_whole(sentence, replace_token, replace_with, dont_replace):
rx = f"[\"\'\.,:; ]({replace_token})[\"\'\.,:; ]"
iter = re.finditer(rx, sentence)
out_sentence = ""
found = []
indices = []
for m in iter:
indices.append(m.start(0))
found.append(m.group())
context_size=len(dont_replace)
for i in range(len(indices)):
context = sentence[indices[i]-context_size:indices[i]+context_size]
if dont_replace in context:
continue
else:
# First replace the word only in the substring found
to_replace = found[i].replace(replace_token, replace_with)
# Then replace the word in the context found, so any special token like "" or . gets taken over and the context does not change
replace_val = context.replace(found[i], to_replace)
# finally replace the context found with the replacing context
out_sentence = sentence.replace(context, replace_val)
return out_sentence
Use regular expressions for finding all occurences and values of your string (as we need to check whether is a whole word or embedded in any kind of word), by using finditer(). You might need to adjust the rx to what your definition of "whole word" is. Then get the context around these values of the size of your no_replace rule. Then check, whether the context contains your no_replace string.
If not, you may replace it, by using replace() for the word only, then replace the occurence of the word in the context, then replace the context in the whole text. That way the replacing process is nearly unique and no weird behaviour should happen.
Using your examples, this leads to:
replace_whole(sen2, "sentence", "processed_sentence", "example sentence")
>>>"This is a second example sentence that shows how 'processed_sentence' in 'sentencepiece' should not be replaced."
and
replace_whole(sen1, "sentence", "processed_sentence", "example sentence")
>>>'This is an example sentence that needs to be processed into a new processed_sentence.'
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With