 

Replace apostrophes/short words in Python

I am using Python to clean a given sentence. Suppose that my sentence is:

What's the best way to ensure this?

I want to convert:

What's -> What is

Similarly,

must've -> must have

Also, verbs to their base form:

told -> tell

Singular nouns to plural, and so on.

I am currently exploring TextBlob, but it can't do all of the above.
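Contraction expansion is covered by the answer below. For the other two requests (verb base forms and plurals), the usual tool is a dictionary-backed lemmatizer such as NLTK's `WordNetLemmatizer`. Purely to illustrate the idea, here is a stdlib-only sketch under loud assumptions: `IRREGULAR_VERBS` is a hypothetical, hand-written table that only knows the few words listed, and `pluralize` applies only the most common English spelling rules (irregular plurals such as "child" -> "children" are not handled).

```python
# Hypothetical, hand-written table mapping irregular past forms to base forms.
# It only knows the words listed here; a real lemmatizer uses a full lexicon.
IRREGULAR_VERBS = {
    "told": "tell",
    "went": "go",
    "saw": "see",
    "made": "make",
    "took": "take",
}


def verb_base_form(word: str) -> str:
    """Return the base form of a known irregular verb, else the word unchanged."""
    return IRREGULAR_VERBS.get(word.lower(), word)


def pluralize(noun: str) -> str:
    """Naive pluralization covering only the most common English spelling rules."""
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"
    # consonant + "y" -> "ies" (city -> cities); vowel + "y" just adds "s" (day -> days)
    if len(noun) > 1 and noun.endswith("y") and noun[-2] not in "aeiou":
        return noun[:-1] + "ies"
    return noun + "s"


print(verb_base_form("told"))  # tell
print([pluralize(w) for w in ["word", "box", "city", "day"]])
# ['words', 'boxes', 'cities', 'days']
```

Regular past forms such as "walked" are returned unchanged by `verb_base_form`; that gap is exactly why a real lemmatizer is preferable for anything beyond a toy example.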

asked Mar 25 '17 at 15:03 by learner


People also ask

How do you remove the apostrophe from text in Python?

Use the str.replace() method to remove all apostrophes from a string, e.g. result = my_str.replace("'", ''). It removes every apostrophe by replacing each one with an empty string.
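A short sketch of the one-liner above, plus a variant that also strips the typographic apostrophe (’, U+2019), which `replace("'", '')` leaves behind. The helper name `remove_apostrophes` is hypothetical, chosen only for illustration. Note that deleting apostrophes turns "What's" into "Whats", so it is not a substitute for expanding contractions.

```python
# Straight apostrophe (U+0027) and typographic right single quote (U+2019).
APOSTROPHES = "'\u2019"


def remove_apostrophes(text: str) -> str:
    """Delete every straight or curly apostrophe from text."""
    # A third argument to str.maketrans maps each listed character to None.
    return text.translate(str.maketrans("", "", APOSTROPHES))


my_str = "What's the best way?"
result = my_str.replace("'", "")
print(result)                                  # Whats the best way?
print(remove_apostrophes("It\u2019s Yann's"))  # Its Yanns
```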

How does Python deal with apostrophes?

There are two easy ways: 1) Enclose the string in double quotes, so you can use apostrophes inside, e.g. "BRIAN'S MOTHER". 2) Use the "\" escape character to escape the apostrophe inside a single-quoted string, e.g. 'BRIAN\'S MOTHER'.
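Both options in code, plus triple quotes as a third option not mentioned above. All three literals produce the same string; the variable names are only illustrative.

```python
double_quoted = "BRIAN'S MOTHER"      # no escaping needed inside double quotes
escaped = 'BRIAN\'S MOTHER'           # backslash-escaped inside single quotes
triple_quoted = '''BRIAN'S MOTHER'''  # triple quotes accept a bare apostrophe too

print(double_quoted == escaped == triple_quoted)  # True
```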

How do I change my contractions?

Avoid using contractions in formal writing. A contraction is a combination of two words as one, such as "don't," "can't," and "isn't." The use of contractions is inappropriate in formal legal writing. Replace them with the two-word version of the contraction.


1 Answer

The answers above will work perfectly well and could be better for ambiguous contractions (although I would argue that there aren't that many ambiguous cases). I would use something that is more readable and easier to maintain:

import re

def decontracted(phrase):
    # specific: irregular forms that the general rules below would mangle
    phrase = re.sub(r"won\'t", "will not", phrase)
    phrase = re.sub(r"can\'t", "can not", phrase)

    # general: suffix rules, applied after the specific cases
    phrase = re.sub(r"n\'t", " not", phrase)
    phrase = re.sub(r"\'re", " are", phrase)
    # note: this also rewrites possessives ("Yann's" -> "Yann is")
    phrase = re.sub(r"\'s", " is", phrase)
    phrase = re.sub(r"\'d", " would", phrase)
    phrase = re.sub(r"\'ll", " will", phrase)
    phrase = re.sub(r"\'t", " not", phrase)
    phrase = re.sub(r"\'ve", " have", phrase)
    phrase = re.sub(r"\'m", " am", phrase)
    return phrase


test = "Hey I'm Yann, how're you and how's it going ? That's interesting: I'd love to hear more about it."
print(decontracted(test))
# Hey I am Yann, how are you and how is it going ? That is interesting: I would love to hear more about it.

It might have some flaws I didn't think about, though.
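Two flaws worth knowing about in `decontracted`: the patterns are case-sensitive (so "Won't" becomes "Wo not"), and the generic "'s" rule turns possessives such as "Yann's" into "Yann is". Below is a minimal sketch of one way around both, not a definitive implementation. Its assumptions: `WHOLE_WORDS` and `SUFFIXES` are hypothetical hand-written tables, "'s" is expanded only for the listed words and always as "is" (it can also mean "has"), "'d" is always read as "would" (it can also mean "had"), and the typographic apostrophe ’ is treated like a straight one.

```python
import re

# Hypothetical table of whole-word contractions. "'s" is expanded ONLY for
# the words listed here (assumed to mean "is"), so possessives such as
# "Yann's" are left untouched.
WHOLE_WORDS = {
    "won't": "will not",
    "can't": "can not",
    "shan't": "shall not",
    "let's": "let us",
    "it's": "it is",
    "that's": "that is",
    "what's": "what is",
    "there's": "there is",
    "here's": "here is",
    "he's": "he is",
    "she's": "she is",
    "who's": "who is",
    "where's": "where is",
    "how's": "how is",
}

# Regular suffixes; "'d" is assumed to mean "would" (it can also mean "had").
SUFFIXES = {
    "n't": " not",
    "'re": " are",
    "'d": " would",
    "'ll": " will",
    "'ve": " have",
    "'m": " am",
}

_WHOLE_RE = re.compile(
    r"\b(" + "|".join(map(re.escape, WHOLE_WORDS)) + r")\b", re.IGNORECASE
)
# A suffix must follow a word character, so a quote such as "'re" at the
# start of a word is not treated as a contraction.
_SUFFIX_RE = re.compile(
    r"(?<=\w)(" + "|".join(map(re.escape, SUFFIXES)) + r")\b", re.IGNORECASE
)


def _match_case(src: str, repl: str) -> str:
    """Copy the capitalization of the matched text onto its expansion."""
    if src.isupper():
        return repl.upper()
    if src[:1].isupper():
        return repl[:1].upper() + repl[1:]
    return repl


def expand_contractions(text: str) -> str:
    # Treat the typographic apostrophe (U+2019) like a straight one.
    text = text.replace("\u2019", "'")
    # Whole words first, so "won't" never reaches the generic "n't" rule.
    text = _WHOLE_RE.sub(
        lambda m: _match_case(m.group(0), WHOLE_WORDS[m.group(0).lower()]), text
    )
    return _SUFFIX_RE.sub(
        lambda m: _match_case(m.group(0), SUFFIXES[m.group(0).lower()]), text
    )


print(expand_contractions("Won't they? WE'RE sure it\u2019s Yann's."))
# Will not they? WE ARE sure it is Yann's.
```

The trade-off is that any "'s" word missing from `WHOLE_WORDS` stays contracted; that errs on the side of leaving text alone rather than corrupting possessives.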

answered Oct 05 '22 at 02:10 by Yann Dubois