I want to replace all single quotes in a string with double quotes, except for apostrophes in contractions such as "n't", "'ll", "'m", etc.
input = "the stackoverflow don't said, 'hey what'"
output = "the stackoverflow don't said, \"hey what\""
Code 1 (@https://stackoverflow.com/users/918959/antti-haapala):
import re

def convert_regex(text):
    # Replace ' unless it is both preceded and followed by a word character
    return re.sub(r"(?<!\w)'(?!\w)|(?<!\w)'(?=\w)|(?<=\w)'(?!\w)", '"', text)
There are 3 cases: ' is neither preceded nor followed by a word character (\w, i.e. a letter, digit or underscore); or it is not preceded by one but is followed by one; or it is preceded by one but not followed by one.
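Taken together, the three cases say "match a ' unless it sits between two word characters". As a sketch (this simplification is not part of the original answer), the pattern can therefore be collapsed to (?<!\w)'|'(?!\w). The check below compares both patterns on a few made-up sample strings:

```python
import re

# Original three-case pattern from Code 1
THREE_CASES = r"(?<!\w)'(?!\w)|(?<!\w)'(?=\w)|(?<=\w)'(?!\w)"
# Simplified form: a quote is replaced unless it is preceded AND followed by \w
SIMPLIFIED = r"(?<!\w)'|'(?!\w)"

samples = [
    "the stackoverflow don't said, 'hey what'",
    "'single' word",
    "it's 'quoted', isn't it'",
]

for s in samples:
    a = re.sub(THREE_CASES, '"', s)
    b = re.sub(SIMPLIFIED, '"', s)
    assert a == b  # both patterns replace exactly the same quotes
    print(b)
```

The first sample prints the stackoverflow don't said, "hey what", matching the desired output.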
Issue: this doesn't work on words that end in an apostrophe, i.e. most plural possessives, nor on informal abbreviations that start with an apostrophe.
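Both failures are easy to reproduce. Running Code 1 on a plural possessive and on an abbreviation with a leading apostrophe (the two sentences are made up for illustration) turns the apostrophe into a double quote:

```python
import re

def convert_regex(text):
    return re.sub(r"(?<!\w)'(?!\w)|(?<!\w)'(?=\w)|(?<=\w)'(?!\w)", '"', text)

# Plural possessive: the apostrophe after "students" is followed by a space
print(convert_regex("the students' books"))  # the students" books
# Abbreviation with a leading apostrophe: it is preceded by a space
print(convert_regex("tell 'em now"))         # tell "em now
```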
Code 2 (@https://stackoverflow.com/users/953482/kevin):
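def convert_text_func(s):
    c = "_"  # placeholder character; must NOT appear in the string
    assert c not in s
    # Map each protected word to a copy whose apostrophe is the placeholder
    protected = {word: word.replace("'", c) for word in ["don't", "it'll", "I'm"]}
    for k, v in protected.items():
        s = s.replace(k, v)  # hide apostrophes inside protected words
    s = s.replace("'", '"')  # convert every remaining single quote
    for k, v in protected.items():
        s = s.replace(v, k)  # restore the protected words
    return s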
Issue: the set of words to specify is far too large; for example, how can one list every plural possessive such as persons'? Please help.
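The limitation shows up as soon as the text contains a contraction or possessive missing from the list. A sketch using Code 2 (with items() so it runs on Python 3) on a made-up sentence:

```python
def convert_text_func(s):
    c = "_"  # placeholder character; must NOT appear in the string
    assert c not in s
    protected = {word: word.replace("'", c) for word in ["don't", "it'll", "I'm"]}
    for k, v in protected.items():
        s = s.replace(k, v)
    s = s.replace("'", '"')
    for k, v in protected.items():
        s = s.replace(v, k)
    return s

# "don't" is in the list and survives; "can't" and "persons'" are not
print(convert_text_func("I don't know, I can't see the persons' 'things'"))
# I don't know, I can"t see the persons" "things"
```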
Edit 1: I am using @anubhava's brilliant answer, but it fails on some text that contains translated (non-English) words. Code:
text = re.sub(r"(?<!s)'(?!(?:t|ll|e?m|s|d|ve|re|clock)\b)", '"', text)
Problem:
In the text below, melas in 'Kumbh melas' is a Hindi-to-English translation, not a plural possessive noun.
Input="Similar to the 'Kumbh melas', celebrated by the banks of the holy rivers of India,"
Output=Similar to the "Kumbh melas', celebrated by the banks of the holy rivers of India,
Expected Output=Similar to the "Kumbh melas", celebrated by the banks of the holy rivers of India,
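The reported output can be reproduced directly. In the Edit 1 pattern, (?<!s) skips any ' that follows an s (intended for plural possessives), and the negative lookahead skips contraction suffixes such as 't, 'll, 'm, 're and 'clock. The closing quote after melas is preceded by an s, so it is left alone:

```python
import re

# Pattern from Edit 1 (@anubhava's answer)
PATTERN = r"(?<!s)'(?!(?:t|ll|e?m|s|d|ve|re|clock)\b)"

def convert(text):
    return re.sub(PATTERN, '"', text)

text = ("Similar to the 'Kumbh melas', celebrated by the banks "
        "of the holy rivers of India,")
print(convert(text))
# Similar to the "Kumbh melas', celebrated by the banks of the holy rivers of India,

# Contractions are protected as intended
print(convert("I don't know, it's 5 o'clock"))
# I don't know, it's 5 o'clock
```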
I am looking for an additional condition that fixes this; manual (human) intervention is the last resort.
Edit 2: A naive and lengthy approach to fixing it:
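import string
import enchant  # PyEnchant spell checker

def replace_translations(text):
    d = enchant.Dict("en_US")
    # tokenize_words is the asker's own tokenizer (not shown); the check below
    # assumes it emits a closing quote as a separate "'" token
    words = tokenize_words(text)
    punctuations = set(string.punctuation)
    for i, word in enumerate(words):
        # i + 1 < len(words) keeps words[i + 1] in range on the last token
        if (i + 1 < len(words) and word not in punctuations
                and not d.check(word) and words[i + 1] == "'"):
            # A non-dictionary word followed by ' is taken as the end of a quotation
            text = text.replace(words[i] + words[i + 1], words[i] + '"')
    return text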
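For experimenting without PyEnchant, here is a self-contained sketch of the same idea; it is a stand-in, not the asker's code. KNOWN_WORDS is a hypothetical in-memory word list replacing enchant.Dict("en_US").check, and a re.sub callback replaces the token loop. It is meant as a second pass over text that has already been through the Edit 1 regex:

```python
import re

# Hypothetical stand-in for enchant.Dict("en_US"): a tiny set of known words
KNOWN_WORDS = {"similar", "to", "the", "celebrated", "by", "banks", "of",
               "holy", "rivers", "india", "classes", "students"}

def replace_translations(text, known=KNOWN_WORDS):
    """Turn word' into word" when the word is not a known English word."""
    def fix(match):
        word = match.group(1)
        if word.lower() in known:
            return match.group(0)  # known word: keep ' as a possessive apostrophe
        return word + '"'          # unknown word (e.g. a transliteration): closing quote
    # (\w+)'(?!\w): a word immediately followed by ' that is not followed by \w
    return re.sub(r"(\w+)'(?!\w)", fix, text)

first_pass = "Similar to the \"Kumbh melas', celebrated by the banks of the holy rivers of India,"
print(replace_translations(first_pass))
# Similar to the "Kumbh melas", celebrated by the banks of the holy rivers of India,
```

One corner case this shares with Edit 2: the possessive of any word missing from the dictionary (a rare proper noun, for instance) is also turned into a closing quote.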
Are there any corner cases I am missing or are there any better approaches?
You can also use this regex:
(?:(?<!\w)'((?:.|\n)+?'?)'(?!\w))
This regex matches a whole quoted sentence or word, including the opening and closing quote marks, and also captures the content of the quotation in group 1, so you can replace the matched part with "\1".
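In Python, the pattern can be applied with re.sub, wrapping the captured group 1 in double quotes. A minimal sketch on made-up sentences; note that an apostrophe inside the quotation (it's) is kept:

```python
import re

# Opening ' not preceded by \w, lazily captured content, closing ' not followed by \w
QUOTED = re.compile(r"(?:(?<!\w)'((?:.|\n)+?'?)'(?!\w))")

def convert(text):
    # Rewrap the captured content (group 1) in double quotes
    return QUOTED.sub(r'"\1"', text)

print(convert("the stackoverflow don't said, 'hey what'"))
# the stackoverflow don't said, "hey what"
print(convert("He said 'it's fine' and left"))
# He said "it's fine" and left
```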
(?<!\w)
- negative lookbehind for a word character: the opening quote must not follow a letter, digit or underscore. This excludes words like "you'll", but still lets the regex match quotations after characters such as \n, :, ;, . or -. Assuming there will always be whitespace before a quotation is risky.
'
- the opening single quote mark.
((?:.|\n)+?'?)
- capturing group 1: one or more of any character or newline (to match multi-line sentences), with a lazy quantifier (to avoid matching from the first to the last single quote mark), followed by an optional single quote, in case there are two in a row.
'(?!\w)
- a single quote not followed by a word character, to exclude text like "i'm" or "you're", where the quote mark sits between letters.
However, it still has a problem with sentences in which an apostrophe follows a word ending in s, like 'the classes' hours'. I think it is impossible to tell with regex when an s followed by ' should be treated as the end of a quotation and when it is an s with a possessive apostrophe. But I figured out a limited workaround for this problem, with this regex:
(?:(?<!\w)'((?:.|\n)+?'?)(?:(?<!s)'(?!\w)|(?<=s)'(?!([^']|\w'\w)+'(?!\w))))
It adds an alternative for cases with s': (?<!s)'(?!\w)|(?<=s)'(?!([^']|\w'\w)+'(?!\w)), where:
(?<!s)'(?!\w)
- if there is no s before the ', match as in the first regex above;
(?<=s)'(?!([^']|\w'\w)+'(?!\w))
- if there is an s before the ', end the match on this ' only if no other ' followed by a non-word character occurs in the following text, before the end or before another ' (but only a ' preceded by a letter other than s, or the opening of the next quotation). The \w'\w part lets the lookahead step over a ' that sits between letters, as in i'm.
This regex should match wrongly only if there are several s' cases in a row. Still, it is far from a perfect solution.
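A minimal Python sketch applying the workaround regex above with re.sub (the sample sentences are made up, apart from the Kumbh melas line from the question). The closing ' after hours is accepted because no later ' followed by a non-word character exists, while the ' after classes is not:

```python
import re

# Workaround regex: a closing ' after an "s" only ends the quotation if no
# later ' followed by a non-word character exists
QUOTED_S = re.compile(
    r"(?:(?<!\w)'((?:.|\n)+?'?)"
    r"(?:(?<!s)'(?!\w)|(?<=s)'(?!([^']|\w'\w)+'(?!\w))))"
)

def convert(text):
    # Group 1 holds the quoted content; rewrap it in double quotes
    return QUOTED_S.sub(r'"\1"', text)

print(convert("He read 'the classes' hours' aloud"))
# He read "the classes' hours" aloud
print(convert("Similar to the 'Kumbh melas', celebrated by the banks of the holy rivers of India,"))
# Similar to the "Kumbh melas", celebrated by the banks of the holy rivers of India,
```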
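Also, when using \w there is always a chance that ' occurs after a symbol, or after a character outside [a-zA-Z_0-9] that is still a letter (such as a character from a local language), and it will then be treated as the beginning of a quotation. This could be avoided by replacing (?<!\w) and (?!\w) with (?<!\p{L}) and (?!\p{L}), or with something like (?<=^|[,.?!)\s]), etc., a positive lookaround for the characters which can occur in a sentence before a quotation. However, such a list could be quite long. (Python's built-in re module does not support \p{L}; that requires the third-party regex module. The built-in re also rejects (?<=^|[,.?!)\s]) because its alternatives differ in width, whereas (?<![^,.?!)\s]) expresses the same condition with a fixed width.)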
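How much this matters in Python depends on the version and flags: in Python 3, \w in a str pattern already matches Unicode letters, while the re.ASCII flag restores the ASCII-only behaviour described above. A sketch using the simple apostrophe rule (?<!\w)'|'(?!\w) on a made-up phrase:

```python
import re

# Replace ' unless it sits between two word characters
PATTERN = r"(?<!\w)'|'(?!\w)"

text = "the café's menu"

# Python 3 str patterns are Unicode-aware: é counts as \w, so the apostrophe survives
print(re.sub(PATTERN, '"', text))                  # the café's menu
# With re.ASCII, é is not \w, so the apostrophe is treated as a quote mark
print(re.sub(PATTERN, '"', text, flags=re.ASCII))  # the café"s menu
```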