Ideas for converting straight quotes to curly quotes

I have a file that contains "straight" (normal, ASCII) quotes, and I'm trying to convert them to real quotation mark glyphs (“curly” quotes: ‘ ’ “ ”, U+2018, U+2019, U+201C and U+201D). Since collapsing the distinct opening and closing quotes into a single straight character was lossy in the first place, there is obviously no way to perform this conversion fully automatically; nevertheless, I suspect a few heuristics will cover most cases. So the plan is a script (in Emacs) that does something like the following for each straight quote character:

  1. guess which curly quote character to use, if possible
  2. ask the user (me) to confirm, or make a choice

This question is about the first step: what would be a good algorithm (more like a set of heuristics) for normal English text (a novel, for example)? Here are some preliminary ideas, which I believe work for double quotes (counterexamples are welcome!):

  1. If a double-quote is at the beginning of a line, guess that it is an opening quote.
  2. If a double-quote is at the end of a line, guess a closing quote.
  3. If a double-quote is preceded by a space, guess an opening quote.
  4. If a double-quote is followed by a space, guess a closing quote.
  5. If a double-quote doesn't fit into one of the above categories, guess that it is the “opposite” of the most recently used kind of double-quote.
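
A minimal sketch of rules 1-5, written in Python rather than the Emacs Lisp the plan calls for. It assumes the rules are tried in the order listed, that "a space" means any whitespace, and that the start and end of the text count as line boundaries; `curl_double_quotes` is a hypothetical name. It only performs step 1 (guessing) and leaves confirmation to the caller.

```python
OPEN_DQ, CLOSE_DQ = "\u201c", "\u201d"  # “ and ”


def curl_double_quotes(text: str) -> str:
    """Replace every straight " with a guessed curly quote, trying rules 1-5 in order."""
    out = []
    last = CLOSE_DQ  # assume the previous quote closed, so rule 5 first guesses "open"
    for i, ch in enumerate(text):
        if ch != '"':
            out.append(ch)
            continue
        prev = text[i - 1] if i > 0 else "\n"              # start of text = start of line
        nxt = text[i + 1] if i + 1 < len(text) else "\n"   # end of text = end of line
        if prev == "\n":          # rule 1: beginning of a line
            guess = OPEN_DQ
        elif nxt == "\n":         # rule 2: end of a line
            guess = CLOSE_DQ
        elif prev.isspace():      # rule 3: preceded by whitespace
            guess = OPEN_DQ
        elif nxt.isspace():       # rule 4: followed by whitespace
            guess = CLOSE_DQ
        else:                     # rule 5: opposite of the most recent guess
            guess = OPEN_DQ if last == CLOSE_DQ else CLOSE_DQ
        out.append(guess)
        last = guess
    return "".join(out)


print(curl_double_quotes('He said "hi" to me.'))  # He said “hi” to me.
```

Rule 5 only fires when the quote is wedged between two non-whitespace characters, e.g. `x"y"z` becomes `x“y”z`.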

Single quotes are trickier, because a ' might be an opening quote, a closing quote, or an apostrophe, and we want to leave apostrophes alone (mustn't write “mustn’t”). Some of the same rules as above apply, but 'tis possible for apostrophes to appear at the beginning of words (or lines), although it's less common than 'twas in the past. I can't offhand think of rules that would properly handle fragments like ["I like 'That '70s show'", she said]. It might require looking at more than just neighbouring characters, for example by computing distances between quotes…
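
One way to encode "some of the same rules" for single quotes is a classifier with a fourth answer, "ask", for cases the neighbouring characters cannot decide; that maps directly onto step 2 of the plan. This Python sketch is hypothetical: `classify_single_quote`, `OPENERS` and `CLOSERS` are made-up names, and the digit rule (for '70s) and the trailing-s rule (for plural possessives such as dogs') are assumptions added for illustration, not rules from the question. It happens to get the '70s fragment right, but still calls 'tis an opening quote.

```python
OPENERS = "([{\""           # after these, a quote probably opens
CLOSERS = ".,;:!?)]}\""     # these can directly follow a closing quote


def classify_single_quote(text: str, i: int) -> str:
    """Guess the role of the straight ' at text[i].

    Returns "apostrophe", "open", "close", or "ask" (let the user decide).
    """
    prev = text[i - 1] if i > 0 else " "
    nxt = text[i + 1] if i + 1 < len(text) else " "
    if prev.isalnum() and nxt.isalnum():
        return "apostrophe"        # inside a word: mustn't, it's, rock'n'roll
    if prev.isspace() or prev in OPENERS:
        if nxt.isdigit():
            return "apostrophe"    # assumed elided digits: '70s, '09
        if not nxt.isspace():
            return "open"          # NOTE: 'tis and 'twas also land here
        return "ask"               # a ' standing alone between spaces
    if nxt.isspace() or nxt in CLOSERS:
        if prev == "s":
            return "ask"           # dogs' (possessive) vs. a closing quote
        return "close"
    return "ask"


s = '"I like \'That \'70s show\'", she said'
print([classify_single_quote(s, i) for i, c in enumerate(s) if c == "'"])
# ['open', 'apostrophe', 'close']
```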

Any more ideas? It is okay if not all possible cases are covered; the goal is to be as intelligent as possible but no further. :-)

Edit: Some more things that might be worth thinking about (or might be irrelevant, not sure):

  • quotes might not always be in matching pairs: for single quotes the reason is given above. But even for double quotes, when a quotation extends over more than one paragraph, the usual typographic convention (don't ask me why) is to start each paragraph with a quotation mark, even though the quote was not closed in the previous one. So simply keeping a state machine that alternates between two states will not work!
  • Nested quotation (alluded to in the "I like 'That '70s show'" example above): this can leave either kind of quote with no space before or after it.
  • British/American punctuation style: are commas inside the quotes or outside?
  • Many word processors (e.g. Microsoft Word) already do some sort of conversion like this. Although they are not perfect and can often be annoying, it might be instructive to learn how they work...
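
A hedged sketch of one workaround for the multi-paragraph convention: instead of one alternating state machine for the whole file, restart the alternation at every paragraph, so the first straight " in each paragraph is always guessed as an opening quote. It assumes paragraphs are separated by blank lines, `curl_by_paragraph` is a hypothetical name, and it does nothing about nested quotation or single quotes.

```python
import re

OPEN_DQ, CLOSE_DQ = "\u201c", "\u201d"  # “ and ”


def curl_by_paragraph(text: str) -> str:
    """Alternate open/close double quotes, restarting at every paragraph.

    Paragraphs are assumed to be separated by blank lines.
    """
    pieces = re.split(r"(\n\s*\n)", text)  # the capture group keeps the separators
    out = []
    for piece in pieces:
        inside = False  # every paragraph starts outside any quotation
        for ch in piece:
            if ch == '"':
                out.append(CLOSE_DQ if inside else OPEN_DQ)
                inside = not inside
            else:
                out.append(ch)
    return "".join(out)
```

The trade-off is that a paragraph whose quotes are genuinely unbalanced (a typo rather than the continuation convention) passes silently, so ending a paragraph with `inside` still true might be worth reporting to the user.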

ShreevatsaR asked Feb 04 '09 00:02

2 Answers

A good place to start would be with a state machine:

  • Starting at position 0 in the "Starting" state, iterate over the characters
  • Upon finding a quote in the "Starting" state, emit an opening quote and enter the "Quoted" state
  • Upon finding a quote in the "Quoted" state, emit a closing quote and return to the "Starting" state

You can make additional decisions at each of the state transitions.
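
A minimal Python sketch of this two-state machine for double quotes; the `State` enum and `run_state_machine` are illustrative names. As one example of an "additional decision" at a transition (an assumption, not part of the answer), it flags quotes whose left neighbour disagrees with the transition taken, such as a letter directly before a quote it is about to open, so those can be shown to the user.

```python
from enum import Enum
from typing import List, Tuple


class State(Enum):
    STARTING = "starting"
    QUOTED = "quoted"


def run_state_machine(text: str) -> Tuple[str, List[int]]:
    """Convert straight double quotes with a two-state machine.

    Returns the converted text plus the indices of quotes whose left neighbour
    disagrees with the transition taken, so they can be offered for review.
    """
    state = State.STARTING
    out: List[str] = []
    suspicious: List[int] = []
    for i, ch in enumerate(text):
        if ch != '"':
            out.append(ch)
            continue
        prev = text[i - 1] if i > 0 else " "
        if state is State.STARTING:
            out.append("\u201c")  # Starting -> Quoted: opening quote
            if prev.isalnum():    # a letter right before an opening quote is suspicious
                suspicious.append(i)
            state = State.QUOTED
        else:
            out.append("\u201d")  # Quoted -> Starting: closing quote
            if prev.isspace():    # whitespace right before a closing quote is suspicious
                suspicious.append(i)
            state = State.STARTING
    return "".join(out), suspicious
```

For example, `run_state_machine('x" "y')` returns `('x“ ”y', [1, 3])`: both guesses are flagged because the state and the neighbours disagree.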

You could attempt to normalize the single quotes by identifying known contractions, for instance, and converting their apostrophes to a placeholder character that cannot occur in the text before processing.
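
A sketch of that normalization, assuming the private-use code point U+E000 never occurs in the input and using a short, hypothetical list of leading elisions ('tis, 'twas, 'n', 'em) plus decade abbreviations such as '70s. Apostrophes inside words (mustn't) are not listed, since a letter on both sides already identifies them. Once the quote conversion has run, `restore_apostrophes` puts the straight ' back; both function names are illustrative.

```python
import re

PLACEHOLDER = "\ue000"  # private-use code point, assumed absent from the input

# Hypothetical list of elisions whose apostrophe sits at a word edge.
KNOWN_ELISIONS = ["'tis", "'twas", "'n'", "'em"]


def protect_elisions(text: str) -> str:
    """Hide apostrophes of known elisions so the quote converter skips them."""
    for word in KNOWN_ELISIONS:
        pattern = r"(?<!\w)" + re.escape(word) + r"(?!\w)"
        # Keep the original capitalisation ('Tis), swapping only the apostrophes.
        text = re.sub(pattern, lambda m: m.group(0).replace("'", PLACEHOLDER),
                      text, flags=re.IGNORECASE)
    # Decades such as '70s: an apostrophe before two digits and an "s".
    return re.sub(r"(?<!\w)'(?=\d\ds\b)", PLACEHOLDER, text)


def restore_apostrophes(text: str) -> str:
    """Turn the placeholders back into straight apostrophes."""
    return text.replace(PLACEHOLDER, "'")
```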

My $0.02

Ryan Emerle answered Sep 22 '22 03:09


guess which curly quote character to use, if possible

It is not, in the general case.

The simple algorithm that most automatic converters use is just to look at the character typed immediately before the ' or ". If it's a space, the start of a line, an opening bracket or another opening quote, choose an opening quote; otherwise choose a closing one. The advantage of this method is that it can run as you type, so when it chooses the wrong one you can generally correct it on the spot.
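
The rule above, sketched in Python for both quote kinds; `smart_quote` and `convert_as_typed` are illustrative names, not an existing API. Because it inspects the already converted output, a quote typed straight after “ or ‘ counts as following an opening quote. Every apostrophe comes out as ’, and 'tis comes out with an opening quote, which is the 'tis problem raised in the question.

```python
OPENING_CONTEXT = set(" \t\n([{\u2018\u201c")  # whitespace, opening brackets, ‘ and “


def smart_quote(prev_char: str, quote: str) -> str:
    """Curly form of a straight quote typed after prev_char ("" at start of text)."""
    is_open = prev_char == "" or prev_char in OPENING_CONTEXT
    if quote == '"':
        return "\u201c" if is_open else "\u201d"  # “ or ”
    if quote == "'":
        return "\u2018" if is_open else "\u2019"  # ‘ or ’
    raise ValueError(f"not a straight quote: {quote!r}")


def convert_as_typed(text: str) -> str:
    """Simulate typing text one character at a time with smart quotes enabled."""
    out = []
    for ch in text:
        if ch in "'\"":
            ch = smart_quote(out[-1] if out else "", ch)
        out.append(ch)
    return "".join(out)


print(convert_as_typed('"It\'s \'Hi\'," she said'))  # “It’s ‘Hi’,” she said
```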

we want to leave apostrophes alone

I agree! But not many people do. It's normal typesetting practice to turn an apostrophe into a right single quotation mark (’, U+2019). Personally I prefer to leave them as they are, to distinguish them from enclosing quotes; that makes the text easier (I find) to read, and possible to process automatically.

However, this really is just my taste; it is not generally considered justified merely because Unicode names the straight character U+0027 APOSTROPHE.

'tis possible apostrophes are at the beginning of words

Indeed. There is no way to tell an apostrophe from a potential open quote in cases like the classic Fish 'n' Chips, short of enormous amounts of cultural context.

(Not to mention primes, okinas, glottal stops and various other uses of the apostrophe...)

The best thing to do, of course, is to install a keyboard layout that can type smart quotes directly. I have ‘ and ’ on AltGr+[ and AltGr+], “ and ” on AltGr+Shift+[ and AltGr+Shift+], – and – on AltGr+dash and AltGr+Shift+dash, and so on.

bobince answered Sep 24 '22 03:09