How can I extract all quotations in a text?

I'm looking for a SimpleGrepSedPerlOrPythonOneLiner that outputs all quotations in a text.

Example 1:

echo “HAL,” noted Frank, “said that everything was going extremely well.” | SimpleGrepSedPerlOrPythonOneLiner


"said that everything was going extremely well.”

Example 2:

cat MicrosoftWindowsXPEula.txt | SimpleGrepSedPerlOrPythonOneLiner


"Workstation Computer"


(link to the corresponding text).

1 Answer

I like this:

perl -ne 'print "$_\n" foreach /"((?>[^"\\]|\\+[^"]|\\(?:\\\\)*")*)"/g;'

It's a little verbose, but it handles escaped quotes and backtracking a lot better than the simplest implementation. What it's saying is:

my $re = qr{
   "               # Begin with a literal quote
   (               # capture everything between the quotes
     (?>           # prevent backtracking once the alternation has been
                   # satisfied. It either matches or it does not. This expression
                   # only needs one direction, or we fail out of the branch

         [^"\\]    # a character that is not a dquote or a backslash
     |   \\+       # OR one or more backslashes followed by
         [^"]      # something that is not a quote
     |   \\        # OR again a backslash
         (?>\\\\)* # followed by any number of *pairs* of backslashes (as units)
         "         # and a quote
     )*            # any number of these qualifying pieces
   )               # all batched up together
   "               # Ended by a literal quote
}x;

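For comparison, here is a rough Python sketch of the same escape-aware idea. It is a simplification, not a translation of the regex above: it uses the conventional "backslash followed by any character" form instead of the three-way alternation. The names `QUOTED` and `extract_quotations` are made up for illustration.

```python
import re

# Conventional escape-aware pattern (a swapped-in simplification, not the
# exact regex from the answer): between the quotes, accept either a
# character that is neither a quote nor a backslash, or a backslash
# followed by any single character.
QUOTED = re.compile(r'"((?:[^"\\]|\\.)*)"', re.DOTALL)


def extract_quotations(text):
    """Return the contents of each double-quoted span, escapes left intact."""
    return QUOTED.findall(text)


sample = r'He said "hi \"there\"" and then "bye".'
for quote in extract_quotations(sample):
    print(quote)
```

Because the two alternatives can never start on the same character, the pattern should not need atomic grouping to avoid heavy backtracking.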
If you don't need that much power (say, the text is likely to be plain dialog rather than structured quotes), then


probably works about as well as anything else.
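As a rough illustration of what a low-powered approach can look like in Python, the sketch below ignores escapes entirely and accepts both straight and curly double quotes, since the question's first example uses curly ones. The pattern is an assumption chosen for this example, not necessarily the one the answer had in mind, and `SIMPLE_QUOTED` and `extract_simple_quotations` are hypothetical names.

```python
import re

# Assumed simple pattern: an opening straight or curly quote, a run of
# characters that are not quotes, then a closing straight or curly quote.
# Escaped quotes are NOT handled here.
SIMPLE_QUOTED = re.compile(r'["“]([^"“”]*)["”]')


def extract_simple_quotations(text):
    """Return the text between each pair of double quotes."""
    return SIMPLE_QUOTED.findall(text)


line = '“HAL,” noted Frank, “said that everything was going extremely well.”'
print(extract_simple_quotations(line))
```

This is fine for ordinary dialog, but it will split a quotation at the first embedded quote character, which is the case the longer regex is meant to handle.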

