I'm looking for a SimpleGrepSedPerlOrPythonOneLiner that outputs all quotations in a text.
Example 1:
echo “HAL,” noted Frank, “said that everything was going extremely well.” | SimpleGrepSedPerlOrPythonOneLiner
stdout:
"HAL,"
"said that everything was going extremely well."
Example 2:
cat MicrosoftWindowsXPEula.txt | SimpleGrepSedPerlOrPythonOneLiner
stdout:
"EULA"
"Software"
"Workstation Computer"
"Device"
"DRM"
etc.
(link to the corresponding text).
To remove double quotes only from the beginning and end of a String in Java, we can use a more specific regular expression: String result = input.replaceAll("^\"|\"$", ""); After this runs, a double quote at the beginning or at the end of the String is replaced by an empty string; quotes elsewhere in the String are left alone.
To extract the strings between the quotation marks, we can use the findall() function from Python's re module.
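A minimal sketch of the findall() approach, assuming straight double quotes and no escaped quotes inside a quotation. The function name extract_quotations and the sample string are made up for illustration:

```python
import re

def extract_quotations(text):
    """Return the text found between each pair of straight double quotes."""
    # [^"]* matches a run of characters that are not a double quote,
    # so each match stops at the nearest closing quote.
    return re.findall(r'"([^"]*)"', text)

# Sample input modeled on Example 1, using straight quotes (an assumption).
sample = '"HAL," noted Frank, "said that everything was going extremely well."'

for quotation in extract_quotations(sample):
    # Put the quotes back so the output looks like the examples above.
    print(f'"{quotation}"')
```

Because findall() returns only the captured group, the surrounding quotes have to be added back when printing.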
I like this:
perl -ne 'print "$_\n" foreach /"((?>[^"\\]|\\+[^"]|\\(?:\\\\)*")*)"/g;'
It's a little verbose, but it handles escaped quotes and backtracking a lot better than the simplest implementation. What it's saying is:
my $re = qr{
    "                  # Begin it with literal quote
    (
        (?>            # prevent backtracking once the alternation has been
                       # satisfied. It either agrees or it does not. This expression
                       # only needs one direction, or we fail out of the branch
            [^"\\]     # a character that is not a dquote or a backslash
          | \\+        # OR if a backslash, then any number of backslashes followed by
            [^"]       # something that is not a quote
          | \\         # OR again a backslash
            (?>\\\\)*  # followed by any number of *pairs* of backslashes (as units)
            "          # and a quote
        )*             # any number of *set* qualifying phrases
    )                  # all batched up together
    "                  # Ended by a literal quote
}x;
If you don't need that much power--say it's only likely to be dialog and not structured quotes, then
/"([^"]*)"/
probably works about as well as anything else.
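For readers who would rather use Python, here is a rough sketch of the same trade-off. It is not a translation of the Perl regex above: it swaps in the simpler and widely used pattern "(?:[^"\\]|\\.)*", which treats a backslash plus the following character as one unit. The names QUOTED_WITH_ESCAPES, QUOTED_SIMPLE, and the sample string are hypothetical.

```python
import re

# Escape-aware pattern (an assumed alternative, not the Perl regex above):
# inside the quotes, accept either an ordinary character or a backslash
# followed by any single character, so \" does not end the quotation.
QUOTED_WITH_ESCAPES = re.compile(r'"((?:[^"\\]|\\.)*)"')

# The simple pattern from the answer: everything up to the next quote.
QUOTED_SIMPLE = re.compile(r'"([^"]*)"')

# Raw string, so the backslashes reach the regex engine unchanged.
sample = r'He said "she replied \"no\" twice" and then "left".'

for quotation in QUOTED_WITH_ESCAPES.findall(sample):
    print(quotation)

# For comparison, the simple pattern cuts the first quotation short at
# the first escaped quote.
print(QUOTED_SIMPLE.findall(sample))
```

On ordinary dialog without backslashes the two patterns return the same matches, which is why the short one is usually enough.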