How to extract string following a pattern with grep, regex or perl [duplicate]

I have a file that looks something like this:

    <table name="content_analyzer" primary-key="id">
      <type="global" />
    </table>
    <table name="content_analyzer2" primary-key="id">
      <type="global" />
    </table>
    <table name="content_analyzer_items" primary-key="id">
      <type="global" />
    </table>

I need to extract anything within the quotes that follow name=, i.e., content_analyzer, content_analyzer2 and content_analyzer_items.

I am doing this on a Linux box, so a solution using sed, perl, grep or bash is fine.

610

asked Feb 22 '11 16:02

wrangler

1 Answer

Since you need to match content without including it in the result (name=" must match, but it is not part of the desired output), some form of zero-width matching or group capturing is required. This can be done easily with the following tools:

Perl

With Perl you could use the -n option to loop over the input line by line and print the content of a capturing group whenever the pattern matches:

    perl -ne 'print "$1\n" if /name="(.*?)"/' filename

GNU grep

If you have an improved version of grep, such as GNU grep, you may have the -P option available. This option enables Perl-compatible regular expressions, allowing you to use \K, which works like a variable-length lookbehind: it resets the start of the reported match, so everything matched before it is required but left out of the output. The (?=") lookahead does the same for the closing quote.

    grep -Po 'name="\K.*?(?=")' filename

The -o option makes grep print only the matched text, instead of the whole line.
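Because -P is not available in every grep build, it can be worth probing for it before relying on it. The following is a rough sketch under that assumption. The sample file and the pcre_names variable are illustrative, and [^"]* is used in place of the lazy .*? with a lookahead, which should match the same text here.

```shell
# Illustrative input mirroring the question's file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<table name="content_analyzer" primary-key="id">
  <type="global" />
</table>
<table name="content_analyzer2" primary-key="id">
EOF

pcre_names=""
# Probe: a grep without PCRE support exits non-zero on -P.
if printf 'name="x"\n' | grep -qP 'name="\K[^"]*' 2>/dev/null; then
    # \K drops name=" from the reported match; [^"]* stops at the closing quote.
    pcre_names=$(grep -Po 'name="\K[^"]*' "$sample")
    printf '%s\n' "$pcre_names"
else
    echo "grep -P is not available in this grep build" >&2
fi

rm -f "$sample"
```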

Vim - Text Editor

Another way is to use a text editor directly. With Vim, one of the various ways of accomplishing this would be to delete lines without name= and then extract the content from the resulting lines:

    :v/.*name="\v([^"]+).*/d|%s//\1

Standard grep

If you don't have access to these tools, something similar can be achieved with standard grep. Without lookaround, however, the match includes the name=" prefix and the closing quote, so the output needs some cleanup afterwards:

    grep -o 'name="[^"]*"' filename
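One possible way to do that cleanup is to split each match on the double quote and keep the middle field. A sed-only variant is shown as well, since the question lists sed as acceptable. This is a sketch rather than the only approach: the sample file and variable names are illustrative, and the sed form assumes at most one name= per line, because its greedy .* would keep only the last one.

```shell
# Illustrative input mirroring the question's file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<table name="content_analyzer" primary-key="id">
  <type="global" />
</table>
<table name="content_analyzer2" primary-key="id">
  <type="global" />
</table>
<table name="content_analyzer_items" primary-key="id">
  <type="global" />
</table>
EOF

# grep -o emits name="value"; cut splits on the quote and keeps field 2.
grep_cut_names=$(grep -o 'name="[^"]*"' "$sample" | cut -d'"' -f2)

# sed alone: -n suppresses default printing, and the p flag prints only
# the lines where the substitution succeeded.
sed_names=$(sed -n 's/.*name="\([^"]*\)".*/\1/p' "$sample")

printf '%s\n' "$grep_cut_names"
rm -f "$sample"
```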

A note about saving results

All of the shell commands above send their results to stdout. You can always save them to a file instead by appending a redirection:

    > result

to the end of the command.
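As a small illustration of the redirection, note that > overwrites the target file while >> appends to it. The file names below come from mktemp and are purely illustrative, as are the saved_lines and saved_first variables.

```shell
sample=$(mktemp)
result=$(mktemp)
printf '%s\n' '<table name="content_analyzer" primary-key="id">' > "$sample"

# > creates the file, or truncates it if it already exists.
grep -o 'name="[^"]*"' "$sample" > "$result"

# >> appends, so this second run adds another line to the same file.
grep -o 'name="[^"]*"' "$sample" >> "$result"

saved_lines=$(wc -l < "$result" | tr -d ' ')
saved_first=$(head -n 1 "$result")
printf '%s line(s) saved, first: %s\n' "$saved_lines" "$saved_first"

rm -f "$sample" "$result"
```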

143

answered Sep 24 '22 06:09

sidyll

Tags: regex, html-parsing, sed, perl, text-extraction