
How to extract string following a pattern with grep, regex or perl [duplicate]

I have a file that looks something like this:

    <table name="content_analyzer" primary-key="id">
      <type="global" />
    </table>
    <table name="content_analyzer2" primary-key="id">
      <type="global" />
    </table>
    <table name="content_analyzer_items" primary-key="id">
      <type="global" />
    </table>

I need to extract everything within the quotes that follow name=, i.e., content_analyzer, content_analyzer2 and content_analyzer_items.

I am doing this on a Linux box, so a solution using sed, perl, grep or bash is fine.

wrangler asked Feb 22 '11 16:02




1 Answer

Since name=" must be matched but is not part of the desired result, some form of zero-width matching or group capturing is required. Any of the following tools can do this:

Perl

With Perl you could use the -n option to loop over the input line by line and print the content of the capturing group whenever the pattern matches:

    perl -ne 'print "$1\n" if /name="(.*?)"/' filename

GNU grep

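If you have an improved version of grep, such as GNU grep, you may have the -P option available. This option enables Perl-compatible regular expressions, which lets you use \K, a shorthand for a lookbehind. \K resets the start of the reported match, so everything matched before it is left out of the output. The lookahead (?=") likewise requires the closing quote without including it in the match.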

    grep -Po 'name="\K.*?(?=")' filename

The -o option makes grep print only the matched text instead of the whole line.

Vim - Text Editor

Another way is to use a text editor directly. With Vim, one of the various ways of accomplishing this would be to delete lines without name= and then extract the content from the resulting lines:

    :v/.*name="\v([^"]+).*/d|%s//\1

Standard grep

If you don't have access to these tools for some reason, something similar can be achieved with standard grep. However, without lookaround each match still includes the name=" prefix and the closing quote, so the output needs some cleanup afterwards:

    grep -o 'name="[^"]*"' filename
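The cleanup step could look something like the sketch below. It is one possible approach, not the answer's own method: the first variant pipes the matches through cut, the second swaps grep for sed with a capturing group. The temporary file and the variable names `sample`, `with_cut` and `with_sed` are assumptions made for the example.

```shell
# Scratch copy of the lines that matter from the question's file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<table name="content_analyzer" primary-key="id">
<table name="content_analyzer2" primary-key="id">
<table name="content_analyzer_items" primary-key="id">
EOF

# Variant 1: each match looks like name="value", so splitting on the
# double quote leaves the value in the second field.
with_cut=$(grep -o 'name="[^"]*"' "$sample" | cut -d '"' -f 2)

# Variant 2: sed with a basic-regex capture group; -n plus the p flag
# prints only the lines where the substitution succeeded.
with_sed=$(sed -n 's/.*name="\([^"]*\)".*/\1/p' "$sample")

printf '%s\n' "$with_cut"
printf '%s\n' "$with_sed"
```

Both variants use only basic regular expressions, so they should not depend on the -P option. The sed form reports at most one value per line, whereas grep -o with cut handles several name= attributes on the same line.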

A note about saving results

All of the commands above send their results to stdout. Remember that you can always save them by redirecting the output to a file, appending:

    > result

to the end of the command.
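As a small illustration of the redirection, the sketch below writes to a temporary file instead of a file literally called result, so that nothing in the current directory gets overwritten. The names `sample` and `result` are placeholders, and the second grep pattern is only there to show appending.

```shell
# Scratch input and output files (hypothetical names).
sample=$(mktemp)
result=$(mktemp)
cat > "$sample" <<'EOF'
<table name="content_analyzer" primary-key="id">
<table name="content_analyzer2" primary-key="id">
EOF

# ">" creates the file, or replaces its contents if it already exists.
grep -o 'name="[^"]*"' "$sample" > "$result"

# ">>" appends to the file instead of replacing it.
grep -o 'primary-key="[^"]*"' "$sample" >> "$result"

cat "$result"
```

Using > a second time would have discarded the first set of matches, which is why >> is used for the second command.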

sidyll answered Sep 24 '22 06:09