
Getting text from inside an HTML tag within a local file with grep [duplicate]

Possible Duplicate:
RegEx match open tags except XHTML self-contained tags

Excerpt From Input File

<TD class="clsTDLabelWeb" width="28%">Municipality:&nbsp;</TD>
<TD style="WIDTH: 394px" class="clsTDLabelSm" colSpan="5">
<span id="DInfo1_Municipality">JUPITER</span></TD>

My Regular Expression

(?<=<span id="DInfo1_Municipality">)([^</span>]*)

I have an HTML file saved to disk. I would like to use grep to search through the file and output the contents of a specific span, though I don't know if this is a proper use of grep. When I run grep on the file with the expression read from another file (so I don't mess up the escaping of any special characters), it outputs nothing. I have tested the expression in RegExr, where it matches "JUPITER", which is exactly what I want returned. Thank you so much for your help!

Desired Output

JUPITER
Asked Aug 29 '10 01:08 by LakeMicrobe


1 Answer

Give this a try:

sed -n 's|^<span id="DInfo1_Municipality">\([^<]*\)</span></TD>$|\1|p' file

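or with GNU grep and your regex. Plain grep uses basic regular expressions, which have no lookbehind, so the pattern needs -P (Perl-compatible syntax) and -o (print only the matched text). I have also tightened the character class to [^<]*, because [^</span>] is a set of single characters and would stop at any s, p, a or n in the text:

grep -Po '(?<=<span id="DInfo1_Municipality">)[^<]*' file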
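To try both approaches end to end, here is a minimal sketch that recreates the excerpt from the question in a temporary file and extracts the span contents. The variable names (`sample`, `via_grep`, `via_sed`) are made up for illustration. It assumes a grep built with `-P` support (GNU grep usually is), uses `[^<]*` rather than the question's character class, and drops the `^`/`$` anchors from the sed pattern so the span may sit anywhere on its line.

```shell
#!/bin/sh
# Scratch copy of the excerpt from the question (the path comes from mktemp).
sample=$(mktemp)
cat > "$sample" <<'EOF'
<TD class="clsTDLabelWeb" width="28%">Municipality:&nbsp;</TD>
<TD style="WIDTH: 394px" class="clsTDLabelSm" colSpan="5">
<span id="DInfo1_Municipality">JUPITER</span></TD>
EOF

# -P: Perl-compatible regex (required for the lookbehind); -o: print only the match.
# [^<]* stops at the first "<", which is the start of the closing tag.
via_grep=$(grep -Po '(?<=<span id="DInfo1_Municipality">)[^<]*' "$sample")

# Alternative without -P: sed replaces the whole line with the captured text
# and prints only the lines where the substitution succeeded.
via_sed=$(sed -n 's|.*<span id="DInfo1_Municipality">\([^<]*\)</span>.*|\1|p' "$sample")

echo "$via_grep"
echo "$via_sed"

rm -f "$sample"
```

Both commands work line by line, so they only help when the opening tag, the text, and the closing tag share one line, as they do in the excerpt.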
Answered Nov 15 '22 06:11 by Dennis Williamson