How can I extract a tag's attribute value from an HTML file?

Tags: regex, bash

I know, you shouldn't parse HTML with curl, grep and sed. But I am looking for an easy approach, not a very safe one.

So I get an HTML file with curl, and I need the value of a certain attribute of one tag in it. I use grep to get the line that contains "token", which occurs only once. This gives me a whole div:

<div class="userlinks">
  <span class="arrow flleft profilesettings">settings</span>
  <form class="logoutform" method="post" action="/logout">
    <input class="logoutbtn arrow flright" type="submit" value="Log out">
    <input type="hidden" name="ltoken" value="a5fc8828a42277538f1352cf9ea27a71">
  </form>
</div>

How can I get just the value attribute (e.g. "a5fc8828a42277538f1352cf9ea27a71")?

tzippy asked Jul 17 '12 13:07

2 Answers

There's no need to grep:

sed -n '/token/s/.*name="ltoken"\s\+value="\([^"]\+\).*/\1/p' input_file
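The `\s` and `\+` in the command above are GNU extensions and may not work in other sed implementations. A portable variant could look roughly like the sketch below, which uses only POSIX basic regular expressions. The helper name `extract_ltoken` and the variable `html` are made up for illustration, and the sample input is the div from the question.

```shell
# Sample input copied from the question; in practice this would be the curl output.
html='<div class="userlinks">
  <span class="arrow flleft profilesettings">settings</span>
  <form class="logoutform" method="post" action="/logout">
    <input class="logoutbtn arrow flright" type="submit" value="Log out">
    <input type="hidden" name="ltoken" value="a5fc8828a42277538f1352cf9ea27a71">
  </form>
</div>'

# extract_ltoken (hypothetical helper): read HTML on stdin, print the ltoken value.
# [[:space:]][[:space:]]* stands in for \s\+ and [^"][^"]* for [^"]\+.
extract_ltoken() {
  sed -n 's/.*name="ltoken"[[:space:]][[:space:]]*value="\([^"][^"]*\)".*/\1/p'
}

printf '%s\n' "$html" | extract_ltoken
# prints: a5fc8828a42277538f1352cf9ea27a71
```

With a helper like this, the page could be piped in directly, e.g. `curl -s "$url" | extract_ltoken`, where `$url` is a placeholder for the real address. The pattern still assumes that `name` comes directly before `value` and that both use double quotes, as in the sample.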
perreal answered Nov 08 '22 05:11


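One way, using sed (restricted to the ltoken line, because the submit button also has a value attribute):

sed -n '/name="ltoken"/s/.* value="\([^"]*\)".*/\1/p' file.txt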

Results:

a5fc8828a42277538f1352cf9ea27a71
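If the line has already been isolated (for example by grep, as in the question), the same extraction can also be sketched with shell parameter expansion alone, without sed. The variable names `line`, `prefix` and `token` are illustrative, and the sketch assumes the attribute order and double quotes shown in the sample.

```shell
# The single input line holding the token (sample data from the question).
line='    <input type="hidden" name="ltoken" value="a5fc8828a42277538f1352cf9ea27a71">'

prefix='name="ltoken" value="'   # text expected directly before the token
token=${line#*"$prefix"}         # drop everything up to and including the prefix
token=${token%%\"*}              # drop the closing quote and everything after it

printf '%s\n' "$token"
# prints: a5fc8828a42277538f1352cf9ea27a71
```

If the prefix is not found, the first expansion leaves the line unchanged, so the result should be checked before it is used.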

HTH

Steve answered Nov 08 '22 03:11