
Xidel extract data inside the tag -- raw output

Pleased to be a member of StackOverflow after a long time lurking here.

I need to extract the text between two tags, and so far I've found a wonderful tool called Xidel.

The text I need is inside this element:

<div class="description">
Text. <tag>Also tags.</tag> More text.
</div>

However, that text can contain HTML tags, and I want them printed raw. Using a command like:

xidel --xquery '//div[@class="description"]' file.html

Gets me:

Text. Also tags. More text.

And I need it to be exactly as it is, so:

Text. <tag>Also tags.</tag> More text.

How can I achieve this?

Regards, R

RomanM asked Oct 12 '25 07:10


2 Answers

This can be done in a few ways with Xidel, which is why I love it so much.

HTML-templating:

xidel -s file.html -e "<div class='description'>{inner-html()}</div>"

XPath:

xidel -s file.html -e "//div[@class='description']/inner-html()"

CSS:

xidel -s file.html -e "inner-html(css('div.description'))"

BTW, the quoting above is for the Windows command prompt. On Linux, swap the quotes: single quotes around the whole expression and double quotes inside it.
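For Linux users, here is a rough sketch of the same three queries with the quotes swapped for a POSIX shell. It writes the snippet from the question to file.html first so it can be run on its own. The availability check around xidel is my own addition, not something Xidel requires.

```shell
# Recreate the sample input from the question (file name taken from the question).
cat > file.html <<'EOF'
<div class="description">
Text. <tag>Also tags.</tag> More text.
</div>
EOF

# Single quotes outside keep the shell from touching the expression;
# double quotes inside delimit the string literals.
if command -v xidel >/dev/null 2>&1; then
  # HTML-templating
  xidel -s file.html -e '<div class="description">{inner-html()}</div>'
  # XPath
  xidel -s file.html -e '//div[@class="description"]/inner-html()'
  # CSS
  xidel -s file.html -e 'inner-html(css("div.description"))'
else
  # Assumption: xidel may not be installed on the machine running this sketch.
  echo "xidel not found on PATH" >&2
fi
```

Each of the three calls should print the content of the div with the inner tag left intact.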

MatrixView answered Oct 13 '25 20:10


You can show the tags by adding the --output-format=xml option.

xidel --xquery '//div[@class="description"]' --output-format=xml file.html 

Cameron Hudson answered Oct 13 '25 20:10