
Get all text inside a tag in lxml

I'd like to write a code snippet that would grab all of the text inside the <content> tag, in lxml, in all three instances below, including the code tags. I've tried tostring(getchildren()) but that would miss the text in between the tags. I didn't have very much luck searching the API for a relevant function. Could you help me out?

<!--1-->
<content>
<div>Text inside tag</div>
</content>
#should return "<div>Text inside tag</div>"

<!--2-->
<content>
Text with no tag
</content>
#should return "Text with no tag"

<!--3-->
<content>
Text outside tag <div>Text inside tag</div>
</content>
#should return "Text outside tag <div>Text inside tag</div>"
Kevin Burke asked Jan 07 '11 09:01



1 Answer

Does text_content() do what you need?

Ed Summers answered Oct 09 '22 18:10