How to get the HTML source of an element in lxml?

Tags:

python

lxml

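import urllib.request
import lxml.html

# Download the page and parse it into an lxml document tree.
down = 'http://blog.sina.com.cn/s/blog_71f3890901017hof.html'
file = urllib.request.urlopen(down).read()
root = lxml.html.document_fromstring(file)
# Select the first div whose class attribute matches exactly.
body = root.xpath('//div[@class="articalContent  "]')[0]
# text_content() returns only the text, with all markup stripped.
print(body.text_content())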

When I run the code, I get the text content. How can I get the HTML source of the element instead of its text content?

asked Dec 31 '12 06:12

Bqsj Sjbq

1 Answer

Use lxml.html.tostring() on the element you selected (body in your code), here called node:

html = lxml.html.tostring(node)

and please: read the basic documentation of the tools you are using first.
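For illustration, here is a minimal sketch of the difference between text_content() and lxml.html.tostring(). It assumes Python 3 and parses a small inline string instead of downloading the blog page; the SAMPLE markup and the contains() XPath are made up for the example and are not taken from the real page.

```python
import lxml.html

# Hypothetical markup standing in for the downloaded page (an assumption).
SAMPLE = (
    '<html><body>'
    '<div class="articalContent"><p>Hello <b>world</b></p></div>'
    '</body></html>'
)

root = lxml.html.document_fromstring(SAMPLE)
# contains() is used here so the match does not depend on exact whitespace
# in the class attribute.
body = root.xpath('//div[contains(@class, "articalContent")]')[0]

# Text only: all tags are dropped.
text_only = body.text_content()

# Serialized HTML of the element; by default tostring() returns bytes.
html_bytes = lxml.html.tostring(body)

# Pass encoding='unicode' to get a str instead of bytes.
html_text = lxml.html.tostring(body, encoding='unicode')

print(text_only)
print(html_text)
```

If the result is going to be written to a file, the bytes form is usually fine as is; encoding='unicode' is handy when the markup needs further string processing.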

answered Sep 20 '22 16:09

Andreas Jung
