I am trying to scrape only the text from the body of a page using Python Scrapy, but haven't had any luck yet. I am hoping someone can help me scrape all the text from the <body> tag.


Scrapy Body Text Only

1 Answer

Scrapy uses XPath notation to extract parts of an HTML document. Have you tried the /html/body path to extract <body> (assuming it is nested in <html>)? It might be even simpler to use the //body selector:

response.xpath("//body").get()    # extract <body>, markup included (older Scrapy versions: x.select("//body").extract())

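Note that this returns the <body> element with its markup; to get only the text, select its text nodes with //body//text(). You can find more information about the selectors Scrapy provides in the Scrapy documentation.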

131

answered Oct 06 '22 23:10

Eli Bendersky


Tags: python, scrapy, scraper, scrape

mmrs151
