<p>I have HTML that looks like this:</p>
<pre class="prettyprint"><code>&lt;h1&gt;Text 1&lt;/h1&gt;
&lt;div&gt;Some info&lt;/div&gt;
&lt;h1&gt;Text 2&lt;/h1&gt;
&lt;div&gt;...&lt;/div&gt;
</code></pre>
<p>I know how to extract the text of the <code>h1</code> with Scrapy:</p>
<pre class="prettyprint"><code>content.select("//h1[contains(text(),'Text 1')]/text()").extract()
</code></pre>
<p>But my goal is to extract the content of <code>&lt;div&gt;Some info&lt;/div&gt;</code>.</p>
<p>My problem is that I don't have any specific information about the div. All I know is that it comes immediately after <code>&lt;h1&gt;Text 1&lt;/h1&gt;</code>. Can I use selectors to get the next element in the tree, that is, the element at the same level of the DOM tree?</p>
<p>Something like:</p>
<pre class="prettyprint"><code>a = content.select("//h1[contains(text(),'Text 1')]/text()")
a.next("//div/text()").extract()
Some info
</code></pre>

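<p>Try this XPath, which uses the <code>following-sibling</code> axis to select the first <code>div</code> sibling after the matching <code>h1</code>:</p>
<pre class="prettyprint"><code>//h1[contains(text(), 'Text 1')]/following-sibling::div[1]/text()
</code></pre>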

How to select next node using scrapy

Tags:

scrapy

764

asked Nov 04 '13 12:11

SkyFox

1 Answer

135

answered Sep 17 '22 12:09

kev
