Scrapy - Extract items from table

Tags: xpath, scrapy

2 Answers

You can use CSS selectors instead of XPath; I find them easier to work with.

def parse_products(self, response):
    # Iterate over the table rows, skipping the header row
    for product in response.css("#Y1 table tr")[1:]:
        item = Schooldates1Item()
        item['hol'] = product.css('td:nth-child(1)::text').extract_first()
        item['first'] = product.css('td:nth-child(2)::text').extract_first()
        item['last'] = product.css('td:nth-child(3)::text').extract_first()
        yield item

Also, do not use the <tbody> tag in your selectors. From the Scrapy documentation:

Firefox, in particular, is known for adding <tbody> elements to tables. Scrapy, on the other hand, does not modify the original page HTML, so you won’t be able to extract any data if you use <tbody> in your XPath expressions.
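To illustrate the <tbody> point without needing Scrapy installed, here is a minimal sketch that uses the standard library's xml.etree.ElementTree as a stand-in for Scrapy's selectors. The markup, the "Year1" container and the cell values are made up for the example; the real page may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup as the server might send it: there is no <tbody>.
# All cell values are placeholders, not data from the real site.
RAW_HTML = """
<div id="Year1">
  <table>
    <tr><th>Holiday</th><th>First day</th><th>Last day</th></tr>
    <tr><td>Holiday A</td><td>Day 1</td><td>Day 5</td></tr>
    <tr><td>Holiday B</td><td>Day 10</td><td>Day 20</td></tr>
  </table>
</div>
"""

root = ET.fromstring(RAW_HTML.strip())

# Path as copied from a browser inspector, which displays an inserted <tbody>.
rows_with_tbody = root.findall("./table/tbody/tr")

# Path that matches the markup that was actually sent.
rows_without_tbody = root.findall("./table//tr")

print(len(rows_with_tbody))     # 0
print(len(rows_without_tbody))  # 3
```

The first path finds nothing because the <tbody> element only exists in the browser's view of the page, which is the same reason a copied XPath can come back empty in Scrapy.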


answered Nov 15 '22 22:11

Umair Ayub

You need to correct your code slightly. Since you have already selected every row within the table, you don't need to point to the table again inside the loop. You can shorten each XPath to a path relative to the row, something like td[1]//text().

def parse_products(self, response):
    products = response.xpath('//*[@id="Year1"]/table//tr')
    # ignore the table header row
    for product in products[1:]:
        item = Schooldates1Item()
        item['hol'] = product.xpath('td[1]//text()').extract_first()
        item['first'] = product.xpath('td[2]//text()').extract_first()
        item['last'] = product.xpath('td[3]//text()').extract_first()
        yield item
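If you want to try the row-relative idea outside a spider, the sketch below does the same thing with the standard library's xml.etree.ElementTree rather than Scrapy, so it is only an approximation of how Scrapy's selectors behave. The sample markup, the cell values and the helper names cell_text and extract_rows are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample table; the values are placeholders.
SAMPLE = """
<div id="Year1">
  <table>
    <tr><th>Holiday</th><th>First day</th><th>Last day</th></tr>
    <tr><td>Holiday A</td><td>Day 1</td><td>Day 5</td></tr>
    <tr><td><b>Holiday B</b></td><td>Day 10</td><td>Day 20</td></tr>
  </table>
</div>
"""


def cell_text(row, position):
    """Return the text of the position-th <td> in a row, or None if it is missing."""
    cell = row.find(f"td[{position}]")  # path is relative to the row
    if cell is None:
        return None
    # itertext() also collects text nested in child tags such as <b>
    return "".join(cell.itertext()).strip()


def extract_rows(container):
    """Build one dict per data row, skipping the header row."""
    rows = container.findall("./table//tr")
    return [
        {
            "hol": cell_text(row, 1),
            "first": cell_text(row, 2),
            "last": cell_text(row, 3),
        }
        for row in rows[1:]
    ]


items = extract_rows(ET.fromstring(SAMPLE.strip()))
for item in items:
    print(item)
```

Each lookup starts from the row that is already selected, so there is no need to repeat the path to the table for every cell.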

I edited my answer after @stutray provided a link to the site.

answered Nov 15 '22 23:11

vold
