Trouble using lambda function within my scraper

Tags: python, python-3.x, lambda, web-scraping

I've written a script to parse the name and price of certain items from Craigslist. The XPath expressions in my scraper work. When I scrape the items the usual way, wrapping each lookup in a try/except block lets me avoid the IndexError raised when an item has no price. I also got it working with a custom function.

However, in the snippet below I would like to use a lambda function to avoid the IndexError. I tried but could not succeed.

By the way, when I run the code it neither fetches anything nor throws an error.

import requests
from lxml.html import fromstring

page = requests.get('http://bangalore.craigslist.co.in/search/rea?s=120').text
tree = fromstring(page)

# I would like to fix this function so that it works
get_val = lambda item,path:item.text if item.xpath(path) else ""

for item in tree.xpath('//li[@class="result-row"]'):
    link = get_val(item,'.//a[contains(@class,"hdrlnk")]')
    price = get_val(item,'.//span[@class="result-price"]')
    print(link,price)

asked Mar 15 '18 13:03

SIM

1 Answer

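First of all, your lambda function get_val returns the text of the item itself (the li row) when the path matches, not the text of the node you searched for. For an element that contains child elements, .text holds only the text before the first child, which is typically just whitespace or None, so nothing useful is printed. This is probably not what you want. If you want to return the text content of the (first) element matching the path, you should write: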

get_val = lambda item, path: item.xpath(path)[0].text if item.xpath(path) else ""

Please note that xpath returns a list. Only the first element is used, so I assume here that there is only one match per row.

The output looks something like this:

...
Residential Plot @ Sarjapur Check Post ₨1000
Prestige dolce vita apartments in whitefield, Bangalore 
Brigade Golden Triangle, ₨12500000
Nikoo Homes, ₨6900000
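To make the difference between the two lambdas concrete, here is a minimal sketch run against an inline snippet rather than the live page. The markup is a made-up stand-in for one result row (an assumption, not the real Craigslist HTML), and the names original and fixed are only for illustration.

```python
from lxml.html import fromstring

# Hypothetical markup standing in for a single Craigslist result row.
row = fromstring(
    '<ul><li class="result-row">'
    '<a class="result-title hdrlnk" href="/reb/d/example/1.html">Example plot</a>'
    '<span class="result-price">₨1000</span>'
    '</li></ul>'
).xpath('//li[@class="result-row"]')[0]

path = './/span[@class="result-price"]'

# The lambda from the question: reads the text of the <li> itself.
original = lambda item, path: item.text if item.xpath(path) else ""
# The corrected lambda: reads the text of the first matched node.
fixed = lambda item, path: item.xpath(path)[0].text if item.xpath(path) else ""

print(repr(original(row, path)))  # text directly inside <li>, before its first child
print(repr(fixed(row, path)))     # text of the matched <span>
```

On the real page the li usually starts with a line break and indentation, so the original lambda returns whitespace, which is why the script seems to print nothing.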

But I think you want a link, not the text. If that is the case, read on.

So how do you get the link? Given an anchor element a, its href (the link) is stored in the element's attribute mapping: a.attrib["href"].
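As a quick illustration of attribute access, the sketch below reads href from two made-up anchors (hypothetical markup, one of them deliberately missing the attribute). It also shows Element.get, which accepts a default and so avoids a KeyError when the attribute is absent.

```python
from lxml.html import fromstring

# Hypothetical anchors: the first has an href, the second does not.
anchors = fromstring(
    '<div>'
    '<a class="hdrlnk" href="/reb/d/example/1.html">Example</a>'
    '<a class="hdrlnk">No link</a>'
    '</div>'
).xpath('.//a')

with_href, without_href = anchors

href_by_attrib = with_href.attrib["href"]    # dictionary-style lookup
href_by_get = with_href.get("href")          # same value via Element.get
missing_href = without_href.get("href", "")  # default instead of KeyError

print(href_by_attrib, href_by_get, repr(missing_href))
```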

So, as I understand it, you want the text in the case of the price, but the value of one specific attribute, href, in the case of the anchor. This is where lambdas are really useful. Rewrite your function like this:

def get_val(item, path, l):
    return l(item.xpath(path)[0]) if item.xpath(path) else ""

The parameter l is a function that is applied to the node. l may return the text of the node, or the href of an anchor:

link = get_val(item,'.//a[contains(@class,"hdrlnk")]', lambda n: n.attrib["href"])
price = get_val(item,'.//span[@class="result-price"]', lambda n: n.text)

Now the output is:

...
https://bangalore.craigslist.co.in/reb/d/residential-plot-sarjapur/6522786441.html ₨1000
https://bangalore.craigslist.co.in/reb/d/prestige-dolce-vita/6522754197.html 
https://bangalore.craigslist.co.in/reb/d/brigade-golden-triangle/6522687904.html ₨12500000
https://bangalore.craigslist.co.in/reb/d/nikoo-homes/6522687772.html ₨6900000
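For reference, here is a self-contained variant of the same idea that can be run without hitting the site. It is only a sketch: the SAMPLE markup is invented to mimic two rows (one without a price), the parameter is named extract instead of l, and the XPath is evaluated once per call rather than twice.

```python
from lxml.html import fromstring

# Hypothetical markup: two result rows, the second one has no price.
SAMPLE = (
    '<ul>'
    '<li class="result-row">'
    '<a class="result-title hdrlnk" href="/reb/d/example/1.html">Plot A</a>'
    '<span class="result-price">₨1000</span>'
    '</li>'
    '<li class="result-row">'
    '<a class="result-title hdrlnk" href="/reb/d/example/2.html">Flat B</a>'
    '</li>'
    '</ul>'
)

def get_val(item, path, extract):
    """Apply extract to the first node matching path, or return ""."""
    nodes = item.xpath(path)  # evaluate the XPath only once
    if not nodes:
        return ""
    value = extract(nodes[0])
    return value if value is not None else ""  # e.g. an element with no text

results = []
for row in fromstring(SAMPLE).xpath('//li[@class="result-row"]'):
    link = get_val(row, './/a[contains(@class,"hdrlnk")]', lambda n: n.get("href"))
    price = get_val(row, './/span[@class="result-price"]', lambda n: n.text)
    results.append((link, price))

for link, price in results:
    print(link, price)
```

Storing the result of item.xpath(path) in a local variable is a small design choice: it avoids running the same query twice and keeps the "no match" case in one place.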

answered Oct 18 '22 05:10

jferard
