
Trouble using lambda function within my scraper

I've written a script to parse the name and price of certain items from craigslist. The XPath expressions defined in my scraper work. When I scrape the items the usual way and wrap each lookup in a try/except block, I can avoid the IndexError raised when an item has no price. I also tried a custom function, and that worked as well.

However, in the snippet below I would like to use a lambda function to avoid the IndexError. I tried but could not get it to work.

By the way, when I run the code, it neither fetches anything nor throws an error.

import requests
from lxml.html import fromstring

page = requests.get('http://bangalore.craigslist.co.in/search/rea?s=120').text
tree = fromstring(page)

# I wish to fix this function to make a go
get_val = lambda item,path:item.text if item.xpath(path) else ""

for item in tree.xpath('//li[@class="result-row"]'):
    link = get_val(item,'.//a[contains(@class,"hdrlnk")]')
    price = get_val(item,'.//span[@class="result-price"]')
    print(link,price)
SIM asked Mar 15 '18 13:03



1 Answer

First of all, your lambda function get_val returns the text of the item itself whenever the path matches, not the text of the matched node. That text is probably just whitespace, which would explain why nothing visible is printed. This is probably not what you want. If you want to return the text content of the (first) element matching the path, you should write:

get_val = lambda item, path: item.xpath(path)[0].text if item.xpath(path) else ""

Please note that xpath returns a list. I assume here that you have only one element in that list.
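
Because xpath returns a list, the one-line lambda has to run the same query twice. A minimal sketch of a named-function variant that queries once is shown here; the name first_text, the default parameter and the sample markup are made up for illustration and are not part of the original script:

```python
from lxml.html import fromstring

def first_text(item, path, default=""):
    """Return the text of the first node matching path, or default."""
    nodes = item.xpath(path)  # xpath() returns a list, possibly empty
    if not nodes:
        return default
    # .text is None for an element without text, so fall back to default
    return nodes[0].text or default

# Hypothetical markup, loosely modelled on a listing row without a price
row = fromstring(
    '<ul><li class="result-row">'
    '<a class="result-title hdrlnk" href="/x.html">Plot</a>'
    '</li></ul>'
)

title = first_text(row, './/a[contains(@class,"hdrlnk")]')
price = first_text(row, './/span[@class="result-price"]')
print(repr(title), repr(price))
```

The missing price yields the default instead of raising IndexError, which is the behaviour the try/except version gives.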

The output looks something like this:

...
Residential Plot @ Sarjapur Check Post ₨1000
Prestige dolce vita apartments in whitefield, Bangalore 
Brigade Golden Triangle, ₨12500000
Nikoo Homes, ₨6900000

But I think you want a link, not the text. If this is the case, read on.

So, how do you get a link? When you have an anchor a, its href (the link) is available in its attribute mapping: a.attrib["href"].

As I understand it, you want the text in the case of the price, but in the case of the anchor you want the value of one specific attribute, href. This is where lambdas are really useful. Rewrite your function like this:

def get_val(item, path, l):
    return l(item.xpath(path)[0]) if item.xpath(path) else ""

The parameter l is a function that is applied to the node. l may return the text of the node, or the href of an anchor:

link = get_val(item,'.//a[contains(@class,"hdrlnk")]', lambda n: n.attrib["href"])
price = get_val(item,'.//span[@class="result-price"]', lambda n: n.text)

Now the output is:

...
https://bangalore.craigslist.co.in/reb/d/residential-plot-sarjapur/6522786441.html ₨1000
https://bangalore.craigslist.co.in/reb/d/prestige-dolce-vita/6522754197.html 
https://bangalore.craigslist.co.in/reb/d/brigade-golden-triangle/6522687904.html ₨12500000
https://bangalore.craigslist.co.in/reb/d/nikoo-homes/6522687772.html ₨6900000
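
To try the extractor idea without hitting the network, here is a rough, self-contained sketch. The SAMPLE markup and the example.org URLs are invented stand-ins for the live page, and this get_val variant stores the query result once and takes an extra default parameter, so treat it as an illustration rather than the exact function above:

```python
from lxml.html import fromstring

# Made-up markup standing in for the live search page
SAMPLE = """
<ul>
  <li class="result-row">
    <a class="result-title hdrlnk" href="https://example.org/a.html">Plot A</a>
    <span class="result-price">1000</span>
  </li>
  <li class="result-row">
    <a class="result-title hdrlnk" href="https://example.org/b.html">Flat B</a>
  </li>
</ul>
"""

def get_val(item, path, extract, default=""):
    """Apply extract to the first node matching path, or return default."""
    nodes = item.xpath(path)  # run the query once and reuse the result
    return extract(nodes[0]) if nodes else default

tree = fromstring(SAMPLE)
rows = []
for item in tree.xpath('//li[@class="result-row"]'):
    # For the anchor we want an attribute; for the price we want the text
    link = get_val(item, './/a[contains(@class,"hdrlnk")]', lambda n: n.get("href", ""))
    price = get_val(item, './/span[@class="result-price"]', lambda n: n.text or "")
    rows.append((link, price))

for link, price in rows:
    print(link, price)
```

Using n.get("href", "") instead of n.attrib["href"] is a small design choice: it avoids a KeyError if an anchor has no href.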
jferard answered Oct 18 '22 05:10