I've written a script to parse the name and price of certain items from Craigslist. The XPath expressions I've defined in my scraper work. When I scrape the items the usual way and wrap the lookup in a try/except block, I can avoid the IndexError that is raised when an item has no price. I also tried a custom function, and that worked as well.
However, in the snippet below I would like to use a lambda function to avoid the IndexError. I tried but could not succeed.
By the way, when I run the code, it neither fetches anything nor throws an error.
import requests
from lxml.html import fromstring
page = requests.get('http://bangalore.craigslist.co.in/search/rea?s=120').text
tree = fromstring(page)
# I wish to fix this function to make a go
get_val = lambda item,path:item.text if item.xpath(path) else ""
for item in tree.xpath('//li[@class="result-row"]'):
    link = get_val(item,'.//a[contains(@class,"hdrlnk")]')
    price = get_val(item,'.//span[@class="result-price"]')
    print(link,price)
We use lambda functions when we need a nameless function for a short period of time. In Python, a lambda is generally passed as an argument to a higher-order function (a function that takes other functions as arguments). Lambda functions are often used along with built-in functions such as filter() and map().
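As a small illustration of the paragraph above, the sketch below passes lambdas to filter(), map() and sorted(). The prices list is made-up sample data, not output from the scraper.

```python
# Hypothetical sample values; None stands for a listing without a price.
prices = [1000, None, 12500000, 6900000]

# filter() keeps only the items for which the lambda returns a truthy value.
known = list(filter(lambda p: p is not None, prices))

# map() applies the lambda to every remaining item.
labels = list(map(lambda p: "Rs " + str(p), known))

# sorted() accepts a lambda as its key function; negating sorts descending.
descending = sorted(known, key=lambda p: -p)

print(known)
print(labels)
print(descending)
```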
First of all, your lambda function get_val returns the text of the item itself when the path matches, not the text of the node you searched for. This is probably not what you want. If you want to return the text content of the (first) element matching the path, you should write:
get_val = lambda item, path: item.xpath(path)[0].text if item.xpath(path) else ""
Please note that xpath returns a list. I assume here that the list contains only one element.
The output is something like this:
...
Residential Plot @ Sarjapur Check Post ₨1000
Prestige dolce vita apartments in whitefield, Bangalore
Brigade Golden Triangle, ₨12500000
Nikoo Homes, ₨6900000
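The guard used in the lambda above (index into the list only when it is non-empty) can be tried in isolation on plain lists. This is a minimal sketch: first_or_default is a hypothetical helper name, and the lists stand in for what xpath would return.

```python
# Hypothetical helper: first element of a list, or a default when it is empty.
# An empty list is falsy, so results[0] is never evaluated for it and no
# IndexError is raised.
first_or_default = lambda results, default="": results[0] if results else default

print(first_or_default(["Rs 1000"]))  # list with one match
print(first_or_default([]))           # no match: falls back to ""
```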
But I think you want a link, not the text. If this is the case, read below.
OK, how do you get a link? When you have an anchor a, you get its href (the link) from its dictionary of attributes: a.attrib["href"].
So, as I understand it, for the price you want the text, but for the anchor you want the value of one specific attribute, href. This is where lambdas are really useful. Rewrite your function like this:
def get_val(item, path, l):
    return l(item.xpath(path)[0]) if item.xpath(path) else ""
The parameter l is a function that is applied to the node. l may return the text of the node, or the href of an anchor:
link = get_val(item,'.//a[contains(@class,"hdrlnk")]', lambda n: n.attrib["href"])
price = get_val(item,'.//span[@class="result-price"]', lambda n: n.text)
Now the output is:
...
https://bangalore.craigslist.co.in/reb/d/residential-plot-sarjapur/6522786441.html ₨1000
https://bangalore.craigslist.co.in/reb/d/prestige-dolce-vita/6522754197.html
https://bangalore.craigslist.co.in/reb/d/brigade-golden-triangle/6522687904.html ₨12500000
https://bangalore.craigslist.co.in/reb/d/nikoo-homes/6522687772.html ₨6900000
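To try the extractor idea without network access, here is a self-contained sketch under a few assumptions. It swaps in the standard library's xml.etree.ElementTree for lxml, so it calls findall() where the code above calls xpath(), and it matches the class attribute exactly because ElementTree does not support contains(). The SAMPLE markup and the example.org URLs are invented. It also stores the search result in a variable so the path is evaluated only once per call.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for two result rows; the second has no price.
SAMPLE = """
<ul>
  <li class="result-row">
    <a class="hdrlnk" href="https://example.org/plot.html">Residential Plot</a>
    <span class="result-price">1000</span>
  </li>
  <li class="result-row">
    <a class="hdrlnk" href="https://example.org/flat.html">Apartment</a>
  </li>
</ul>
"""

def get_val(item, path, extract):
    # Evaluate the path once; findall() returns a list, much like lxml's xpath().
    nodes = item.findall(path)
    # Apply the extractor to the first match, or fall back to "" when nothing matched.
    return extract(nodes[0]) if nodes else ""

tree = ET.fromstring(SAMPLE)
rows = []
for item in tree.findall(".//li[@class='result-row']"):
    link = get_val(item, ".//a[@class='hdrlnk']", lambda n: n.attrib["href"])
    price = get_val(item, ".//span[@class='result-price']", lambda n: n.text)
    rows.append((link, price))

print(rows)
```

With lxml, the same get_val should work by replacing findall with xpath; the extractor lambdas stay unchanged.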