I'm trying to use Scrapy to download some content for a school project. I would like to get a list of keywords for each page that I can then store in a database. This is what I've got so far:
scrapy shell http://news.nationalgeographic.com/2015/03/150318-pitcairn-marine-reserve-protected-area-ocean-conservation/
>>> response.xpath('//title/text()').extract()
[u'World\u2019s Largest Single Marine Reserve Created in Pacific']
>>> response.xpath("//meta[@name='keywords']")[0].extract()
u'<meta name="keywords" content="ocean life, conservationists, marine biodiversity, marine sanctuaries, wildlife conservation, marine protected areas, mpas, reserves, sanctuaries, ocean conservation">'
What I'd like to do is extract just the value of the content attribute from the meta tag where name='keywords'.
Thanks!
Simply add /@content to the XPath to extract the content attribute:
response.xpath("//meta[@name='keywords']/@content")[0].extract()
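Since the goal is a list of keywords per page, the extracted string still needs to be split on commas. Here is a minimal sketch of that step; the helper name split_keywords is hypothetical, and the sample string is a shortened copy of the content attribute shown in the question. It assumes the keywords are always comma-separated, which may not hold for every site.

```python
def split_keywords(content):
    """Split a comma-separated keywords string into a list of clean keywords.

    `content` is the value of the meta tag's content attribute, or None
    if the page has no keywords meta tag.
    """
    if not content:
        return []
    # Strip surrounding whitespace and drop empty entries (e.g. trailing commas).
    return [kw.strip() for kw in content.split(",") if kw.strip()]


# Shortened sample of the content attribute from the question.
sample = "ocean life, conservationists, marine biodiversity, mpas"
keywords = split_keywords(sample)
print(keywords)
```

In the Scrapy shell, the input could come from response.xpath("//meta[@name='keywords']/@content").extract_first(), which returns None instead of raising IndexError when the tag is missing (assuming a Scrapy version that provides extract_first, 1.0 or later).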