 

Extracting keywords from metatag using scrapy

I'm trying to use Scrapy to download some content for a school project. I would like to get a list of keywords for each page that I can then store in a database. This is what I've got so far:

scrapy shell http://news.nationalgeographic.com/2015/03/150318-pitcairn-marine-reserve-protected-area-ocean-conservation/

>>> response.xpath('//title/text()').extract()

[u'World\u2019s Largest Single Marine Reserve Created in Pacific']

>>> response.xpath("//meta[@name='keywords']")[0].extract()

u'<meta name="keywords" content="ocean life, conservationists, marine biodiversity, marine sanctuaries, wildlife conservation, marine protected areas, mpas, reserves, sanctuaries, ocean conservation">'

What I'd like to do is extract just the value of the content attribute from the meta tag where name='keywords', not the whole element.

Thanks!

Nancy asked Mar 26 '16 20:03



1 Answer

Simply append /@content to the XPath to select the content attribute instead of the whole element:

response.xpath("//meta[@name='keywords']/@content")[0].extract()
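The XPath above returns the keywords as a single comma-separated string. Since the goal is a per-page keyword list stored in a database, here is a minimal sketch, assuming SQLite via Python's standard sqlite3 module; the table name page_keywords and the helpers parse_keywords and store_keywords are hypothetical, not part of Scrapy:

```python
import sqlite3
from typing import List, Optional


def parse_keywords(content: Optional[str]) -> List[str]:
    """Split a meta keywords string such as "ocean life, mpas, reserves"
    into a list of trimmed, non-empty keywords. Returns [] for None
    (e.g. when the page has no keywords meta tag)."""
    if not content:
        return []
    return [kw.strip() for kw in content.split(",") if kw.strip()]


def store_keywords(conn: sqlite3.Connection, url: str, keywords: List[str]) -> None:
    """Store one (url, keyword) row per keyword in a hypothetical
    page_keywords table, creating the table if it does not exist."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS page_keywords ("
        " url TEXT NOT NULL,"
        " keyword TEXT NOT NULL,"
        " PRIMARY KEY (url, keyword))"
    )
    # INSERT OR IGNORE skips (url, keyword) pairs that are already stored,
    # so re-crawling the same page does not create duplicates.
    conn.executemany(
        "INSERT OR IGNORE INTO page_keywords (url, keyword) VALUES (?, ?)",
        [(url, kw) for kw in keywords],
    )
    conn.commit()


if __name__ == "__main__":
    # In the Scrapy shell this string would come from
    # response.xpath("//meta[@name='keywords']/@content")[0].extract()
    content = ("ocean life, conservationists, marine biodiversity, "
               "marine sanctuaries, wildlife conservation, "
               "marine protected areas, mpas, reserves, sanctuaries, "
               "ocean conservation")
    url = ("http://news.nationalgeographic.com/2015/03/"
           "150318-pitcairn-marine-reserve-protected-area-ocean-conservation/")

    conn = sqlite3.connect(":memory:")  # use a file path for a real database
    store_keywords(conn, url, parse_keywords(content))
    for (keyword,) in conn.execute(
            "SELECT keyword FROM page_keywords WHERE url = ?", (url,)):
        print(keyword)
```

Inside a spider you would pass response.url and the extracted string to these helpers. Note that [0] raises an IndexError on pages without a keywords meta tag; response.xpath("//meta[@name='keywords']/@content").extract_first() returns None in that case instead, which parse_keywords turns into an empty list.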
har07 answered Sep 18 '22 18:09
