Difference between LinkExtractor and SgmlLinkExtractor

I'm new to the Scrapy framework. I've seen some tutorials that use LinkExtractor and a few that use SgmlLinkExtractor. I've tried searching for the differences and the pros and cons of each, but the results haven't been satisfying.

Can someone explain the difference between the two? When should each extractor be used?

Thanks!

Krishh asked May 17 '16 18:05


1 Answer

The reason you cannot find references explaining SgmlLinkExtractor is that it is now deprecated (see the related changeset). Its definition is still available in the Scrapy 0.24 docs.

You should not use SgmlLinkExtractor anymore. Scrapy now keeps only a single link extractor, LxmlLinkExtractor, which is the class the LinkExtractor alias points to.
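
As a rough illustration of what that means in practice, here is a minimal sketch that imports the LinkExtractor alias and runs it over a hand-built response. The helper name `docs_links`, the sample HTML, the `/docs/` pattern and the example.com URL are all made up for this example; only the Scrapy imports and the `allow` / `extract_links` calls come from Scrapy itself.

```python
# Hypothetical sample page, used only for this illustration.
SAMPLE_HTML = b"""
<html><body>
  <a href="/docs/intro.html">Intro</a>
  <a href="/blog/post-1.html">Blog post</a>
</body></html>
"""


def docs_links(html, base_url):
    """Return absolute URLs of links whose URL matches /docs/ (hypothetical helper)."""
    # Imported inside the function so the sketch only needs Scrapy when called.
    from scrapy.http import HtmlResponse
    from scrapy.linkextractors import LinkExtractor
    from scrapy.linkextractors.lxmlhtml import LxmlLinkExtractor

    # LinkExtractor is an alias: both names refer to the same class.
    assert LinkExtractor is LxmlLinkExtractor

    # Build a response by hand instead of downloading a page.
    response = HtmlResponse(url=base_url, body=html, encoding="utf-8")

    # `allow` is a regular expression matched against each absolute URL.
    extractor = LinkExtractor(allow=r"/docs/")
    return [link.url for link in extractor.extract_links(response)]


if __name__ == "__main__":
    print(docs_links(SAMPLE_HTML, "http://example.com/"))
```

In a CrawlSpider the same extractor would normally be passed to a `Rule`, so code written for SgmlLinkExtractor can usually be migrated by switching the import to `from scrapy.linkextractors import LinkExtractor`.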

alecxe answered Sep 28 '22 11:09