I want to retrieve the value of the var modelCode. I made a regex like this, but it doesn't work at all. I've posted the structure of the page below. Can somebody help me, please?
regex2 = re.compile(r'"var modelCode"\s*:\s*(.+?\})', re.DOTALL)
source_json3 = response.xpath("//script[contains(., 'if(pageTrackName == 'product detail' || pageTrackName == 'generic product details')')]/text()").re_first(regex2)
source_json3 = re.sub(r'//[^\n]+', "", source_json3)
Structure of the page:
var pageTrackName = digitalData.page.pageInfo.pageTrack;
if(pageTrackName == "product detail" || pageTrackName == "generic product details"){
    var modelCode = "GT-P5100TSABTU";
    var displayName = "Galaxy Tab 2 (10.1, 3G)".replace(/(<([^>]+)>)/gi, "");
    digitalData.product.model_code = modelCode;
    digitalData.product.displayName = displayName;
    pageName += ":" + modelCode;
}
That code is inside a <script> tag, I suppose. In that case, you could use:
model_code = response.xpath('//script').re_first('modelCode.*?"(.*)"')
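For what it's worth, the regex in the question looks for the literal text "var modelCode" (quotes included) followed by a colon, as if it were a JSON key, while the script actually contains var modelCode = "..." with an equals sign, so it can never match. Here is a rough sketch of a slightly stricter pattern, run with plain re against a copy of the script text from the question. SCRIPT_TEXT, MODEL_CODE_RE and extract_model_code are names made up for this illustration, and it assumes the value is always a double-quoted string literal.

```python
import re

# Script body copied from the page structure shown in the question.
SCRIPT_TEXT = '''
var pageTrackName = digitalData.page.pageInfo.pageTrack;
if(pageTrackName == "product detail" || pageTrackName == "generic product details"){
var modelCode = "GT-P5100TSABTU";
var displayName = "Galaxy Tab 2 (10.1, 3G)".replace(/(<([^>]+)>)/gi, "");
digitalData.product.model_code = modelCode;
}
'''

# Hypothetical pattern: matches `modelCode = "<value>"` and captures the value.
# [^"]* stops at the closing quote instead of relying on a greedy `.*`.
MODEL_CODE_RE = re.compile(r'modelCode\s*=\s*"([^"]*)"')


def extract_model_code(script_text):
    """Return the quoted value assigned to modelCode, or None if absent."""
    match = MODEL_CODE_RE.search(script_text)
    return match.group(1) if match else None


model_code = extract_model_code(SCRIPT_TEXT)
print(model_code)
```

In the spider, the same pattern string should work if you pass it to response.xpath('//script/text()').re_first(...) instead of calling re yourself.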
Some tips: have a look at .re_first() / .re(), and at parsel (scrapy's library to extract data from xml): https://parsel.readthedocs.io/en/latest/usage.html