I was wondering if it is possible to extract the parameters of a JavaScript function call with Scrapy, from code similar to this:
<script type="text/javascript">
var map;
function initialize() {
    var fenway = new google.maps.LatLng(43.2640611,2.9388228);
    // ...
}
</script>
I would like to extract the coordinates 43.2640611 and 2.9388228.
This is where the re() method helps. The idea is to locate the script tag via xpath() and use re() to extract the lat and lng from the script tag's contents. Demo from the scrapy shell:
$ scrapy shell index.html
>>> response.xpath('//script').re(r'new google\.maps\.LatLng\(([0-9.]+),([0-9.]+)\);')
[u'43.2640611', u'2.9388228']
where index.html contains:
<script type="text/javascript">
var map;
function initialize() {
    var fenway = new google.maps.LatLng(43.2640611,2.9388228);
    // ...
}
</script>
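If you want to check the pattern outside the shell, here is a rough standard-library approximation of what `.re()` returns. When the pattern has several capturing groups, Scrapy's `.re()` gives back one flat list of captured strings rather than tuples, which is why the demo prints `[u'43.2640611', u'2.9388228']`. The `html` string and the `extract_coordinates` helper are hypothetical, for illustration only.

```python
import re

# Same pattern as in the shell demo above.
LATLNG_RE = re.compile(r'new google\.maps\.LatLng\(([0-9.]+),([0-9.]+)\);')

# Hypothetical page fragment, equivalent to index.html above.
html = """<script type="text/javascript">
var map;
function initialize() {
var fenway = new google.maps.LatLng(43.2640611,2.9388228);
}
</script>"""


def extract_coordinates(text):
    """Return all captured groups as one flat list of strings,
    mimicking how Scrapy's .re() flattens multi-group matches."""
    flat = []
    for groups in LATLNG_RE.findall(text):  # findall yields (lat, lng) tuples
        flat.extend(groups)
    return flat


print(extract_coordinates(html))  # ['43.2640611', '2.9388228']
```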
Of course, in your case the XPath would probably need to be more specific than just //script.
FYI, the regular expression new google\.maps\.LatLng\(([0-9.]+),([0-9.]+)\); uses the capturing groups ([0-9.]+) to extract the coordinate values.
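The pattern above only matches unsigned numbers with no whitespace around the comma, so it would miss a call like `LatLng( -33.8688, 151.2093 )` (any southern latitude or western longitude is negative). The sketch below is slightly more tolerant: it allows an optional minus sign and surrounding whitespace, and converts each match to a pair of floats. It assumes the arguments are always numeric literals, not variables; `parse_latlngs` is a hypothetical helper name.

```python
import re

# Optional sign, digits, optional fractional part.
NUM = r'(-?\d+(?:\.\d+)?)'
# Whitespace is allowed after "new" and around both arguments.
LATLNG_RE = re.compile(
    r'new\s+google\.maps\.LatLng\(\s*' + NUM + r'\s*,\s*' + NUM + r'\s*\)'
)


def parse_latlngs(script_text):
    """Return a list of (lat, lng) float tuples, one per LatLng(...) call."""
    return [(float(lat), float(lng)) for lat, lng in LATLNG_RE.findall(script_text)]


js = """
var fenway = new google.maps.LatLng(43.2640611,2.9388228);
var sydney = new google.maps.LatLng( -33.8688, 151.2093 );
"""
print(parse_latlngs(js))  # [(43.2640611, 2.9388228), (-33.8688, 151.2093)]
```

The same pattern string can also be passed to `.re()` in Scrapy. With two groups it still returns a flat list of strings, so you would pair the values up and convert them afterwards.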
Also see "Using selectors with regular expressions" in the Scrapy documentation.
Disclaimer: I haven't tried this approach, but here's how I would think about it if I were constrained to using Scrapy and didn't want to parse the JavaScript the way alecxe suggested above. This is a finicky, fragile hack :-)
You can try using scrapyjs to execute the JavaScript code from your scrapy crawler. In order to capture those parameters, you'd need to do the following:
More on step 2: Make your fake LatLng function modify the HTML page to expose the lat and lng values so that you can parse them out with Scrapy. Here is some crude code to illustrate:
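// Fake stand-in for google.maps.LatLng. It has to be in place when
// initialize() runs and must not be overwritten by the real Maps API.
var LatLng = function LatLng(lat, lng) {
    var latDiv = document.createElement("div");
    latDiv.id = "extractedLat";
    latDiv.textContent = lat;  // write the value as the div's text
    document.body.appendChild(latDiv);
    var lngDiv = document.createElement("div");
    lngDiv.id = "extractedLng";
    lngDiv.textContent = lng;
    document.body.appendChild(lngDiv);
};
// Mirror the call path the page uses: google.maps.LatLng
window.google = {
    maps: {
        LatLng: LatLng
    }
};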
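Once the page has been rendered with the fake LatLng in place (the scrapyjs rendering step itself is not shown here), reading the values back is ordinary HTML parsing. In a spider you would simply use an XPath such as `response.xpath('//div[@id="extractedLat"]/text()')`. As a standard-library-only sketch, the hypothetical `ExtractedCoordParser` below collects the text of the two divs by their ids. The `rendered` string is a made-up stand-in for the rendered page.

```python
from html.parser import HTMLParser


class ExtractedCoordParser(HTMLParser):
    """Collect text inside <div id="extractedLat"> and <div id="extractedLng">."""

    TARGET_IDS = {"extractedLat", "extractedLng"}

    def __init__(self):
        super().__init__()
        self.values = {}
        self._current = None  # id of the target div we are inside, if any

    def handle_starttag(self, tag, attrs):
        div_id = dict(attrs).get("id")
        if tag == "div" and div_id in self.TARGET_IDS:
            self._current = div_id
            self.values[div_id] = ""

    def handle_endtag(self, tag):
        # Assumes the target divs contain only text (no nested divs).
        if tag == "div":
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self.values[self._current] += data


# Hypothetical rendered HTML after the fake LatLng has run.
rendered = (
    '<html><body><div id="map"></div>'
    '<div id="extractedLat">43.2640611</div>'
    '<div id="extractedLng">2.9388228</div></body></html>'
)

parser = ExtractedCoordParser()
parser.feed(rendered)
lat = float(parser.values["extractedLat"])
lng = float(parser.values["extractedLng"])
print(lat, lng)  # 43.2640611 2.9388228
```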
Overall, this approach sounds a bit painful, but could be fun to try.