
How can I scrape data from the Israeli Bureau of Statistics web query tool?

The following URL:

http://www.cbs.gov.il/ts/ID40d250e0710c2f/databank/series_func_e_v1.html?level_1=31&level_2=1&level_3=7

opens a data query tool from the Israeli Central Bureau of Statistics, which limits each extraction to a maximum of 50 series at a time. Is it possible (and if so, how) to write a web scraper (in your favorite language/software) that follows the clicks at each step so that it can retrieve all of the series in a specific topic?

Thanks.

Tal Galili asked Dec 16 '22 12:12



1 Answer

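Take a look at WWW::Mechanize and WWW::HtmlUnit. WWW::Mechanize fetches pages and submits forms but does not execute JavaScript; if the query tool depends on JavaScript, WWW::HtmlUnit (a Perl interface to the Java HtmlUnit browser) can handle it. With WWW::Mechanize, the first steps look like this: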

#!/usr/bin/perl

use strict;
use warnings;

use WWW::Mechanize;

my $m = WWW::Mechanize->new;

#get page
$m->get("http://www.cbs.gov.il/ts/ID40d250e0710c2f/databank/series_func_e_v1.html?level_1=31&level_2=1&level_3=7");

#submit the form on the first page
$m->submit_form(
    with_fields => {
        name_tatser => 2, #Orders for export
    }
);

#now that we have the second page, submit the form on it
$m->submit_form(
    with_fields => {
        name_ser => 1576, #Number of companies that answered
    }
);

#and so on...

#printing the source HTML is a good way
#to find out what you need to do next
print $m->content;
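
To work around the 50-series limit mentioned in the question, one option is to split the full list of series IDs into batches of at most 50 and repeat the form sequence once per batch. The following is only a minimal sketch of the batching step: the helper name batch_ids and the ID range 1001..1120 are made up for illustration, and it assumes the real IDs can be collected from the option values on the series-selection page.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Split a list of series IDs into batches of at most $size items.
# (batch_ids is a hypothetical helper, not part of WWW::Mechanize.)
sub batch_ids {
    my ($size, @ids) = @_;
    die "batch size must be positive\n" if $size < 1;

    my @batches;
    while (@ids) {
        # splice removes up to $size IDs from the front of the list
        push @batches, [ splice @ids, 0, $size ];
    }
    return @batches;
}

# Made-up IDs for illustration only; real ones would have to be read
# from the form on the series-selection page.
my @series_ids = (1001 .. 1120);

my @batches = batch_ids(50, @series_ids);

for my $i (0 .. $#batches) {
    my $batch = $batches[$i];
    printf "batch %d: %d series (%s .. %s)\n",
        $i + 1, scalar @$batch, $batch->[0], $batch->[-1];

    # For each batch, repeat the get/submit_form sequence shown above,
    # selecting only the IDs in @$batch on the series page.
}
```

With 120 IDs this yields three batches of 50, 50 and 20 series. How the IDs in each batch are passed to the form depends on the page's actual field types, which printing $m->content should reveal.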
Chas. Owens answered Dec 28 '22 11:12
