Using Perl to scrape a website

Tags:

perl

web-scraping

I am interested in writing a Perl script that goes to the following link and extracts the number 1975: https://familysearch.org/search/collection/results#count=20&query=%2Bevent_place_level_1%3ACalifornia%20%2Bevent_place_level_2%3A%22San%20Diego%22%20%2Bbirth_year%3A1923-1923~%20%2Bgender%3AM%20%2Brace%3AWhite&collection_id=2000219

That page shows the number of white men born in 1923 who were living in San Diego County, California, in 1940. I am trying to do this in a loop so that it generalizes over multiple counties and birth years.

In the file locations.txt, I put the list of counties, such as San Diego County.

The current code runs, but instead of the number 1975, it prints unknown. The number 1975 should end up in $val.

I would very much appreciate any help!

#!/usr/bin/perl

use strict;
use warnings;

use LWP::Simple;

my $url = 'https://familysearch.org/search/collection/results#count=20&query=%2Bevent_place_level_1%3A%22California%22%20%2Bevent_place_level_2%3A%22%LOCATION%%22%20%2Bbirth_year%3A%YEAR%-%YEAR%~%20%2Bgender%3AM%20%2Brace%3AWhite&collection_id=2000219';

# Open the input and output files, stopping with a message if either fails.
open(my $in, '<', 'locations26.txt') or die "Cannot open locations26.txt: $!";
open(my $out, '>', 'out26.txt') or die "Cannot open out26.txt: $!";

# Turn on autoflush for the output file.
my $oldh = select($out);
$| = 1;
select($oldh);

while (my $location = <$in>) {
    chomp($location);
    # Keep the readable name for output; use a '+'-separated copy in the URL.
    (my $encoded = $location) =~ s/ /+/g;
    foreach my $year (1923..1923) {
        my $u = $url;
        $u =~ s/%LOCATION%/$encoded/;
        $u =~ s/%YEAR%/$year/g;    # the placeholder appears twice in the URL
        #print "$u\n";
        my $content = get($u);     # undef if the request fails
        my $val = 'unknown';
        if (defined $content && $content =~ / of .strong.([0-9,]+)..strong. /) {
            $val = $1;
        }
        $val =~ s/,//g;
        print "'$location',$year,$val\n";
        print {$out} "'$location',$year,$val\n";
    }
}

Update: The API is not a viable solution. I have been in contact with the site developer, and the API does not apply to that part of the webpage. Hence, any solution that relies on JSON will not be applicable.

asked Feb 01 '13 20:02

user1690130

1 Answer

It would appear that your data is generated by JavaScript, and thus LWP cannot help you. That said, it seems that the site you are interested in has a developer API: https://familysearch.org/developers/
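One way to see why a plain HTTP fetch comes back without the count: everything after the `#` in the search URL is a fragment, and HTTP clients do not send the fragment to the server. The sketch below only splits a shortened copy of the question's URL to show which part a client such as LWP would actually request. The helper name `split_fragment` is made up for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The search URL from the question, shortened for readability.
# The part after '#' is the fragment, which stays in the browser.
my $url = 'https://familysearch.org/search/collection/results'
        . '#count=20&query=%2Bbirth_year%3A1923-1923~&collection_id=2000219';

# split_fragment is a hypothetical helper name used only for this illustration.
# It returns the part that is requested from the server and the fragment.
sub split_fragment {
    my ($full) = @_;
    my ($request, $fragment) = split /#/, $full, 2;
    return ($request, defined $fragment ? $fragment : '');
}

my ($request, $fragment) = split_fragment($url);
print "sent to server:  $request\n";
print "kept by browser: $fragment\n";
```

The server therefore sees the same request for every county and year, and the search parameters are presumably only interpreted by the page's own JavaScript.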

I recommend using Mojo::URL to construct your query and either Mojo::DOM or Mojo::JSON to parse XML or JSON results respectively. Of course other modules will work too, but these tools are very nicely integrated and let you get started quickly.
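As a rough sketch of that workflow (build a query URL, then decode a JSON response), here is a version that uses only core Perl so it runs without extra installs: a small hand-written encoder stands in for Mojo::URL and JSON::PP stands in for Mojo::JSON. The endpoint, the parameter names, and the response body are all hypothetical and are not taken from the FamilySearch API. The question's update also notes that the API does not cover this particular page.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Percent-encode everything except RFC 3986 unreserved characters.
# Assumes ASCII input, which is enough for this illustration.
sub uri_escape_value {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord($1))/ge;
    return $text;
}

# Build "base?key=value&..." from an ordered list of key/value pairs.
sub build_url {
    my ($base, @pairs) = @_;
    my @parts;
    while (my ($key, $value) = splice(@pairs, 0, 2)) {
        push @parts, uri_escape_value($key) . '=' . uri_escape_value($value);
    }
    return @parts ? "$base?" . join('&', @parts) : $base;
}

# HYPOTHETICAL endpoint and parameter names, for illustration only.
my $url = build_url(
    'https://api.example.org/search',
    county     => 'San Diego',
    birth_year => 1923,
);
print "$url\n";

# HYPOTHETICAL response body; a real API would define its own field names.
my $body = '{"total": 1975}';
my $data = decode_json($body);
print "total = $data->{total}\n";
```

With Mojolicious installed, Mojo::URL and Mojo::JSON would take the place of the two helper subs and JSON::PP, and the body would come from a real HTTP request rather than a literal string.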

answered Oct 11 '22 20:10

Joel Berger
