
Encoding newlines, quotes and special characters with Perl Mechanize

I'm trying to write a Perl program with WWW::Mechanize that reposts content from my website to another website, but I'm running into encoding problems:

  • newlines are lost on the other website when I repost my content
  • quotes are not rendered correctly
  • symbols such as € are not rendered correctly either

My website is encoded in UTF-8 and the other website in ISO-8859-15. Here is a sample of the data on my website, followed by the result as posted on the other website:

My website:    10 M€ d'encours
Other website: 10 M? d?encours
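
For illustration only, here is a minimal sketch of what a UTF-8 to ISO-8859-15 conversion does to this kind of text, using the core Encode module. It assumes the apostrophe on the source page is the typographic ’ (U+2019), which the sample above does not confirm. The helper name `unmappable_chars` is hypothetical. ISO-8859-15 does contain € (byte 0xA4), so a single correct conversion should preserve it, whereas U+2019 has no ISO-8859-15 equivalent.

```perl
use strict;
use warnings;
use utf8;                       # string literals below are UTF-8 source text
use Encode qw(encode);

# Hypothetical helper: return the characters of $text that ISO-8859-15
# cannot represent, by encoding one character at a time in strict mode.
sub unmappable_chars {
    my ($text) = @_;
    my @bad;
    for my $char (split //, $text) {
        my $copy = $char;       # encode() with a CHECK value may modify its argument
        my $ok = eval { encode('iso-8859-15', $copy, Encode::FB_CROAK); 1 };
        push @bad, $char unless $ok;
    }
    return @bad;
}

# Assumption: the source page uses a typographic apostrophe (U+2019).
my $sample = "10 M€ d’encours";

my $euro_byte = encode('iso-8859-15', '€');   # a single byte in ISO-8859-15
my @bad       = unmappable_chars($sample);    # characters with no ISO-8859-15 equivalent

printf "euro byte: 0x%02X\n", ord $euro_byte;
printf "unmappable: U+%04X\n", ord $_ for @bad;
```

Characters reported as unmappable would have to be replaced (for example ’ by ') before posting to an ISO-8859-15 form.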

Here is my Perl program:

#!/usr/bin/perl

use utf8;
use strict;
use warnings;
use Encode;    # load explicitly: Encode::encode is called below
use WWW::Mechanize;
use HTML::TreeBuilder;
use HTML::TreeBuilder::XPath;

my $mech = WWW::Mechanize->new(
   stack_depth => 0,
   timeout => 10,
);

$mech->get("RecoveredDataFromMyWebsiteUrl"); 
my $tree = HTML::TreeBuilder::XPath->new_from_content($mech->content); 
my $data = $tree->findvalue('/html/body//div[@id="content"]');
$data = Encode::encode("iso-8859-15",$data);

$mech->get("OtherWebsiteFormularUrl"); 
$mech->form_name("formular")->accept_charset('iso-8859-15');  # job posting form
$mech->set_fields(
    content => $data
);
$mech->submit;

# Save the response page for inspection
open(my $fic, '>', 'output.html')
    or die "I/O error: $!\n";
print {$fic} $mech->content;
close($fic);
user2504649 asked Nov 11 '22 05:11


1 Answer

I would change a few things about how you are crawling the site, but for the output file, try opening it with an explicit UTF-8 encoding layer:

my $out_file = 'output.html';
open(my $fh, ">:encoding(utf8)", $out_file)
    or die "Cannot open $out_file: $!\n";
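
As a slightly fuller sketch of this suggestion (an illustration, not tested against the sites in the question): `$mech->content` normally returns decoded character data, so printing it through a handle opened with an `:encoding(...)` layer should avoid "Wide character" warnings and produce well-formed UTF-8. The helper name `write_utf8_file` and the sample string are hypothetical, and only core Perl is used.

```perl
use strict;
use warnings;
use utf8;    # the literal below is UTF-8 source text

# Hypothetical helper: write a character string to $path as UTF-8.
# "UTF-8" is the strict spelling of the encoding name; "utf8" is the lax one.
sub write_utf8_file {
    my ($path, $text) = @_;
    open(my $fh, '>:encoding(UTF-8)', $path)
        or die "Cannot open $path: $!\n";
    print {$fh} $text;
    close($fh)
        or die "Cannot close $path: $!\n";
    return;
}

# Stand-in for $mech->content; the real page is not shown in the question.
my $content  = "10 M€ d'encours\n";
my $out_file = 'output.html';
write_utf8_file($out_file, $content);

# Read the raw bytes back to check that the euro sign was written
# as the three-byte UTF-8 sequence E2 82 AC.
open(my $in, '<:raw', $out_file)
    or die "Cannot open $out_file: $!\n";
my $bytes = do { local $/; <$in> };
close($in);

print join(' ', map { sprintf '%02X', ord } split //, $bytes), "\n";
```

The encoding layer only affects the saved file. What the other website receives is decided by how the form data is encoded when it is submitted.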
AgileDan answered Nov 15 '22 07:11
