From the MongoDB manual:
By default, all database strings are UTF8. To save images, binaries, and other non-UTF8 data, you can pass the string as a reference to the database.
I'm fetching pages and want to store the content for later processing. I can't run it through Encode (I don't know the originating charset), so I want to store it as a stream of bytes (binary data).
Fragment of my code:
sub save {
    my ($self, $ok, $url, $fetchtime, $request) = @_;
    my $rawhead = $request->headers_as_string;
    my $rawbody = $request->content;
    $self->db->content->insert(
        { "url" => $url, "rhead" => \$rawhead, "rbody" => \$rawbody } ) # using references here
      if $ok;
    $self->db->links->update(
        { "url" => $url },
        {
            '$set' => {
                'status'       => $request->code,
                'valid'        => $ok,
                'last_checked' => time(),
                'fetchtime'    => $fetchtime,
            }
        }
    );
}
But I get this error:
Wide character in subroutine entry at /opt/local/lib/perl5/site_perl/5.14.2/darwin-multi-2level/MongoDB/Collection.pm line 296.
This is the only place where I am storing data.
The question: is the only way to store binary data in MongoDB to encode it, e.g. with base64?
It looks like another sad story about the _utf8_ flag...
I may be wrong, but it seems that the headers_as_string and content methods of HTTP::Message return their strings as sequences of characters. The MongoDB driver, however, expects strings explicitly passed to it as 'binaries' to be sequences of octets, hence the "Wide character" drama.
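To make the characters-versus-octets distinction concrete, here is a minimal sketch using only the core Encode module. The sample string is made up for illustration and nothing here touches MongoDB or HTTP::Message.

```perl
use strict;
use warnings;
use Encode qw(encode is_utf8);

# A character string holding a code point above 255 (a "wide character").
# The literal is a made-up stand-in for fetched text.
my $chars = "caf\x{e9} \x{263A}";

# Encoding turns characters into octets: every element is now in 0..255.
my $octets = encode('UTF-8', $chars);

printf "chars:  length=%d flagged=%s\n", length($chars),  is_utf8($chars)  ? 'yes' : 'no';
printf "octets: length=%d flagged=%s\n", length($octets), is_utf8($octets) ? 'yes' : 'no';
```

Six characters become nine octets here, and only the octet string is free of the utf8 flag.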
A rather ugly fix is to take down the utf8 flag on $rawhead and $rawbody in your code (I wonder whether this shouldn't really be done by the MongoDB driver itself), with something like this:
use Encode qw(_utf8_off);
_utf8_off($rawhead);
_utf8_off($rawbody); # ugh
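For what it's worth, here is a small sketch of what the flag flip does to a flagged string. It assumes only the core Encode module, and the one-character sample string is invented for the demonstration.

```perl
use strict;
use warnings;
use Encode qw(encode _utf8_off is_utf8);

my $flagged  = "\x{263A}";                   # wide character, utf8 flag is on
my $expected = encode('UTF-8', $flagged);    # proper encoding, for comparison

_utf8_off($flagged);                         # flips the flag in place, no re-encoding

print is_utf8($flagged) ? "still flagged\n" : "flag off\n";
print "length now: ", length($flagged), "\n";
print $flagged eq $expected ? "same octets as encode\n" : "different octets\n";
```

Note that _utf8_off is a no-op on a string whose flag is already off, so a Latin-1 byte such as "\xe9" stays a single octet, whereas encode would turn it into two.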
The alternative is to use encode('utf8', $rawhead), but then you have to use decode when extracting the values from the DB, and I doubt that's any less ugly.
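A rough sketch of that encode/decode round trip, assuming only the core Encode module. The header string is a made-up stand-in for headers_as_string output, and the database insert and fetch are deliberately left out.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical stand-in for $request->headers_as_string.
my $rawhead = "X-Title: \x{263A}\n";

# Before storing: characters -> octets. A reference to $stored is what
# would be handed to the driver, e.g. { rhead => \$stored }.
my $stored = encode('UTF-8', $rawhead);

# After reading back: octets -> characters.
my $restored = decode('UTF-8', $stored);

print $restored eq $rawhead ? "round trip ok\n" : "mismatch\n";
```

This only helps for text you are willing to treat as characters. For a body whose charset is unknown, you would store the octets untouched and skip the decode step.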