So I'm trying to write a Perl script that reads in a file encoded in Latin-1. For some reason, this just isn't working out. When I search for a character that I know is in the file (it's on the first line), nothing matches. I'm using use encoding "iso 8859-1"; below, but I've also tried binmode(STDIN, ":utf8");. Any suggestions on what I might be doing wrong, and how to make it right?
use encoding "iso 8859-1";
while (<>)
{
    if (/ó/gi)
    {
        print "Found one!\n";
    }
}
Latin-1, also called ISO-8859-1, is an 8-bit character set standardized by the International Organization for Standardization (ISO) that covers the alphabets of Western European languages.
UTF-8 is a multibyte encoding that can represent any Unicode character. ISO 8859-1 is a single-byte encoding that can represent the first 256 Unicode characters. Both encode ASCII exactly the same way.
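To make the difference concrete, here is a minimal sketch that encodes the same character both ways with the core Encode module. The variable names are only for illustration. It suggests one plausible reason the search in the question finds nothing: the file holds a single F3 byte, while a UTF-8 representation of the same character is the two bytes C3 B3.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $char = "\x{F3}";                        # U+00F3, o with acute accent

my $latin1 = encode('ISO-8859-1', $char);   # one octet:  F3
my $utf8   = encode('UTF-8', $char);        # two octets: C3 B3

printf "Latin-1: %d octet(s), hex %s\n", length($latin1), uc(unpack('H*', $latin1));
printf "UTF-8:   %d octet(s), hex %s\n", length($utf8),   uc(unpack('H*', $utf8));

# ASCII characters come out identical under both encodings.
print "ASCII is identical\n"
    if encode('ISO-8859-1', 'abc') eq encode('UTF-8', 'abc');
```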
$octets = encode_utf8($string);
Equivalent to $octets = encode("utf8", $string). The characters that comprise $string are encoded in Perl's internal format and the result is returned as a sequence of octets. All possible characters have a UTF-8 representation so this function cannot fail.
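As a rough illustration of that description, the sketch below encodes a short string and compares its length in characters with its length in octets. The sample string is made up for the example; it includes the euro sign, which Latin-1 cannot represent but UTF-8 can.

```perl
use strict;
use warnings;
use Encode qw(encode encode_utf8 decode_utf8);

# Illustrative sample: "n", "i", U+00F1, "o", space, U+20AC (euro sign).
my $string = "ni\x{F1}o \x{20AC}";
my $octets = encode_utf8($string);

print length($string), " characters\n";     # 6 characters
print length($octets), " octets\n";         # 9 octets (1+1+2+1+1+3)

print "same result as encode('utf8', ...)\n"
    if $octets eq encode('utf8', $string);
print "decode_utf8 restores the original characters\n"
    if decode_utf8($octets) eq $string;
```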
Don’t use the use encoding pragma: it’s broken.
Either specify the encoding here:
use open ":encoding(Latin1)";
or put it in the open itself:
open(FH, "< :encoding(Latin1)", $pathname)
|| die "can't open $pathname: $!";
or binmode it after opening:
binmode(FH, ":encoding(Latin1)")
|| die "can't binmode to encoding Latin1";
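Here is a small self-contained sketch of the two per-handle variants, assuming a throwaway sample file created with File::Temp. The helper names count_with_open_layer and count_with_binmode are hypothetical, and the pattern uses the escape \x{F3} so the example does not depend on how the script's own source file is encoded.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a tiny Latin-1 sample file; byte F3 is o-acute in Latin-1.
my ($tmp_fh, $pathname) = tempfile(UNLINK => 1);
binmode($tmp_fh);                            # write raw bytes
print {$tmp_fh} "coraz\xF3n\n", "plain ascii\n";
close($tmp_fh) or die "can't close $pathname: $!";

# Variant 1: name the encoding in the open mode.
sub count_with_open_layer {
    my ($path) = @_;
    open(my $fh, '< :encoding(Latin1)', $path)
        or die "can't open $path: $!";
    my $count = grep { /\x{F3}/ } <$fh>;     # lines containing U+00F3
    close($fh);
    return $count;
}

# Variant 2: open first, then add the layer with binmode.
sub count_with_binmode {
    my ($path) = @_;
    open(my $fh, '<', $path)
        or die "can't open $path: $!";
    binmode($fh, ':encoding(Latin1)')
        or die "can't binmode to encoding Latin1";
    my $count = grep { /\x{F3}/ } <$fh>;
    close($fh);
    return $count;
}

print "open layer: ", count_with_open_layer($pathname), " match(es)\n";
print "binmode:    ", count_with_binmode($pathname),    " match(es)\n";
```

Both helpers decode the file's bytes into characters before the regex runs, so the match is against U+00F3 rather than against raw bytes.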
If you’re using <ARGV>, then use open is probably easiest.
Don’t forget to set the encoding on your output streams, too.
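Putting this together for the <> case might look like the sketch below. It is only one possible arrangement: the sample file, the helper name count_matches, and the choice of UTF-8 for STDOUT are assumptions made for the demo, so pick whatever output encoding your terminal or downstream consumer actually expects. The pattern is written as \x{F3}; if you type a literal ó instead and the script is saved as UTF-8, you would also want use utf8.

```perl
use strict;
use warnings;
use open ':encoding(Latin1)';            # default layer for files opened in this scope, including <>
use File::Temp qw(tempfile);

binmode(STDOUT, ':encoding(UTF-8)');     # assumption: output should be UTF-8

# Demo input: a temporary Latin-1 file (byte F3 is o-acute).
my ($tmp_fh, $pathname) = tempfile(UNLINK => 1);
binmode($tmp_fh);                        # write raw bytes
print {$tmp_fh} "coraz\xF3n\n", "plain ascii\n", "canci\xF3n\n";
close($tmp_fh) or die "can't close $pathname: $!";

# Hypothetical helper: counts lines containing U+00F3 in the named files.
sub count_matches {
    my @files = @_;
    local @ARGV = @files;                # <> reads the files listed in @ARGV
    my $count = 0;
    while (my $line = <>) {
        $count++ if $line =~ /\x{F3}/i;
    }
    return $count;
}

my $found = count_matches($pathname);
print "Found $found matching line(s)\n";
```

In a real script you would drop the temporary file and the local @ARGV, and simply loop over <> with the file names given on the command line.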