How can I find extended ASCII characters in a file using Perl? Can anyone share a script?
Thanks in advance.
On Windows with a standard 101-key keyboard, special extended ASCII characters such as é or ß can be typed by holding the ALT key and typing the character's four-digit code, including the leading zero, on the numeric keypad. For example, é is typed by holding the ALT key and typing 0233 on the keypad.
The ord() function is a built-in Perl function that returns the numeric value of the first character of a string. It takes a string as its parameter and ignores everything after the first character. For characters in the ASCII range, the result is the ASCII value.
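As a quick illustration of the ord() behaviour described above, here is a minimal sketch. The helper name is_extended is hypothetical (it is not part of Perl), and the example assumes the input is treated as single bytes, for instance Latin-1, rather than decoded Unicode text.

```perl
use strict;
use warnings;

# ord() looks only at the first character of its argument.
printf "%d\n", ord("A");        # 65, a standard ASCII character
printf "%d\n", ord("Apple");    # 65 again: only the "A" is examined
printf "%d\n", ord("\xE9");     # 233, the byte for é in Latin-1 (ISO-8859-1)

# Hypothetical helper, not part of Perl: true when the first character
# of the string has a value in the 128-255 range.
sub is_extended {
    my ($str) = @_;
    my $code = ord($str);
    return $code >= 128 && $code <= 255;
}

print "extended\n" if is_extended("\xDF");   # \xDF is ß in Latin-1
```

The upper bound of 255 matters only if the string has been decoded into Unicode characters, where ord() can return values above 255.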
The extended ASCII characters occupy the values from 128 (1000 0000) through 255 (1111 1111). Unlike standard ASCII, there are multiple versions of the extended ASCII character set, so the same value can stand for different characters depending on the encoding.
An extended ASCII character set uses 8 bits per character, which adds 128 characters to the 128 of standard ASCII (256 in total).
Since the extended ASCII characters have values of 128 and higher, you can call ord on individual characters and handle those with a value >= 128. The following code reads from stdin (or from any files named on the command line) and prints only the extended ASCII characters:
    while (<>) {
        # Walk the line one character at a time ("." never matches the newline)
        while (/(.)/g) {
            print($1) if (ord($1) >= 128);
        }
    }
Alternatively, unpack together with chr will also work. Example:
    while (<>) {
        # unpack("C*", ...) returns the line as a list of unsigned byte values
        foreach (unpack("C*", $_)) {
            print(chr($_)) if ($_ >= 128);
        }
    }
(I'm sure some Perl guru can condense both of these to two one-liners...)
To print the line numbers instead, you can use the following. It prints a line number once for every extended character found, so duplicates are not removed, and it will behave oddly when Unicode is passed:
    while (<>) {
        while (/(.)/g) {
            # $. is the current input line number
            print($. . "\n") if (ord($1) >= 128);
        }
    }
(Thanks Yaakov Belch for the $. tip.)
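If each matching line should be reported only once, one possible variation is to test the whole line against a character class instead of walking it character by character. This is a sketch under assumptions, not a definitive version: the helper name lines_with_high_bytes is made up for illustration, the input is assumed to be raw bytes (no decoding layer on the filehandle), and the in-memory sample data exists only to keep the example self-contained.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the numbers of the lines that contain at
# least one byte in the range 0x80-0xFF. Each line is listed at most once.
sub lines_with_high_bytes {
    my ($fh) = @_;
    my @hits;
    while (my $line = <$fh>) {
        # $. is the line number of the line just read from $fh
        push @hits, $. if $line =~ /[\x80-\xFF]/;
    }
    return @hits;
}

# Demo on an in-memory file; \xE9, \xEF and \xDF are é, ï and ß in Latin-1.
my $data = "plain\ncaf\xE9\nna\xEFve \xDF\n";
open(my $fh, '<', \$data) or die "cannot open in-memory file: $!";
print "$_\n" for lines_with_high_bytes($fh);
close($fh);
```

To run it against a real file, open that file instead of the in-memory string and pass the handle to the helper.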