I am writing a perl script to process a text file. I need to remove bullet points from the text file and create a new one without bullets. When I look at the binary version of the text file, the bullet is stored as a unicode bullet (0xe280a2). How do I remove the bullet from a string.
I have tried the following code:
open($filehandle, '<:encoding(UTF-8)', $filename)
or die "Could not open file '$filename' $!";
while ($row = <$filehandle>)
{
@txt_str = split(/\•/, $row);
$row = join(" ",@txt_str);
}
The backslash doesn't help you here, as the bullet is not a special character in regexes.
If you specify the input is UTF-8, you should search for a UTF-8 bullet. To do so, either prepend
use utf8;
and save your script as UTF-8; or, use
\N{BULLET}
In your case, splitting and joining can be replaced by simple replacement of the bullet by a space:
while (<$filehandle>) {
s/\N{BULLET}/ /g; # or s/•/ /g under utf8
print; # <-- this was missing in your code
}
why not use use a simple s/•/ /g instead of splitting/joining? and you should print the resulted variable ($row in your case) to an other file or stdout, otherwise you won't see the 'unbulleted' version but for this task i'd use sed from the command line, i'm pretty sure it can handle unicode characters too
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With