Removing bullet points from a text file using Perl

Tags:

perl

I am writing a Perl script to process a text file. I need to remove the bullet points from it and write a new file without them. When I look at a hex dump of the file, each bullet is stored as the bytes 0xE2 0x80 0xA2, which is the UTF-8 encoding of the Unicode bullet character (U+2022). How do I remove the bullet from a string?
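To see what those three bytes become once Perl decodes them, the following sketch uses the core Encode module. It shows that E2 80 A2 decodes to a single character, U+2022 (BULLET), which is what a regex has to match after the file is read with the :encoding(UTF-8) layer:

```perl
use strict;
use warnings;
use Encode qw(decode);

# The three bytes seen in the hex dump of the file
my $bytes = "\xE2\x80\xA2";

# Decoding them as UTF-8 yields one character: U+2022 BULLET
my $char = decode('UTF-8', $bytes);
printf "%d character(s), code point U+%04X\n", length $char, ord $char;
```

It prints "1 character(s), code point U+2022".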

I have tried the following code:

open($filehandle, '<:encoding(UTF-8)', $filename)
    or die "Could not open file '$filename' $!";
while ($row = <$filehandle>) {
    @txt_str = split(/\•/, $row);
    $row = join(" ", @txt_str);
}
Asked by plenn08, Mar 18 '23 at 18:03



2 Answers

The backslash doesn't help you here, as the bullet is not a special character in regexes.

Because you decode the input as UTF-8, your regex has to match the decoded bullet character (U+2022), not its three raw bytes. To do so, either add this at the top of your script

use utf8;

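and save the script itself as UTF-8, or refer to the character by name (on Perl versions older than 5.16 this also needs use charnames ':full';):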

\N{BULLET}
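A small sketch of why this matters, assuming the line has already been decoded by the :encoding(UTF-8) layer: without use utf8, a bullet typed literally in the source is seen by Perl as three separate characters (the bytes E2 80 A2), so it never matches the decoded text, while \N{BULLET} or the code point \x{2022} does:

```perl
use strict;
use warnings;
use charnames ':full';   # makes \N{BULLET} work on Perl versions before 5.16 too

# A decoded line, as produced by reading with '<:encoding(UTF-8)'
my $line = "\x{2022} first item";

# What a literal bullet in a non-"use utf8" source file amounts to: three characters
my $byte_pattern = "\xE2\x80\xA2";
print "byte pattern matches: ", ($line =~ /$byte_pattern/ ? "yes" : "no"), "\n";

# Naming the character, or giving its code point, matches the decoded bullet
print "\\N{BULLET} matches:   ", ($line =~ /\N{BULLET}/ ? "yes" : "no"), "\n";
print "\\x{2022} matches:     ", ($line =~ /\x{2022}/ ? "yes" : "no"), "\n";
```

Only the first check prints "no"; the other two print "yes".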

In your case, the split/join can be replaced by a simple substitution of each bullet with a space:

binmode STDOUT, ':encoding(UTF-8)'; # encode output the same way the input was decoded
while (<$filehandle>) {
    s/\N{BULLET}/ /g; # or s/•/ /g under utf8
    print;            # <-- this was missing in your code
}
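Replacing the bullet with a space leaves extra whitespace at the start of list lines. If you would rather drop a leading bullet together with the spaces or tabs around it, one possible variant is sketched below; the helper name strip_bullets is hypothetical, and it still turns bullets in the middle of a line into a space:

```perl
use strict;
use warnings;
use charnames ':full';

# Hypothetical helper: remove a bullet that starts the line (plus surrounding
# spaces/tabs), then replace any remaining bullets with a single space.
sub strip_bullets {
    my ($line) = @_;
    $line =~ s/^[ \t]*\N{BULLET}[ \t]*//;   # bullet at the start of the line
    $line =~ s/\N{BULLET}/ /g;               # bullets anywhere else
    return $line;
}

binmode STDOUT, ':encoding(UTF-8)';
print strip_bullets("\x{2022} first item\n");       # "first item"
print strip_bullets("  \x{2022}\tsecond item\n");   # "second item"
print strip_bullets("a \x{2022} b\n");               # "a   b"
```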
Answered by choroba, Apr 01 '23 at 06:04



Why not use a simple s/•/ /g instead of splitting and joining? You should also print the resulting variable ($row in your case) to another file or to STDOUT; otherwise you won't see the "unbulleted" version. That said, for this task I'd use sed from the command line; I'm pretty sure it can handle Unicode characters too.
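Writing the result to a separate file, as the question asks, could look like the minimal sketch below. It assumes both files are UTF-8 and that the script receives the input and output paths as its two arguments; the script name unbullet.pl and the sub name copy_without_bullets are hypothetical:

```perl
use strict;
use warnings;

# Copy $in_name to $out_name, replacing every U+2022 BULLET with a space.
# Both files are assumed to be UTF-8 encoded.
sub copy_without_bullets {
    my ($in_name, $out_name) = @_;
    open my $in, '<:encoding(UTF-8)', $in_name
        or die "Could not open file '$in_name': $!";
    open my $out, '>:encoding(UTF-8)', $out_name
        or die "Could not open file '$out_name': $!";
    while (my $row = <$in>) {
        $row =~ s/\x{2022}/ /g;   # same substitution as s/•/ /g under use utf8
        print {$out} $row;
    }
    close $out or die "Could not close file '$out_name': $!";
    close $in;
}

# Usage: perl unbullet.pl input.txt output.txt
copy_without_bullets(@ARGV) if @ARGV == 2;
```

Opening the output with the same :encoding(UTF-8) layer as the input keeps any other non-ASCII text in the file intact.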

Answered by glezmen, Apr 01 '23 at 06:04
