Removing bullet points from a txt file using Perl

Question

I am writing a Perl script to process a text file. I need to remove the bullet points from the file and create a new file without them. When I look at a hex dump of the text file, each bullet is stored as the UTF-8 encoding of the Unicode bullet, U+2022 (bytes 0xE2 0x80 0xA2). How do I remove the bullet from a string?

I have tried the following code:

open($filehandle, '<:encoding(UTF-8)', $filename)
or die "Could not open file '$filename' $!";
while ($row = <$filehandle>) 
{
   @txt_str = split(/\•/, $row);
   $row = join(" ",@txt_str);
}

choroba · Accepted Answer

The backslash doesn't help you here, as the bullet is not a special character in regexes.

Because you open the input with the UTF-8 layer, each line is decoded into characters, so you should search for the bullet character rather than its three bytes. To do so, either prepend

use utf8;

and save your script as UTF-8; or, use

\N{BULLET}
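
To make the difference concrete, here is a minimal sketch comparing the raw bytes with the decoded character. The variable names are made up for the illustration, and `use charnames ':full';` is included only so that `\N{BULLET}` also works on Perl versions older than 5.16.

```perl
use strict;
use warnings;
use charnames ':full';
use Encode qw(decode);

# The three bytes seen in the hex dump of the file (0xE2 0x80 0xA2).
my $bytes = "\xE2\x80\xA2";

# What the :encoding(UTF-8) layer hands to the script: one character, U+2022.
my $char = decode('UTF-8', $bytes);

# Without "use utf8", a literal bullet in the source is this 3-character
# byte sequence, which does not match the single decoded character.
my $byte_pattern_matches = ($char =~ /\xE2\x80\xA2/) ? 1 : 0;

# These spellings all name the decoded character itself.
my $named_matches = ($char =~ /\N{BULLET}/) ? 1 : 0;
my $hex_matches   = ($char =~ /\x{2022}/)   ? 1 : 0;
my $code_matches  = ($char =~ /\N{U+2022}/) ? 1 : 0;

printf "length=%d bytes=%d named=%d hex=%d code=%d\n",
    length($char), $byte_pattern_matches,
    $named_matches, $hex_matches, $code_matches;
```

This should print `length=1 bytes=0 named=1 hex=1 code=1`: the byte-style pattern fails against decoded input, while any of the character spellings succeeds.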

In your case, the split and join can be replaced by a single substitution that turns each bullet into a space:

while (<$filehandle>) {
    s/\N{BULLET}/ /g; # or s/•/ /g under utf8
    print;            # <-- this was missing in your code
}
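
Since the question asks for a new file without bullets, the loop above could be extended along these lines. This is only a sketch under a few assumptions: the helper names `strip_bullets` and `unbullet_file` are hypothetical, the input and output paths are taken from the command line, and the output is written as UTF-8 so that any other non-ASCII characters survive.

```perl
use strict;
use warnings;
use charnames ':full';   # only needed for \N{BULLET} before Perl 5.16

# Hypothetical helper: replace every bullet in a string with a space.
sub strip_bullets {
    my ($line) = @_;
    $line =~ s/\N{BULLET}/ /g;
    return $line;
}

# Hypothetical helper: copy $in_name to $out_name with bullets removed.
sub unbullet_file {
    my ($in_name, $out_name) = @_;
    open(my $in, '<:encoding(UTF-8)', $in_name)
        or die "Could not open file '$in_name': $!";
    open(my $out, '>:encoding(UTF-8)', $out_name)
        or die "Could not open file '$out_name': $!";
    while (my $row = <$in>) {
        print {$out} strip_bullets($row);
    }
    close($in);
    close($out) or die "Could not close file '$out_name': $!";
    return;
}

# Run only when the script is given exactly two arguments.
unbullet_file(@ARGV) if @ARGV == 2;
```

Saved under a name such as `unbullet.pl` (an arbitrary choice), it would be run as `perl unbullet.pl input.txt output.txt`. Lexical filehandles and `use strict` are used here instead of the globals in the question, which is a style preference rather than a requirement.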

glezmen · Answer

Why not use a simple s/•/ /g instead of splitting and joining? You should also print the resulting variable ($row in your case) to another file or to stdout, otherwise you won't see the 'unbulleted' version. For this task, though, I'd use sed from the command line; I'm pretty sure it can handle Unicode characters too.

Tags:

perl

plenn08