Why is this A0 character appearing in my HTML::Element output?

Tags: encoding, perl

I'm parsing an HTML document with a couple of Perl modules: HTML::TreeBuilder and HTML::Element. For some reason, whenever the content of a tag is just &nbsp; (which is to be expected), HTML::Element returns it as a strange character I've never seen before:

[Screenshot: http://www.freeimagehosting.net/uploads/2acca201ab.jpg]

I can't copy the character, so I can't Google it, and I couldn't find it in Character Map. Strangely, when I search with a regular expression, \w matches it. When I convert the returned document to ANSI or UTF-8, it disappears altogether. I couldn't find any information about it in the HTML::Element documentation either.

How can I detect this character and replace it with something more useful, like null, and how should I deal with strange characters like this in the future?

asked Sep 19 '09 17:09

RobbR

2 Answers

The character is "\xa0" (i.e. 160), which is the standard Unicode translation for &nbsp;. (That is, it's Unicode's non-breaking space.) You should be able to replace them with ordinary spaces using s/\xa0/ /g if you like.
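
As a rough illustration of the substitution above, here is a minimal sketch using only core Perl. The variable `$text` and its contents are hypothetical stand-ins for whatever HTML::Element hands back; the sketch shows one way to count, replace, or strip the character.

```perl
use strict;
use warnings;

# Hypothetical stand-in for text returned by HTML::Element;
# "\xA0" is the non-breaking space that &nbsp; decodes to.
my $text = "price:\xA0\xA042";

# Detect: count how many times code point 160 occurs.
my $count = () = $text =~ /\xA0/g;

# Replace each non-breaking space with an ordinary space.
(my $cleaned = $text) =~ s/\xA0/ /g;

# Or delete the character entirely, so an nbsp-only value becomes "".
(my $stripped = $text) =~ tr/\xA0//d;

printf "found %d non-breaking space(s)\n", $count;
print "cleaned:  '$cleaned'\n";
print "stripped: '$stripped'\n";
```

Whether to substitute a space or delete the character depends on what the surrounding code expects; for a cell that contains nothing but &nbsp;, deleting it leaves an empty string that is easy to test for.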

answered Sep 23 '22 18:09

chaos

The character is a non-breaking space, which is what &nbsp; stands for:

In word processing and digital typesetting, a non-breaking space (" ") (also called no-break space, non-breakable space (NBSP), hard space, or fixed space) is a space character that prevents an automatic line break at its position. In some formats, including HTML, it also prevents consecutive whitespace characters from collapsing into a single space.

In HTML, the common non-breaking space, which is the same width as the ordinary space character, is encoded as &nbsp; or &#160;. In Unicode, it is encoded as U+00A0.
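
To make the U+00A0 encoding concrete, the following sketch uses the core Encode module to show the code point and its UTF-8 byte form. The variable names are illustrative only. It may also help with the "disappears when converted" symptom, on the assumption (not confirmed by the question) that the output was being written without a declared encoding.

```perl
use strict;
use warnings;
use Encode qw(encode);

# U+00A0 NO-BREAK SPACE as a Perl character string.
my $nbsp = "\x{A0}";

# As a code point it is 160 (0xA0).
printf "code point: U+%04X\n", ord $nbsp;

# Encoded as UTF-8 it becomes two bytes, C2 A0.
my $bytes = encode('UTF-8', $nbsp);
my $hex   = uc unpack('H*', $bytes);
print "UTF-8 bytes (hex): $hex\n";

# Declaring the output encoding lets Perl write the character as UTF-8
# rather than as a lone 0xA0 byte.
binmode STDOUT, ':encoding(UTF-8)';
print "a${nbsp}b\n";
```

A lone 0xA0 byte is valid Latin-1 but not valid UTF-8, so a viewer expecting UTF-8 may show a placeholder glyph or nothing at all for it.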

answered Sep 20 '22 18:09

Sinan Ünür
