How to fix UTF encoding for whitespaces?

In my C# code, I am extracting text from a PDF document. When I do that, I get a string that's in UTF-8 or Unicode encoding (I'm not sure which). When I use Encoding.UTF8.GetBytes(src); to convert it into a byte array, I notice that the whitespace is actually encoded as two bytes with values 194 and 160.

For example the string "CLE action" looks like

[67, 76, 69, 194, 160, 65, 99, 116, 105, 111, 110]

in a byte array, where the whitespace is 194 and 160... And because of this src.IndexOf("CLE action"); is returning -1 when I need it to return 1.

How can I fix the encoding of the string?

asked Dec 21 '12 15:12

omega

1 Answer

194 160 is the UTF-8 encoding of a NO-BREAK SPACE codepoint, U+00A0 (the same codepoint that HTML calls &nbsp;).

So it's really not a space, even though it looks like one. (You'll see it won't word-wrap, for instance.) A regular expression match for \s would match it, but a plain comparison with a space won't.
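The difference between the two kinds of check can be seen in a minimal sketch like the one below. It assumes C# 9 or later (top-level statements), and the variable names are only illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

// U+00A0 NO-BREAK SPACE, written as an escape so it stays visible in source.
char nbsp = '\u00A0';

// A plain comparison with U+0020 SPACE fails...
bool equalsSpace = nbsp == ' ';

// ...but .NET's whitespace classifications both include U+00A0.
bool regexMatches = Regex.IsMatch(nbsp.ToString(), @"\s");
bool isWhiteSpace = char.IsWhiteSpace(nbsp);

Console.WriteLine($"equals plain space: {equalsSpace}");   // False
Console.WriteLine($"regex whitespace:   {regexMatches}");  // True
Console.WriteLine($"char.IsWhiteSpace:  {isWhiteSpace}");  // True
```

So any code that tests for whitespace by category will accept the character, while code that compares against a literal ' ' will not.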

To simply replace NO-BREAK spaces you can do the following:

src = src.Replace('\u00A0', ' ');
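As a rough end-to-end check, the sketch below rebuilds the string from the byte values given in the question and searches it before and after the replacement. Two things here are assumptions rather than part of the original post: the byte 65 decodes to a capital 'A', so the search text is "CLE Action", and StringComparison.Ordinal is passed so the result does not depend on culture settings. It also assumes C# 9 or later (top-level statements).

```csharp
using System;
using System.Text;

// The bytes from the question: "CLE", then 194 160 (U+00A0), then "Action".
byte[] raw = { 67, 76, 69, 194, 160, 65, 99, 116, 105, 111, 110 };
string src = Encoding.UTF8.GetString(raw);

// The no-break space prevents a match against text containing a plain space.
int before = src.IndexOf("CLE Action", StringComparison.Ordinal);

// Swap every no-break space for an ordinary space, then search again.
string normalized = src.Replace('\u00A0', ' ');
int after = normalized.IndexOf("CLE Action", StringComparison.Ordinal);

Console.WriteLine(before);  // -1
Console.WriteLine(after);   // 0
```

The index is 0 here only because the sample string starts with the search text; in a longer document it would be the offset of the first match.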

answered Sep 21 '22 09:09

RichieHindle
