list of garbage characters like â€™

Q: What does â€œ mean?

â€œ is "Mojibake" for “. You could try to avoid the non-ASCII quotes, but that would only delay getting back into trouble.

I am using librets to retrieve data from my RETS server. Somehow the librets encoding method is not working, and I am receiving some weird characters in my output. I noticed that characters like '’' are replaced with â€™. I was unable to find a fix in librets, so I decided to replace such garbage characters with their actual values after downloading the data. What I need is a list of such garbage strings and their equivalent characters. I googled for this but did not find any resource. Can anyone point me to a list of such garbage strings and their actual values, or to a piece of code that can generate them?

Thanks.

asked Aug 19 '12 03:08

ZafarYousafi

1 Answer

Search for the term "UTF-8", because that's what you're seeing.

UTF-8 is a way of representing Unicode characters as a sequence of bytes. ("Unicode characters" are the full range of letters and symbols used in all human languages.) One Unicode character becomes 1 to 4 bytes in UTF-8; most characters in everyday text take 1, 2, or 3. When those bytes (numbers from 0 to 255) are displayed using the character set normally used by Windows, they appear as "garbage" -- in this case, 3 "garbage letters" which are really the 3 bytes of a UTF-8 encoding.

In your example, you started with the smart quote character ’. Its representation in Unicode is the number 8217, or U+2019 (2019 is the hexadecimal for 8217). (Search for "Unicode" for a complete list of Unicode characters and their numbers.) The UTF-8 representation of the number 8217 is the three-byte sequence 226, 128, 153. When you display those three bytes as characters using the Windows "CP-1252" character encoding (the ordinary way of displaying text on Windows in the USA), they appear as â€™. (Search for "CP-1252" to see a table of bytes and characters.)
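To make the arithmetic above concrete, here is a minimal Python sketch that walks through the same steps. It assumes Python 3 and its built-in "utf-8" and "cp1252" codecs; the variable names are just for illustration.

```python
ch = "\u2019"                          # the smart quote (U+2019)

code_point = ord(ch)                   # the Unicode number of the character
utf8_bytes = ch.encode("utf-8")        # its UTF-8 representation as bytes
garbled = utf8_bytes.decode("cp1252")  # the same bytes misread as CP-1252

print(code_point, hex(code_point))     # 8217 0x2019
print(list(utf8_bytes))                # [226, 128, 153]
print(ascii(garbled))                  # '\xe2\u20ac\u2122', which displays as â€™
```

The last line uses ascii() only so the output is safe to print on any console; the string itself is the three characters â, € and ™.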

I don't have any list for you. But you could make one if you wrote a program in a language that has built-in support for Unicode and UTF-8. All I can do is explain what you are seeing.
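As a rough sketch of such a program, the Python below builds a lookup table from garbled string to original character. The helper names (`mojibake`, `build_table`) and the sample characters are hypothetical; put whatever characters you expect in your data into `SAMPLE`. It assumes the garbling is exactly "UTF-8 bytes read as CP-1252".

```python
from typing import Dict, Iterable, Optional


def mojibake(ch: str) -> Optional[str]:
    """Return how `ch` looks when its UTF-8 bytes are misread as CP-1252."""
    try:
        return ch.encode("utf-8").decode("cp1252")
    except UnicodeDecodeError:
        # Python's cp1252 codec leaves bytes 0x81, 0x8D, 0x8F, 0x90 and 0x9D
        # undefined, so a few characters cannot be mapped this way.
        return None


def build_table(chars: Iterable[str]) -> Dict[str, str]:
    """Map each garbled string back to the character it came from."""
    table = {}
    for ch in chars:
        garbled = mojibake(ch)
        if garbled is not None:
            table[garbled] = ch
    return table


# Hypothetical sample: curly quotes, dashes, ellipsis, e-acute, copyright sign.
SAMPLE = "\u2018\u2019\u201c\u2013\u2014\u2026\u00e9\u00a9"

table = build_table(SAMPLE)
for garbled, ch in table.items():
    # !a prints escapes so the output is safe on any console
    print(f"{garbled!a} -> U+{ord(ch):04X}")
```

Note that the right double quote ” (U+201D) ends in byte 0x9D, which this codec cannot decode, so it is skipped here; other software may show that byte differently.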

If there is a way to tell librets to use UTF-8 when downloading, that might automatically solve your problem. I don't know anything about librets, but now that you know the term "UTF-8" you might be able to make progress.
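If the download cannot be fixed at the source, the garbling can often be undone afterwards by reversing the two steps: turn the garbage characters back into bytes with CP-1252, then read those bytes as UTF-8. The sketch below is plain Python and has nothing to do with the librets API; `repair` is a hypothetical helper name, and it assumes the whole string was garbled in the same way.

```python
def repair(text: str) -> str:
    """Undo 'UTF-8 bytes displayed as CP-1252' garbling, if possible."""
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not garbled in this way (or only partly): leave the text unchanged.
        return text


fixed = repair("It\u00e2\u20ac\u2122s")   # the input displays as Itâ€™s
print(ascii(fixed))                       # 'It\u2019s'
```

Text that is already correct, such as "café", fails the round trip and is returned as-is. A string that mixes correct and garbled non-ASCII characters is also returned unchanged, so this works best applied to one field at a time.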

answered Sep 23 '22 06:09

librik

Tags: character-encoding, rets
