I was looking at the encoding of Chinese characters on Wikipedia and I'm having trouble figuring out what they are using. For instance, "的" is encoded as "%E7%9A%84". That's three bytes; however, none of the encodings described on this page uses three bytes to represent Chinese characters. UTF-8, for instance, uses 2 bytes.
I'm basically trying to match these three bytes to an actual character. Any suggestions on what encoding it could be?
UTF-8 is a character encoding. It is backward-compatible with ASCII (plain ASCII text is valid UTF-8) while still allowing for international characters, such as Chinese characters, which it encodes as multi-byte sequences. As of the mid 2020s, UTF-8 is by far the most common encoding on the web.
It's not that UTF-8 doesn't cover Chinese characters and UTF-16 does. UTF-16 encodes each character as one or two 16-bit code units (2 or 4 bytes), while UTF-8 uses 1, 2, 3, or at most 4 bytes depending on the character, so an ASCII character is still represented as a single byte. Code points from U+0800 through U+FFFF, a range that covers the common Chinese characters, take exactly 3 bytes in UTF-8. "的" is U+7684, which UTF-8 encodes as the bytes E7 9A 84; percent-encode those and you get the %E7%9A%84 seen in the URL.
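A quick way to see this, as a minimal sketch in a Python 3 shell (the sample characters are arbitrary), is to count the bytes each encoding produces per character:
>>> for ch in 'A', 'é', '的':
...     # bytes per character in UTF-8 vs. UTF-16 (big-endian, no BOM)
...     print(ch, len(ch.encode('utf-8')), len(ch.encode('utf-16-be')))
...
A 1 2
é 2 2
的 3 2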
What character encoding does Wikipedia use? Since MediaWiki 1.5, all projects use the Unicode (UTF-8) character encoding.
The basic block, CJK Unified Ideographs (U+4E00–U+9FFF), contains 20,992 characters. It includes not only characters used in the Chinese writing system but also the kanji used in the Japanese writing system and the hanja whose use is diminishing in Korea.
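You can verify that "的" (U+7684) falls inside this block from a Python 3 shell, for instance:
>>> hex(ord('的'))  # code point of 的
'0x7684'
>>> 0x4E00 <= ord('的') <= 0x9FFF  # inside CJK Unified Ideographs?
True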
The header of a Wikipedia page includes this:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
So the page is UTF-8.
>>> c = '\xe7\x9a\x84'.decode('utf8')
>>> c
u'\u7684'
>>> print c
的
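That transcript is Python 2. As a minimal Python 3 sketch of the same round trip, you can decode the raw bytes directly, or undo the URL's percent-encoding with the standard-library function urllib.parse.unquote:
>>> from urllib.parse import unquote
>>> # decode the raw UTF-8 bytes by hand
>>> b'\xe7\x9a\x84'.decode('utf-8')
'的'
>>> # or let unquote undo the percent-encoding (it assumes UTF-8 by default)
>>> unquote('%E7%9A%84')
'的'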