Convert extended ASCII character codes to UTF-8 byte codes

Tags: char, character-encoding, encoding, ascii, utf-8

I'm trying to figure out how to URL-encode strings, character by character, when all I have are the extended ASCII codes.

For example, for codes below 128, that's pretty simple: the code for the character "?" is 63, which is 3F in base 16, so the URL encoding of the string "?" is "%3F".

Is it possible to do the same for character codes above 127? For instance, the code for "á" is 225 (E1 in base 16). Is it possible to get from there to the bytes %C3%A1, which constitute the URL encoding of "á"? If so, which operations need to be performed?

Edit: I should have been more specific: the character set is ISO Latin-1 (ISO-8859-1). I should also make it clearer that this question is about a formula, i.e. a way to do the conversion programmatically, not about how to URL-encode a character using some library in some language.


asked Mar 08 '16 22:03

Diogo Franco

1 Answer

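If your "extended ASCII" encoding is ISO-8859-1, then you're in luck. The first 256 Unicode code points (U+0000 to U+00FF; these are code points, not UTF-8 byte values) follow ISO-8859-1 exactly, so a Latin-1 character code is already its Unicode code point. I.e. á == 225 == U+00E1.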
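A quick way to check this, sketched in Python (the question names no language, so that choice is mine): decoding a single byte as ISO-8859-1 yields the character whose Unicode code point has the same numeric value. The function name `latin1_to_code_point` is hypothetical.

```python
def latin1_to_code_point(code: int) -> int:
    """Return the Unicode code point for an ISO-8859-1 (Latin-1) character code."""
    if not 0 <= code <= 0xFF:
        raise ValueError("ISO-8859-1 codes range from 0 to 255")
    # Latin-1 occupies U+0000..U+00FF, so the code should already be the code point.
    # Decoding through Python's codec confirms this instead of just assuming it.
    return ord(bytes([code]).decode("latin-1"))

# "á" is 225 (0xE1) in Latin-1 and U+00E1 in Unicode.
print(hex(latin1_to_code_point(225)))  # 0xe1
```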

If you have any other encoding, then you're out of luck. The mapping of characters to code points was arbitrary, so it requires a lookup table (a Rosetta Stone) rather than a calculation.
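To illustrate, here is a small Python sketch that uses Windows-1252 as an example of another "extended ASCII" encoding (my choice; the question does not mention it). There, byte 0x80 is the euro sign, U+20AC, which has no arithmetic relationship to 0x80, so the conversion has to go through a mapping table; Python's codec plays that role here. The helper name `code_point_via_table` is hypothetical.

```python
def code_point_via_table(code: int, encoding: str) -> int:
    """Look up the Unicode code point of a single-byte character code in `encoding`."""
    # The codec is the lookup table; no formula recovers the code point from `code`.
    return ord(bytes([code]).decode(encoding))

print(hex(code_point_via_table(0x80, "cp1252")))   # 0x20ac (EURO SIGN)
print(hex(code_point_via_table(0x80, "latin-1")))  # 0x80 (a C1 control character)
```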

Once you have a Unicode code point, you can relatively easily encode it to UTF-8 using the specification in https://www.rfc-editor.org/rfc/rfc3629. Without a programming language named in your question, detailing that conversion here is out of scope.
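For illustration only, here is a Python sketch of the bit layout from RFC 3629 (the name `utf8_encode` is hypothetical). For Latin-1 input only the one- and two-byte branches ever run: a code point from 0x80 to 0xFF is split into its top bits and its low six bits, giving `0xC0 | (cp >> 6)` and `0x80 | (cp & 0x3F)`. For 0xE1 that is 0xC3 and 0xA1, i.e. the %C3%A1 from the question.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8, following the table in RFC 3629."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a valid Unicode scalar value: {cp:#x}")
    if cp < 0x80:
        # 0xxxxxxx: ASCII bytes are unchanged.
        return bytes([cp])
    if cp < 0x800:
        # 110yyyyy 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # 1110zzzz 10yyyyyy 10xxxxxx
        return bytes([0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])
    # 11110uuu 10uuzzzz 10yyyyyy 10xxxxxx
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])

print(utf8_encode(0xE1).hex().upper())  # C3A1
```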

Percent encoding is then a matter of applying the percent-encoding specification to the resulting UTF-8 bytes.
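A minimal Python sketch of that step, assuming the rules of RFC 3986: the unreserved characters (letters, digits, `-`, `.`, `_`, `~`) stay as they are, and every other byte is written as `%` followed by two uppercase hex digits. The helper name `percent_encode` is hypothetical, and the sketch takes its UTF-8 bytes from Python's built-in encoder.

```python
import string

# RFC 3986 section 2.3: unreserved characters never need escaping.
UNRESERVED = set((string.ascii_letters + string.digits + "-._~").encode("ascii"))

def percent_encode(data: bytes) -> str:
    """Percent-encode a byte string, escaping every byte that is not unreserved."""
    return "".join(chr(b) if b in UNRESERVED else f"%{b:02X}" for b in data)

print(percent_encode("?".encode("utf-8")))  # %3F
print(percent_encode("á".encode("utf-8")))  # %C3%A1
```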

Fortunately, most programming languages have a built-in or third-party library for this kind of conversion.
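In Python, for example, the standard library covers the whole chain: decode the Latin-1 code to a character, then let `urllib.parse.quote` do the UTF-8 encoding (its default) and the percent encoding. Passing `safe=""` makes it escape `/` as well, which `quote` otherwise leaves alone. The wrapper name `url_encode_latin1_code` is hypothetical.

```python
from urllib.parse import quote

def url_encode_latin1_code(code: int) -> str:
    """URL-encode one ISO-8859-1 character code via the standard library."""
    char = bytes([code]).decode("latin-1")  # Latin-1 code -> Unicode character
    return quote(char, safe="")             # UTF-8 encode, then percent-encode

print(url_encode_latin1_code(63))   # %3F
print(url_encode_latin1_code(225))  # %C3%A1
```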


answered Sep 29 '22 07:09

Alastair McCormack

Related questions

Can anyone tell me what encoding this is?
Default Encoding and changes
Why does this postgres stored procedure want to `use utf8`?
Reading arabic text encoded in utf-8 in python
Should webhook JSON payloads be URL encoded?
Why does zmq.setsockopt_string complain about default 'ascii' code?
How should I decode bytes (using ASCII) without losing any "junk" bytes if xmlcharrefreplace and backslashreplace don't work?
Will String.getBytes("UTF-16") return the same result on all platforms?
Encrypting files with SJCL client-side
Does the multibyte-to-wide-string conversion function "mbstowcs", when passed a string literal, use the encoding of the source file?
What is the best suited encoding for C++ source code
How do I load a UTF16-encoded text file in Julia?
Java bean validation alternatives to OWASP ESAPI
MediaCodec Encoded video has green bar at bottom and chrominance screwed up
How to properly set utf8 encoding with jdbc and MySQL?
Interop.Excel UTF-8 encoding when saving file
Laravel 5 route pagination url encoding issue
boost::filesystem::path vs boost::filesystem::wpath
Emoji in R [UTF-8 encoding]
C# and HtmlAgilityPack encoding problem
