I'm trying to figure out how to URL-encode strings, character by character, when all I have are the extended ASCII codes.
For example, for codes below 128, that's pretty simple: The code for char "?" is 63, which is 3F in base 16, so the url encoding of the string "?" is "%3F".
Is it possible to do the same for > 127 char codes? For instance the code for "á" is 225 (E1 in base 16). Is it possible to get from here to the bytes %C3%A1, which constitute the url encoding of "á"? If so, which operations need to be performed?
Edit: I should have been more specific: the character set is ISO Latin-1 (ISO-8859-1). I should also make it clearer that this question is about a formula or a way to do the conversion programmatically, not about how to urlencode a char using some library in some language.
UTF-8 is a variable-length encoding of Unicode that uses one to four 8-bit bytes per code point. Code points 0 to 127 are encoded as a single byte identical to ASCII, so UTF-8 represents all of the ASCII characters, printable and non-printable, unchanged. Every code point above 127, including "á", takes two or more bytes.
The standard ASCII character set is only 7 bits, and characters are represented as 8-bit bytes with the most significant bit set to 0. Modern computers almost universally use 8-bit bytes, and an extended ASCII character set adds 128 more 8-bit characters (codes 128 to 255), where the most significant bit is set to 1.
If your encoding of "extended ASCII" is ISO-8859-1, then you're in luck. The first 256 Unicode code points (the code points themselves, not their UTF-8 encoding) match ISO-8859-1, i.e. á == U+00E1.
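As a minimal Python sketch of that identity (the variable names are only illustrative), the Latin-1 code can be used directly as the Unicode code point, with Python's own Latin-1 decoder as a cross-check:

```python
# In ISO-8859-1, the byte value is numerically equal to the Unicode code point.
code = 225  # Latin-1 code for "á" (0xE1)

# chr() takes a Unicode code point, so no table lookup is needed.
char = chr(code)

# Cross-check against Python's built-in Latin-1 decoder.
decoded = bytes([code]).decode("latin-1")
assert decoded == char

print(f"U+{code:04X}")  # prints "U+00E1"
```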
If you have any other encoding, then you're out of luck. The mapping of characters was arbitrary, so it requires a Rosetta stone (a lookup table), not a calculation.
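To illustrate why a table is needed, here is a small Python sketch that uses the codec tables shipped with Python as the lookup. The helper name `code_point_via_table` is hypothetical, and Windows-1252 is only an assumed example of "some other encoding":

```python
def code_point_via_table(byte_value: int, encoding: str) -> int:
    """Map one byte to its Unicode code point using Python's codec table."""
    if not 0 <= byte_value <= 0xFF:
        raise ValueError("expected a single byte value (0..255)")
    # decode() consults the mapping table for the named encoding.
    return ord(bytes([byte_value]).decode(encoding))


# In Latin-1 the result always equals the input byte.
latin1_result = code_point_via_table(0xE1, "latin-1")

# In Windows-1252, byte 0x80 is the euro sign, U+20AC: no formula gives that.
cp1252_result = code_point_via_table(0x80, "cp1252")
```

Note that some bytes are undefined in some encodings, in which case `decode` raises `UnicodeDecodeError`.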
Once you have a Unicode code point, you can relatively easily encode it to UTF-8 using the specification found in https://www.rfc-editor.org/rfc/rfc3629. Without a programming language defined in your question, it's out of scope to try to detail that conversion here.
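For readers who want the arithmetic anyway, the following is a sketch in Python of the bit layout described in RFC 3629. The function name `utf8_encode` is hypothetical, and Python is an arbitrary choice since the question names no language. For the Latin-1 range only the first two branches matter: codes 128 to 255 become `0xC0 | (cp >> 6)` followed by `0x80 | (cp & 0x3F)`.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8 bytes (bit layout per RFC 3629)."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a valid Unicode scalar value")
    if cp < 0x80:
        # 1 byte: 0xxxxxxx (identical to ASCII)
        return bytes([cp])
    if cp < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


# "á" is U+00E1: 0xE1 >> 6 == 3 and 0xE1 & 0x3F == 0x21,
# so the two bytes are 0xC0 | 3 == 0xC3 and 0x80 | 0x21 == 0xA1.
assert utf8_encode(0xE1) == b"\xc3\xa1"
```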
Percent-encoding is then a matter of applying the percent-encoding specification (RFC 3986) to the UTF-8 bytes.
Fortunately, most programming languages have a built-in or third-party library for this kind of conversion.
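Putting the pieces together for the ISO-8859-1 case, a self-contained Python sketch might look like the following. The name `percent_encode_latin1_code` is hypothetical, and leaving only the RFC 3986 "unreserved" characters unescaped is an assumption; real URL components may allow more characters through.

```python
import string

# RFC 3986 "unreserved" characters, which may appear unescaped.
UNRESERVED = set((string.ascii_letters + string.digits + "-._~").encode("ascii"))


def percent_encode_latin1_code(code: int) -> str:
    """Percent-encode one ISO-8859-1 character code (0..255) via UTF-8."""
    if not 0 <= code <= 0xFF:
        raise ValueError("expected an ISO-8859-1 code (0..255)")
    if code < 0x80:
        utf8 = [code]  # ASCII range: one byte, unchanged
    else:
        # 128..255: two UTF-8 bytes, 110xxxxx 10xxxxxx
        utf8 = [0xC0 | (code >> 6), 0x80 | (code & 0x3F)]
    # Emit %XX for every byte that is not unreserved.
    return "".join(chr(b) if b in UNRESERVED else f"%{b:02X}" for b in utf8)


assert percent_encode_latin1_code(63) == "%3F"      # "?"
assert percent_encode_latin1_code(225) == "%C3%A1"  # "á"
```

A whole string can then be encoded by joining the result for each code, for example `"".join(percent_encode_latin1_code(c) for c in codes)`.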