Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to encode Japanese characters

I have to develop a program. This is encoding system.

I have this Japanese characters that are:

つれづれなるまゝに、日暮らし、硯にむかひて、心にうつりゆくよしなし事を、そこはかとなく書きつくれば、あやしうこそものぐるほしけれ

I want to convert this string to encoding like this:

%26%2312388%3B%26%2312428%3B%26%2312389%3B%26%2312428%3B%26%2312394%3B%26%2312427%3B%26%2312414%3B%26%2312445%3B%26%2312395%3B%26%2312289%3B%26%2326085%3B%26%2326286%3B%26%2312425%3B%26%2312375%3B%26%2312289%3B%26%2330831%3B%26%2312395%3B%26%2312416%3B%26%2312363%3B%26%2312402%3B%26%2312390%3B%26%2312289%3B%26%2324515%3B%26%2312395%3B%26%2312358%3B%26%2312388%3B%26%2312426%3B%26%2312422%3B%26%2312367%3B%26%2312424%3B%26%2312375%3B%26%2312394%3B%26%2312375%3B%26%2320107%3B%26%2312434%3B%26%2312289%3B%26%2312381%3B%26%2312371%3B%26%2312399%3B%26%2312363%3B%26%2312392%3B%26%2312394%3B%26%2312367%3B%26%2326360%3B%26%2312365%3B%26%2312388%3B%26%2312367%3B%26%2312428%3B%26%2312400%3B%26%2312289%3B%26%2312354%3B%26%2312420%3B%26%2312375%3B%26%2312358%3B%26%2312371%3B%26%2312381%3B%26%2312418%3B%26%2312398%3B%26%2312368%3B%26%2312427%3B%26%2312411%3B%26%2312375%3B%26%2312369%3B%26%2312428%3B%26%2312290%3B.

How can I do that?

like image 400
zanhtet Avatar asked Dec 17 '22 12:12

zanhtet


2 Answers

I believe you are looking for HttpUtility.UrlEncode, can't figure out the encoding to get exactly the same output that you show.

var testString = "つれづれなるまゝに、日暮らし、硯にむかひて、心にうつりゆくよしなし事を、そこはかとなく書きつくれば、あやしうこそものぐるほしけれ。";
var encodedUrl = HttpUtility.UrlEncode(testString, Encoding.UTF8);

You might want to change your question, as you don't really need to convert Unicode to ASCII, which is impossible. You rather need to Persent encode or URL encode Percent-encoding.

[EDIT]

I figured it out:

var testString = "つれづれなるまゝに、日暮らし、硯にむかひて、心にうつりゆくよしなし事を、そこはかとなく書きつくれば、あやしうこそものぐるほしけれ。";
var htmlEncoded = string.Concat(testString.Select(arg => string.Format("&#{0};", (int)arg)));
var result = HttpUtility.UrlEncode(htmlEncoded);

The result will exactly match to the encoding you that you provided. Step by step:

var inputChar = 'つ';
var charValue = (int)inputChar; // 12388
var htmlEncoded = "&#" + charValue + ";"; // つ
var ulrEncoded = HttpUtility.UrlEncode(htmlEncoded); // %26%2312388%3b
like image 121
Alex Aza Avatar answered Dec 30 '22 01:12

Alex Aza


This is impossible. Unicode is so much larger than ASCII and you can't look up every character from Unicode in ASCII. while ASCII is 256 characters only (with control chars), Unicode is tens of thousands (I guess).

like image 35
Ken D Avatar answered Dec 30 '22 00:12

Ken D