
Cannot read Japanese content from Wikipedia

I am trying to use the code below to read Japanese content from Wikipedia:

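import com.sun.jersey.api.client.Client;
import com.sun.jersey.api.client.WebResource;
import com.sun.jersey.api.client.config.ClientConfig;
import com.sun.jersey.api.client.config.DefaultClientConfig;

ClientConfig clientConfig = new DefaultClientConfig();
Client client = Client.create(clientConfig);
WebResource webResource = client.resource("http://ja.wikipedia.org/w/api.php?format=json&action=query&titles=AKB48&rvprop=content&prop=revisions");
String s = webResource.get(String.class);
System.out.println(s);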

The result looks like this:

{"query":{"pages":{"2276803":{"pageid":2276803,"ns":0,"title":"AKB48","revisions":[{"contentformat":"text/x-wiki","contentmodel":"wikitext","*":"{{Otheruseslist|\u65e5\u672c\u306e\u5973\u6027\u30a2\u30a4\u30c9\u30eb\u30b0\u30eb\u30fc\u30d7....

It shows \uXXXX escape sequences rather than the actual Japanese characters. I assume it must be an encoding issue, but I still cannot make it work.

Any help would be greatly appreciated.

newhand asked Jan 28 '26 19:01


1 Answer

That looks like entirely reasonable JSON to me. Like Java, JSON uses \u escape sequences to represent characters. I don't think this is an encoding issue at all.

I suggest you find a JSON parser with an API you like, plug the string into that, and then you'll be able to fetch the "unescaped" values.
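To make the point about escapes concrete, here is a minimal sketch of what a JSON parser does with \uXXXX sequences when it reads a string value. The class and method names (JsonUnicodeEscapeDemo, decodeUnicodeEscapes) are hypothetical and written only for illustration. The helper handles nothing but \uXXXX escapes and is not a substitute for a real JSON library, which would also parse the surrounding structure and the other escape forms.

```java
final class JsonUnicodeEscapeDemo {

    // Hypothetical helper for illustration only: turns each backslash-u escape
    // (four hex digits) into the character it stands for. Every other
    // backslash sequence is copied through unchanged.
    static String decodeUnicodeEscapes(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '\\' && i + 1 < input.length()) {
                char next = input.charAt(i + 1);
                if (next == 'u' && i + 6 <= input.length()) {
                    int code = parseHex4(input, i + 2);
                    if (code >= 0) {
                        out.append((char) code);
                        i += 6;
                        continue;
                    }
                }
                // Not a valid unicode escape: keep the backslash and the
                // character after it, so an escaped backslash stays intact.
                out.append(c).append(next);
                i += 2;
                continue;
            }
            out.append(c);
            i++;
        }
        return out.toString();
    }

    // Parses exactly four hex digits starting at 'start'; returns -1 if any
    // of them is not a hex digit.
    private static int parseHex4(String s, int start) {
        int code = 0;
        for (int k = start; k < start + 4; k++) {
            int digit = Character.digit(s.charAt(k), 16);
            if (digit < 0) {
                return -1;
            }
            code = code * 16 + digit;
        }
        return code;
    }

    public static void main(String[] args) {
        // The escaped text from the API response in the question.
        String raw = "\\u65e5\\u672c\\u306e\\u5973\\u6027\\u30a2\\u30a4\\u30c9\\u30eb"
                + "\\u30b0\\u30eb\\u30fc\\u30d7";
        System.out.println(decodeUnicodeEscapes(raw));
    }
}
```

On a console that can display Japanese text, this should print 日本の女性アイドルグループ, which shows that the response already contains the right characters in escaped form. In real code, passing the whole response string to a JSON parser and reading the "*" field gives the same unescaped text without any hand-written decoding.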

Jon Skeet answered Jan 30 '26 10:01


