
Recognizing a character as Chinese and getting Chinese "pinyin" phonetics from simplified characters?

Tags: java, php, cjk

Is it possible, using Java or PHP, to:

A. find out whether a character is Chinese (simplified), and if so,
B. get its pinyin? Example: 你好 => nǐhǎo

Cheers

Moak asked Jun 29 '10 18:06


2 Answers

A)
Yes. Every character represented in Unicode has a unique numeric index called a code point.

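If you know the range of code points you are interested in and how to get the Unicode code point of a given character, a simple comparison will tell you whether the character falls within that range. Be aware, though, that Unicode has no separate range for simplified Chinese: because of Han unification, simplified and traditional characters (and Japanese kanji) share the same CJK Unified Ideographs blocks, the main one being U+4E00 to U+9FFF. A range check therefore tells you that a character is a Han ideograph; telling simplified from traditional needs additional data.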
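A minimal Java sketch of that comparison, assuming the base CJK Unified Ideographs block (U+4E00 to U+9FFF) is the range you care about. The class and method names (`HanDetect`, `inCjkUnifiedBlock`, `isHan`) are hypothetical. As an alternative to a hand-written range, it also asks the JDK's own script data via `Character.UnicodeScript.of` (Java 7+), which covers the extension blocks as well. Neither check can distinguish simplified from traditional characters.

```java
import java.lang.Character.UnicodeScript;

// Hypothetical helper; the class and method names are illustrative only.
final class HanDetect {

    // Assumed range: the base CJK Unified Ideographs block.
    static final int CJK_START = 0x4E00;
    static final int CJK_END = 0x9FFF;

    // The simple comparison: is the code point inside the base block?
    static boolean inCjkUnifiedBlock(int codePoint) {
        return codePoint >= CJK_START && codePoint <= CJK_END;
    }

    // Broader alternative (Java 7+): ask the JDK's Unicode script data.
    // This also covers the extension blocks (e.g. U+20000), but like the
    // range check it cannot tell simplified from traditional characters.
    static boolean isHan(int codePoint) {
        return UnicodeScript.of(codePoint) == UnicodeScript.HAN;
    }

    public static void main(String[] args) {
        String text = "\u4F60\u597D, world"; // "你好, world"
        text.codePoints().forEach(cp -> System.out.printf(
                "U+%04X block=%b han=%b%n", cp, inCjkUnifiedBlock(cp), isHan(cp)));
    }
}
```

The range constant is the quick answer for everyday text; `UnicodeScript` is the safer choice if rarer characters from the extension blocks may appear.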

An existing question has a solution for getting the Unicode code point of a character in PHP:
How to get code point number for a given character in a utf-8 string?

In Java, the static method java.lang.Character.codePointAt(CharSequence, int) (or the instance method String.codePointAt(int)) will give you what you need.
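A short sketch of walking a string code point by code point with `Character.codePointAt`. Advancing by `Character.charCount(cp)` matters because characters outside the Basic Multilingual Plane, such as the CJK Extension B ideograph U+20000, take two `char`s (a surrogate pair) in a Java string. The class name `CodePoints` is hypothetical.

```java
// Hypothetical helper; the class and method names are illustrative only.
final class CodePoints {

    // Collects the code points of a string using Character.codePointAt.
    static int[] codePointsOf(CharSequence text) {
        int[] result = new int[Character.codePointCount(text, 0, text.length())];
        int n = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = Character.codePointAt(text, i);
            result[n++] = cp;
            i += Character.charCount(cp); // 2 chars for supplementary code points
        }
        return result;
    }

    public static void main(String[] args) {
        // "你好" followed by U+20000, which is stored as a surrogate pair
        String text = "\u4F60\u597D" + new String(Character.toChars(0x20000));
        for (int cp : codePointsOf(text)) {
            System.out.printf("U+%04X%n", cp); // U+4F60, U+597D, U+20000
        }
    }
}
```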

B)
Converting a simplified Chinese character, or string, to pinyin would most likely require some form of map with the Unicode code point as the key and the corresponding pinyin as the value.
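A minimal sketch of such a map in Java. The table here is a hypothetical, hand-filled stand-in holding only the two characters from the question; a real table would need thousands of entries, and some characters have more than one reading (好 is also read hào), which a one-value-per-key map cannot express. Characters missing from the table are copied through unchanged.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical example; the table contents and names are illustrative only.
final class PinyinMap {

    // Code point -> pinyin with tone marks (literals escaped for portability).
    private static final Map<Integer, String> PINYIN = new HashMap<>();
    static {
        PINYIN.put(0x4F60, "n\u01D0");   // 你 -> nǐ
        PINYIN.put(0x597D, "h\u01CEo");  // 好 -> hǎo (also read hào)
    }

    // Replaces each mapped code point with its pinyin; anything not in the
    // table (punctuation, Latin letters, unknown characters) is kept as-is.
    static String toPinyin(String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            String py = PINYIN.get(cp);
            if (py != null) {
                out.append(py);
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toPinyin("\u4F60\u597D")); // 你好 -> nǐhǎo
    }
}
```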

An example of this in PHP is shown at http://kingphp.com/108.html.

A simple Google search for [java pinyin] reveals a range of options, two of which are Chinese-to-pinyin libraries: http://kiang.org/jordan/software/pinyinime/ and http://pinyin4j.sourceforge.net/.

Jon Cram answered Nov 19 '22 15:11


Bit late, but solved!

<?php

function curl($url, $params = array(), $is_cookie_set = false)
{
    // Empty string = cookie engine enabled, but no cookie file to read
    $ckfile = '';

    if (!$is_cookie_set) {
        /* STEP 1. let's create a cookie file */
        $ckfile = tempnam(sys_get_temp_dir(), "CURLCOOKIE");

        /* STEP 2. visit the URL once (without parameters) so the server sets its cookie */
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_COOKIEJAR, $ckfile);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_exec($ch);
        unset($ch); // freeing the handle makes libcurl write the cookie jar
    }

    // Build "?key=value&..." from $params (URL-encoded, as before)
    $str = empty($params) ? '' : '?' . http_build_query($params);

    /* STEP 3. request the page again, sending the stored cookies */
    $ch = curl_init($url . $str);
    curl_setopt($ch, CURLOPT_COOKIEFILE, $ckfile);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $output = curl_exec($ch);
    unset($ch);

    if ($ckfile !== '') {
        unlink($ckfile); // clean up the temporary cookie file
    }
    return $output;
}

// Both functions below call translate_a/t, an undocumented Google Translate
// endpoint, and pick fields out of the raw response by splitting on '"'.
function Translate($word, $from, $to)
{
    $word = urlencode($word);
    $url = 'http://translate.google.com/translate_a/t?client=t&text=' . $word . '&hl=' . $from . '&sl=' . $from . '&tl=' . $to . '&ie=UTF-8&oe=UTF-8&multires=1&otf=2&pc=1&ssel=0&tsel=0&sc=1';

    $parts = explode('"', curl($url));
    return $parts[1] ?? ''; // field 1: the translation
}

function pinyin($word)
{
    $word = urlencode($word);
    // Source and target language are both zh; the pinyin is read from the response
    $url = 'http://translate.google.com/translate_a/t?client=t&text=' . $word . '&hl=zh&sl=zh&tl=zh&ie=UTF-8&oe=UTF-8&multires=1&otf=2&pc=1&ssel=0&tsel=0&sc=1';

    $parts = explode('"', curl($url));
    // field 5: the pinyin; lowercase it (multibyte-safe) and drop the spaces
    return str_replace(" ", "", mb_strtolower($parts[5] ?? '', 'UTF-8'));
}
?>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
</head>
<body>
<?php
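// $_GET values are already URL-decoded, so no urldecode(); escape for HTML output
echo htmlspecialchars(pinyin($_GET['phrase'] ?? ''), ENT_QUOTES, 'UTF-8');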
?>
</body>
</html>

If you put this at http://www.example.com/foo.php and open http://www.example.com/foo.php?phrase=你好, it will give you the pinyin.

Tested, and works.
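When the script is called from a program rather than typed into a browser, the phrase has to be percent-encoded as UTF-8 (the browser does this for you). A minimal Java sketch of building that request URL, assuming the example.com location used above; `PhraseUrl` and `requestUrl` are hypothetical names, and the sketch only builds the URL rather than fetching it.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper; the endpoint is the example URL from the answer above.
final class PhraseUrl {

    static final String ENDPOINT = "http://www.example.com/foo.php";

    // Percent-encodes the phrase as UTF-8 so non-ASCII characters survive.
    static String requestUrl(String phrase) throws UnsupportedEncodingException {
        return ENDPOINT + "?phrase=" + URLEncoder.encode(phrase, "UTF-8");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // prints http://www.example.com/foo.php?phrase=%E4%BD%A0%E5%A5%BD
        System.out.println(requestUrl("\u4F60\u597D")); // "你好"
    }
}
```

PHP decodes the query string before filling `$_GET`, so the script receives the original 你好.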

Lucas answered Nov 19 '22 13:11