
php curl japanese output garbled

Consider the following URL: click here

The page contains Japanese text in a non-UTF-8 encoding. Firefox on my PC detects the encoding automatically and shows the characters correctly. In Chrome, on the other hand, I have to change the encoding manually to "Shift_JIS" to see the Japanese characters.

If I fetch the content with PHP cURL, the text comes out garbled, like this:

���ϕi�̂��ƂȂ��I�݂��Ȃ̃N�`�R�~�T�C�g�������������i�A�b�g�R�X���j�ɂ��܂����I

I tried:

  curl_setopt($ch, CURLOPT_ENCODING, 'Shift_JIS');

I also tried (after downloading the curl response):

  $output_str = mb_convert_encoding($curl_response, 'Shift_JIS', 'auto');
  $output_str = mb_convert_encoding($curl_response, 'SJIS', 'auto');

But that does not work either.

Here is the full code:

    // $url and $useragent are assumed to be set earlier in the script
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
    curl_setopt($ch, CURLOPT_HTTPHEADER, array(
        'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language: en-US,en;q=0.5',
        'Connection: keep-alive'
    ));

    //curl_setopt($ch, CURLOPT_ENCODING, 'SJIS');
    curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
    curl_setopt($ch, CURLOPT_TIMEOUT, 20);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    $response = curl_exec($ch);
hvs asked Mar 15 '16 18:03


1 Answer

That page doesn't return valid HTML; it's actually JavaScript. If you fetch it with cURL and echo it from your script, add header('Content-type: text/html; charset=shift_jis'); before the output, and the characters will display properly when you load the result in Chrome.

Since the response doesn't declare its character set, you can declare it from your own server using header().
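
A minimal sketch of that pass-through, assuming the body has already been fetched with cURL. The function names content_type_for() and emit_with_charset() are made up for illustration; only header(), headers_sent() and echo are real PHP. The bytes are sent unchanged, and the browser is simply told how to read them.

```php
<?php
// Hypothetical helper: build the Content-Type header line for a charset.
function content_type_for(string $charset): string
{
    return 'Content-Type: text/html; charset=' . $charset;
}

// Hypothetical helper: echo a fetched body unchanged while declaring
// its charset, so the browser decodes the bytes correctly.
function emit_with_charset(string $body, string $charset): void
{
    // header() must run before any output has been sent.
    if (!headers_sent()) {
        header(content_type_for($charset));
    }
    echo $body;
}

// Usage, where $response is the string returned by curl_exec():
// emit_with_charset($response, 'Shift_JIS');
```

This only helps when the output goes to a browser. It does not change the bytes, so a UTF-8 terminal will still show garbled text.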

To actually convert the encoding so it will display properly in your terminal, you can try the following:

Use iconv() to convert to UTF-8:

  $curl_response = iconv('shift-jis', 'utf-8', $curl_response);

Use mb_convert_encoding() to convert to UTF-8:

  $curl_response = mb_convert_encoding($curl_response, 'utf-8', 'shift-jis');

Both of those methods worked for me and I was able to see Japanese characters displayed correctly on my terminal.
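
The argument order matters here: mb_convert_encoding() takes the target encoding second and the source encoding third, so passing 'Shift_JIS' as the second argument (as in the question) converts to Shift_JIS rather than from it. The sketch below illustrates the direction with a round trip instead of a live request. It assumes the mbstring extension is loaded and that the source file is saved as UTF-8; convert_body() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: convert a raw response body from a known source
// encoding to a target encoding (UTF-8 by default).
function convert_body(string $raw, string $from = 'SJIS', string $to = 'UTF-8'): string
{
    // Argument order is (string, to_encoding, from_encoding).
    return mb_convert_encoding($raw, $to, $from);
}

// Stand-in for a cURL response: Shift_JIS bytes produced from a UTF-8
// literal, so the example needs no network access.
$utf8_text  = '日本語';
$sjis_bytes = mb_convert_encoding($utf8_text, 'SJIS', 'UTF-8');

// Convert the Shift_JIS bytes back to UTF-8, as you would with $curl_response.
$converted = convert_body($sjis_bytes);

echo $converted, PHP_EOL;
```

If your terminal uses an encoding other than UTF-8, the third parameter of the hypothetical helper is where you would pass it.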

UTF-8 should be fine, but if you know your system is using something different, you can try that instead.

Hope that helps.

drew010 answered Oct 24 '22 06:10