How to convert any character encoding to UTF-8 in PHP

Question

I'm working on a web crawler that grabs data from sites all over the world, so it has to deal with many different languages and encodings.

Currently I'm using the following function, and it works in 99% of cases. But the remaining 1% is giving me headaches.

function convertEncoding($str) {
    return iconv(mb_detect_encoding($str), "UTF-8", $str);
}

sagi · Accepted Answer

Rather than blindly trying to detect the encoding, you should first check whether the page you downloaded declares a character set. The character set may be declared in the HTTP response header, for example:

Content-Type: text/html; charset=utf-8

Or in the HTML as a meta tag, for example:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

Only if neither is available should you try to guess the encoding with mb_detect_encoding() or other methods.
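
The order described above (HTTP header first, then the meta tag, then guessing) could be sketched roughly as follows. This is only an illustration, not a complete solution: the function names charsetFromContentType(), charsetFromMeta() and toUtf8() are made up for this example, the regular expressions are deliberately simple, the 4096-byte scan window is arbitrary, and the list of fallback encodings passed to mb_detect_encoding() is an assumption you would adjust for the sites you crawl. It requires the mbstring extension.

```php
<?php
// Hypothetical helper: pull the charset out of a Content-Type header value,
// e.g. "text/html; charset=utf-8". Returns null if none is declared.
function charsetFromContentType(?string $contentType): ?string {
    if ($contentType !== null
        && preg_match('/charset\s*=\s*["\']?([\w.:-]+)/i', $contentType, $m)) {
        return $m[1];
    }
    return null;
}

// Hypothetical helper: look for a charset declared in a <meta> tag.
// Matches both <meta charset="..."> and the http-equiv form.
function charsetFromMeta(string $html): ?string {
    $head = substr($html, 0, 4096); // assumption: the declaration is near the top
    if (preg_match('/<meta[^>]+charset\s*=\s*["\']?([\w.:-]+)/i', $head, $m)) {
        return $m[1];
    }
    return null;
}

// Convert a downloaded body to UTF-8, trusting declared charsets before guessing.
function toUtf8(string $body, ?string $contentType = null): string {
    // Canonical mbstring encoding names only; aliases such as "latin1"
    // are not recognized here and fall through to detection.
    $supported = array_map('strtolower', mb_list_encodings());

    $declared = [charsetFromContentType($contentType), charsetFromMeta($body)];
    foreach ($declared as $charset) {
        if ($charset === null || !in_array(strtolower($charset), $supported, true)) {
            continue;
        }
        // Skip a declaration that the bytes contradict (e.g. invalid UTF-8).
        if (mb_check_encoding($body, $charset)) {
            return (string) mb_convert_encoding($body, 'UTF-8', $charset);
        }
    }

    // Last resort: guess. The candidate list is an assumption, not a universal default.
    $guess = mb_detect_encoding($body, ['UTF-8', 'Windows-1252', 'ISO-8859-1'], true);
    return (string) mb_convert_encoding($body, 'UTF-8', $guess ?: 'ISO-8859-1');
}
```

In a crawler you would pass the value of the response's Content-Type header as the second argument, for example toUtf8($body, $headerValue), where $headerValue comes from whatever HTTP client you use. Pages that declare the wrong charset will still be converted incorrectly; the mb_check_encoding() call only catches declarations that the bytes clearly contradict.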

Tags: php, encoding, utf-8

rafaschutz
