
CodeIgniter and charsets

I haven't been using CodeIgniter for long, and I'm running into some charset problems. I've asked around on the CI forum, but I'd like to dig deeper, since there is still no global solution: http://codeigniter.com/forums/viewthread/204409/

The problem was MySQL error 1064. I found a workaround: iconv. It works fine, but I don't think it should be necessary. I've read a lot about charsets online, but now that I'm using CI: how should charsets be handled in CI?

So I have a lot of questions about it, and I hope someone can clear them up for me:

What's the best way to set the charset globally, and what should it be set to?

  • In the head

    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

  • In config/config.php

    $config['charset'] = 'UTF-8';

  • In config/database.php

    $db['default']['char_set'] = 'utf8';

    $db['default']['dbcollat'] = 'utf8_general_ci';

  • In .htaccess, alongside my rewrite rules:

    php_value magic_quotes_gpc Off

    AddDefaultCharset UTF-8

  • Do I also need to send a header? If so, where should it go? Something like:

    header('Content-Type: text/html; charset=UTF-8');

  • Should my editor (Notepad++) save files as UTF-8, or as UTF-8 without BOM? Or is ANSI fine (that's what I'm using now)?

  • Should I use utf8_unicode_ci or utf8_general_ci for the MySQL database, and why?

  • What about reading RSS feeds: how do I handle multiple charsets? In the project I'm working on I have two feeds, one encoded as UTF-8 and the other as ISO-8859-1. The items are stored in the database and compared later to check for new items. It fails on special characters.

I'm working with:

  • CI 2.0.3
  • PHP 5.2.17
  • MySQL 5.1.58

More information added:

Model:

function update_favorite($data) 
{
 $this->db->where('id', $data['id']);
 $this->db->where('user_id', $data['user_id']);
 $this->db->update('favorites', $data);
 return;
}

Controller:

$this->favorites_model->update_favorite(array(
 'id' => $id, 
 'rss_last' => $rss_last,
 'user_id' => $this->session->userdata('user_id')
)); 

When $rss_last is a "normal" value such as test, it works fine. When it's a longer value, such as this Dutch headline ("F-Secure finds malware with a certificate from the Malaysian government"): F-Secure vindt malware met certificaat van Maleisische overheid

I get this error:

Error Number: 1064

You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'vindt malware met certificaat van Maleisische overheid, user_id = '1' WHERE `i' at line 1

UPDATE favorites SET id = '15', rss_last = F-Secure vindt malware met certificaat van Maleisische overheid, user_id = '1' WHERE id = '15' AND user_id = '1'

Filename: /home/.../domains/....nl/public_html/new/models/favorites_model.php

Line Number: 35

Someone at the CI forum told me to use this:

'rss_last' => iconv("UTF-8", "UTF-8//TRANSLIT", $rss_last) 

This works fine, but I don't think it should be necessary.

The value of $rss_last comes from an RSS feed which, as mentioned before, is sometimes UTF-8 and sometimes ISO-8859-1 encoded:

$rss = file_get_contents('http://www.website.com/rss.xml');
$feed = new SimpleXmlElement($rss);
$rss_last = $feed->channel->item[0]->title;

It looks like this last part is the problem: when I assign the value to $rss_last directly, it works fine:

$rss_last = 'F-Secure vindt malware met certificaat van Maleisische overheid';

When the value comes from the RSS feed, it causes problems.
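One likely explanation, assuming CI 2.x behaves as its CI_DB_driver::escape() method suggests: $feed->channel->item[0]->title is a SimpleXMLElement object, not a PHP string. As far as I can tell, escape() only wraps values that pass is_string() in quotes (booleans and NULL get special handling), and anything else is put into the query as-is. That matches the unquoted rss_last = F-Secure vindt ... in the logged UPDATE, and it would also explain why the iconv call "fixes" it, since iconv returns a real string. A minimal sketch, using an inline feed instead of the real URL:

```php
<?php
// Stand-in for file_get_contents('http://www.website.com/rss.xml'):
// an inline UTF-8 feed, so the example runs without network access.
$rss = '<?xml version="1.0" encoding="UTF-8"?>'
     . '<rss version="2.0"><channel><item>'
     . '<title>F-Secure vindt malware met certificaat van Maleisische overheid</title>'
     . '</item></channel></rss>';

$feed = new SimpleXMLElement($rss);

// Without a cast, this is a SimpleXMLElement object, not a string.
$title_node = $feed->channel->item[0]->title;

// Casting gives a plain PHP string that the query builder can quote.
$rss_last = (string) $feed->channel->item[0]->title;

var_dump(is_string($title_node)); // bool(false)
var_dump(is_string($rss_last));   // bool(true)
```

If this is the cause, passing the cast value as 'rss_last' => $rss_last in the controller should produce a quoted value without any iconv call.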

Some more questions:

I just found this: Detect encoding and make everything UTF-8

Is that the best solution? Or isn't iconv simpler, doing something like this:

$encoding = some_function_to_get_encoding_from_feed($feed);
$rss_last = iconv($encoding, "UTF-8//TRANSLIT", $feed->channel->item[0]->title);

But what should I use for some_function_to_get_encoding_from_feed? mb_detect_encoding?

And should I use mb_convert_encoding or iconv?
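A sketch of those last two questions, under stated assumptions: the encoding is read from the feed's XML declaration (feed_encoding is a hypothetical helper, not a PHP or CI function), and the conversion is applied to raw bytes such as text taken from the file_get_contents result, not to values read through SimpleXML, which already come back as UTF-8. mb_detect_encoding is a weak fit here because every byte sequence is valid ISO-8859-1, so it can only guess from the order of candidate encodings you give it. For ISO-8859-1 to UTF-8, iconv and mb_convert_encoding give the same result, and //TRANSLIT only matters when the target charset cannot represent a character, which never happens when the target is UTF-8.

```php
<?php
// Hypothetical helper: read the charset declared in the <?xml ...> prolog.
// Without a declaration, XML defaults to UTF-8.
function feed_encoding($xml)
{
    if (preg_match('/^<\?xml[^>]*\bencoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']/', $xml, $m)) {
        return strtoupper($m[1]);
    }
    return 'UTF-8';
}

// Convert raw bytes from $from to UTF-8 with iconv. No //TRANSLIT:
// UTF-8 can represent every character, so there is nothing to transliterate.
function to_utf8_iconv($bytes, $from)
{
    return iconv($from, 'UTF-8', $bytes);
}

// The same conversion with the mbstring extension.
function to_utf8_mb($bytes, $from)
{
    return mb_convert_encoding($bytes, 'UTF-8', $from);
}

// Example: an ISO-8859-1 document where "é" is the single byte E9.
$raw  = '<?xml version="1.0" encoding="ISO-8859-1"?>' . "<title>caf\xE9</title>";
$from = feed_encoding($raw);
$a = to_utf8_iconv("caf\xE9", $from);
$b = to_utf8_mb("caf\xE9", $from);
var_dump($a === $b); // bool(true)
```

Converting the whole document this way would leave its declaration claiming ISO-8859-1 over UTF-8 bytes, which is why the sketch converts extracted text only.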

asked Nov 16 '11 18:11 by Roy


1 Answer

1) There is no global solution.

2)

AddDefaultCharset UTF-8

This is needed so that Apache tells the client the correct encoding. Set it.

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

Not strictly necessary, but recommended by the W3C.

$config['charset'] = 'UTF-8';

Desirable.

$db['default']['char_set'] = 'utf8';
$db['default']['dbcollat'] = 'utf8_general_ci';

This sets the encoding of CI's connection to the database. If your database is encoded as UTF-8, it is mandatory.

header('Content-Type: text/html; charset=UTF-8');

Don't do this unless necessary: the charset is already declared in the HTML and in .htaccess.

Use utf8_unicode_ci or utf8_general_ci for the MySQL database? And why?

For my own language (Russian), I use utf8_general_ci.

In my editor (Notepad++) save files as UTF-8?

Absolutely! Every file that Apache serves as UTF-8 should itself be saved as UTF-8.
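On the "with or without BOM" part of the question: a UTF-8 BOM (the bytes EF BB BF) in front of <?php is sent to the browser as output, so a later header() call, including CodeIgniter's own headers and session cookies, can fail with "headers already sent". Saving as UTF-8 without BOM avoids that. A small sketch for spotting files that were saved with a BOM; the glob pattern in the usage comment is an assumed project layout, not something from the question:

```php
<?php
// The UTF-8 byte order mark.
define('UTF8_BOM', "\xEF\xBB\xBF");

// True if $contents starts with a UTF-8 BOM.
function has_utf8_bom($contents)
{
    return strncmp($contents, UTF8_BOM, 3) === 0;
}

// Return $contents without a leading UTF-8 BOM, if it has one.
function strip_utf8_bom($contents)
{
    return has_utf8_bom($contents) ? substr($contents, 3) : $contents;
}

// List the regular files matching a glob pattern that start with a BOM.
function files_with_bom($pattern)
{
    $files = glob($pattern);
    if ($files === false) {
        return array();
    }
    $hits = array();
    foreach ($files as $file) {
        // Only the first three bytes are needed for the check.
        if (is_file($file) && has_utf8_bom(file_get_contents($file, false, null, 0, 3))) {
            $hits[] = $file;
        }
    }
    return $hits;
}

// Usage (assumed layout): print_r(files_with_bom('application/*/*.php'));
```

In Notepad++, the files it reports can then be re-saved with the "UTF-8 without BOM" encoding.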

How about reading RSS feeds, how to handle multiple charsets?

If you store each RSS feed in its own table, you can specify a charset for each table and set the right encoding with each SQL query. And yes: Cyrillic characters, for example, will fail with a non-UTF-8 encoding.
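For RSS specifically, a possibly simpler route than per-table charsets, assuming each feed declares its encoding correctly in its XML prolog: libxml decodes the document using that declaration and SimpleXML returns UTF-8 strings, so both feeds could go into a single UTF-8 table. A small check with an inline ISO-8859-1 feed:

```php
<?php
// Inline ISO-8859-1 feed: in Latin-1, "é" is the single byte E9.
$latin1_feed = '<?xml version="1.0" encoding="ISO-8859-1"?>'
             . "<rss version=\"2.0\"><channel><item><title>caf\xE9</title></item></channel></rss>";

$feed  = new SimpleXMLElement($latin1_feed);
$title = (string) $feed->channel->item[0]->title;

// libxml decoded the document as ISO-8859-1 and SimpleXML returned UTF-8,
// so "é" is now the two bytes C3 A9.
echo bin2hex($title), "\n"; // 636166c3a9
```

This relies on the declaration being accurate; a feed that mislabels its encoding would still need its raw bytes converted explicitly.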

answered Sep 21 '22 14:09 by Nikolay Baluk