I'm doing some data cleansing on messy data that is being imported into MySQL. The data contains 'pseudo' Unicode characters, which are actually embedded in the strings as 'u00e9' etc.

So one field might be 'Jalostotitlu00e1n'. I need to rip out that clumsy 'u00e1' and replace it with the corresponding UTF-8 character.

I could do this in MySQL, using SUBSTRING and CHAR maybe, but I'm preprocessing the data via PHP, so I could do it there as well.

I already know all about how to configure MySQL and PHP to work with UTF-8 data. The problem is really just in the source data I'm importing.

Thanks


How to convert 'u00e9' into a utf8 char, in mysql or php?

3 Answers

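/* PHP function that replaces literal \uXXXX escape sequences for Latin-1 accented letters with the corresponding UTF-8 characters */

// Written as a plain function so the snippet runs on its own; put
// "public static" back in front if it lives inside a class.
// Save this file as UTF-8 so the replacement characters are UTF-8 too.
function Utf8_ansi($valor = '') {
    // Each key is the literal text backslash + "u" + four hex digits.
    // For data that has a bare "u00e1" with no backslash, drop the
    // backslash from the keys. The table does not cover \u00d0, \u00d7,
    // \u00de, \u00f7 or \u00fe.
    $utf8_ansi2 = array(
        '\u00c0' => 'À', '\u00c1' => 'Á', '\u00c2' => 'Â', '\u00c3' => 'Ã',
        '\u00c4' => 'Ä', '\u00c5' => 'Å', '\u00c6' => 'Æ', '\u00c7' => 'Ç',
        '\u00c8' => 'È', '\u00c9' => 'É', '\u00ca' => 'Ê', '\u00cb' => 'Ë',
        '\u00cc' => 'Ì', '\u00cd' => 'Í', '\u00ce' => 'Î', '\u00cf' => 'Ï',
        '\u00d1' => 'Ñ', '\u00d2' => 'Ò', '\u00d3' => 'Ó', '\u00d4' => 'Ô',
        '\u00d5' => 'Õ', '\u00d6' => 'Ö', '\u00d8' => 'Ø', '\u00d9' => 'Ù',
        '\u00da' => 'Ú', '\u00db' => 'Û', '\u00dc' => 'Ü', '\u00dd' => 'Ý',
        '\u00df' => 'ß', '\u00e0' => 'à', '\u00e1' => 'á', '\u00e2' => 'â',
        '\u00e3' => 'ã', '\u00e4' => 'ä', '\u00e5' => 'å', '\u00e6' => 'æ',
        '\u00e7' => 'ç', '\u00e8' => 'è', '\u00e9' => 'é', '\u00ea' => 'ê',
        '\u00eb' => 'ë', '\u00ec' => 'ì', '\u00ed' => 'í', '\u00ee' => 'î',
        '\u00ef' => 'ï', '\u00f0' => 'ð', '\u00f1' => 'ñ', '\u00f2' => 'ò',
        '\u00f3' => 'ó', '\u00f4' => 'ô', '\u00f5' => 'õ', '\u00f6' => 'ö',
        '\u00f8' => 'ø', '\u00f9' => 'ù', '\u00fa' => 'ú', '\u00fb' => 'û',
        '\u00fc' => 'ü', '\u00fd' => 'ý', '\u00ff' => 'ÿ'
    );

    // strtr() with an array replaces every key it finds in $valor in one pass.
    return strtr($valor, $utf8_ansi2);
}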
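The hand-typed table can also be generated. The following is only a sketch of that idea, not the answerer's code: latin1_escape_table() is a hypothetical helper name, and the choice to register both lowercase and uppercase hex spellings is an assumption about what the source data might contain. Passing 'u' as the prefix targets the bare form from the question, and '\u' targets the backslash form.

```php
<?php
// Hypothetical helper (not from the answer): builds the lookup table for
// U+00C0..U+00FF in a loop instead of typing every entry by hand.
function latin1_escape_table(string $prefix): array
{
    $table = [];
    for ($cp = 0xC0; $cp <= 0xFF; $cp++) {
        // Decode a decimal numeric character reference to get the UTF-8 character.
        $char = html_entity_decode('&#' . $cp . ';', ENT_QUOTES | ENT_HTML5, 'UTF-8');
        // Register the all-lowercase and all-uppercase hex spellings
        // (mixed-case spellings such as 00Ce are not covered).
        $table[$prefix . sprintf('%04x', $cp)] = $char;
        $table[$prefix . sprintf('%04X', $cp)] = $char;
    }
    return $table;
}

// Backslash form, as matched by the table in the answer.
echo strtr('Jalostotitl\u00e1n', latin1_escape_table('\u')), "\n";
// Bare form, as described in the question.
echo strtr('Jalostotitlu00e1n', latin1_escape_table('u')), "\n";
```

Both calls print Jalostotitlán. With the bare 'u' prefix, any ordinary text that happens to contain "u00" followed by two hex digits in the C0 to FF range would be replaced as well.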


answered Sep 20 '22 12:09

Sergio-MA-Brazil

There's a way: replace every uXXXX with its HTML numeric character reference (&#xXXXX;) and then call html_entity_decode() on the result.

For example: echo html_entity_decode("Jalostotitl&#x00e1;n");

Every character written in the form u1234 can be expressed in HTML as &#x1234;. Doing the replacement reliably is hard, though: with no other character marking the start of a sequence, a plain "u" followed by four hex digits can also occur inside ordinary text, so there can be many false positives. A simple regex could be

preg_replace('/u([\da-fA-F]{4})/', '&#x\1;', $str)
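Putting the two steps together might look like the sketch below. It is an illustration under assumptions, not the answerer's code: decode_bare_u_sequences() is a hypothetical name, and the default pattern is deliberately narrowed to u00XX on the assumption that the data only contains characters up to U+00FF. Decoding inside a callback means only the matched sequences go through html_entity_decode(), so entities that were already in the string stay as they are.

```php
<?php
// Sketch only: the function name and the narrowed default pattern are
// assumptions made for this example.
function decode_bare_u_sequences(string $str, string $pattern = '/u(00[\da-fA-F]{2})/'): string
{
    $decoded = preg_replace_callback(
        $pattern,
        static function (array $m): string {
            // Build the numeric character reference for this one match and
            // decode it straight to a UTF-8 character.
            return html_entity_decode('&#x' . $m[1] . ';', ENT_QUOTES | ENT_HTML5, 'UTF-8');
        },
        $str
    );

    // preg_replace_callback() returns null on a PCRE error; fall back to the input.
    return $decoded ?? $str;
}

echo decode_bare_u_sequences('Jalostotitlu00e1n'), "\n"; // prints Jalostotitlán
```

The false-positive risk is easy to reproduce: the broader pattern '/u([\da-fA-F]{4})/' also matches the "ubdea" inside a word like "subdeadline". If the data does contain code points above U+00FF, the broader pattern can be passed as the second argument, at the cost of more accidental matches.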

answered Sep 16 '22 12:09

rabudde

My Twitter timeline script returns special characters such as é as \u00e9, so I stripped the backslash and used @rabudde's preg_replace.

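// Convert \uXXXX escape sequences to HTML numeric character references
$text = 'De #Haarstichting is h\u00e9t medium voor alles';
// Strip the backslash so the sequence has the bare uXXXX form
$str = str_replace('\u', 'u', $text);
// Turn each uXXXX into &#xXXXX;
$str_replaced = preg_replace('/u([\da-fA-F]{4})/', '&#x\1;', $str);

// Outputs "h&#x00e9;t", which a browser renders as "hét"
echo $str_replaced;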

It works for me. When the output is rendered as HTML, it turns "De #Haarstichting is h\u00e9t medium voor alles" into "De #Haarstichting is hét medium voor alles".
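When the backslash is still present, it can serve as the marker for the start of a sequence, which avoids most false positives. The sketch below is an alternative to the snippet above, not the answerer's method: decode_backslash_u() is a hypothetical name, and it assumes the mbstring extension is loaded. It matches runs of consecutive \uXXXX escapes and converts them from UTF-16BE to UTF-8 in one go, so surrogate pairs such as those used for emoji come out as a single character.

```php
<?php
// Sketch only: hypothetical helper; assumes the mbstring extension is available.
function decode_backslash_u(string $text): string
{
    $decoded = preg_replace_callback(
        // One or more consecutive escapes: a literal backslash, "u", four hex digits.
        '/(?:\\\\u[0-9a-fA-F]{4})+/',
        static function (array $m): string {
            // "\u00e9\u00e8" -> "00e900e8" -> raw UTF-16BE bytes -> UTF-8
            $hex = str_replace('\u', '', $m[0]);
            return mb_convert_encoding(pack('H*', $hex), 'UTF-8', 'UTF-16BE');
        },
        $text
    );

    // preg_replace_callback() returns null on a PCRE error; fall back to the input.
    return $decoded ?? $text;
}

// Prints: De #Haarstichting is hét medium voor alles
echo decode_backslash_u('De #Haarstichting is h\u00e9t medium voor alles'), "\n";
```

The result is a real UTF-8 string rather than HTML entities, so it can be stored in the database directly. If the text arrives as part of a JSON document, decoding the whole document with json_decode() resolves these escapes as well.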

answered Sep 18 '22 12:09

Theo

carpii
