Read an ANSI file and convert it to a UTF-8 string

Tags: file, php, ansi

Is there any way to read an ANSI-encoded file and convert its contents to a UTF-8 string with PHP?

The data to be inserted looks fine when I print it out, but when I insert it into the database the field ends up empty.

Asked Jan 04 '11 at 15:01 by user192344



2 Answers

$tmp = iconv('YOUR CURRENT CHARSET', 'UTF-8', $string);

or

$tmp = utf8_encode($string);
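Putting the first suggestion together for the original question, here is a minimal sketch that reads a file and converts it with iconv(). It assumes the file is in Windows-1252 (`CP1252`), a common "ANSI" code page on Western-European Windows; `read_ansi_file_as_utf8` is a hypothetical helper name. It also checks iconv()'s return value: iconv() returns false when it meets input that is illegal in the source charset, and a false passed on to the database could end up as an empty field. Note that utf8_encode() is deprecated as of PHP 8.2.

```php
<?php
// Hypothetical helper: read a file stored in a legacy "ANSI" code page
// (assumed to be Windows-1252 by default) and return it as a UTF-8 string.
function read_ansi_file_as_utf8(string $path, string $fromCharset = 'CP1252'): string
{
    $raw = file_get_contents($path);
    if ($raw === false) {
        throw new RuntimeException("Cannot read $path");
    }

    // iconv() returns false (and raises a notice) on input that is illegal
    // in $fromCharset; fail loudly instead of passing false on to the DB.
    $utf8 = @iconv($fromCharset, 'UTF-8', $raw);
    if ($utf8 === false) {
        throw new RuntimeException("Input is not valid $fromCharset");
    }
    return $utf8;
}

// Usage (assumes data.txt exists and is Windows-1252 encoded):
// $text = read_ansi_file_as_utf8('data.txt');
```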

The strange thing is that you end up with an empty string in your DB. I could understand ending up with some garbage in it, but nothing at all (an empty string) is odd.

I just typed this in my console:

iconv -l | grep -i ansi

It showed me:

ANSI_X3.4-1968
ANSI_X3.4-1986
ANSI_X3.4
ANSI_X3.110-1983
ANSI_X3.110
MS-ANSI

These are possible values for YOUR CURRENT CHARSET. (The ANSI_X3.4 names are plain ASCII.) As pointed out before, when your input string already contains only valid UTF-8, you don't need to convert anything.
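Following the point that valid UTF-8 needs no conversion, a sketch that only converts when the input is not already well-formed UTF-8. It relies on the `preg_match('//u', ...)` idiom, which returns false for subjects that are not valid UTF-8, and otherwise falls back to an assumed source charset, `CP1252` (Windows-1252); the `MS-ANSI` name in the list above is, as far as I know, glibc's alias for that same code page. `ensure_utf8` is a hypothetical name. This is only a heuristic: some Windows-1252 byte sequences happen to be valid UTF-8 as well, so it cannot prove how the data was written.

```php
<?php
// Hypothetical helper: return $s unchanged if it is already valid UTF-8,
// otherwise convert it from $fromCharset (assumed to be Windows-1252).
function ensure_utf8(string $s, string $fromCharset = 'CP1252'): string
{
    // With the /u modifier, preg_match() returns false when the subject is
    // not well-formed UTF-8; the empty pattern matches (returns 1) otherwise.
    if (preg_match('//u', $s) === 1) {
        return $s; // already UTF-8 (plain ASCII always passes)
    }

    $converted = @iconv($fromCharset, 'UTF-8', $s);
    if ($converted === false) {
        throw new RuntimeException("Input is neither UTF-8 nor valid $fromCharset");
    }
    return $converted;
}
```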

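By default iconv() fails (returns false) when a character cannot be represented in the target charset. Change 'UTF-8' to 'UTF-8//TRANSLIT' if you want such characters replaced with a look-alike instead, or append //IGNORE to drop them. Since UTF-8 can represent every Unicode character, this matters mostly when converting to a narrower charset.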

Answered Sep 27 '22 by Mark Bekkers


"ANSI" is not really a charset. It's a short way of saying "whatever charset is the default in the computer that creates the data". So you have a double task:

  1. Find out which charset the data is using.
  2. Use an appropriate function to convert into UTF-8.

For #2, I'm normally happy with iconv(), but utf8_encode() can also do the job if the source data happens to use ISO-8859-1.
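Why utf8_encode() only fits ISO-8859-1: in ISO-8859-1 every byte value 0x00 to 0xFF equals the Unicode code point with the same number, so the conversion is a fixed bit manipulation. The sketch below (a hypothetical `latin1_to_utf8`, for illustration only; real code should use iconv() or mb_convert_encoding()) performs that mapping by hand. Bytes 0x80 to 0x9F, which Windows-1252 uses for characters such as the euro sign and curly quotes, come out as C1 control characters here, which is why Windows-1252 data needs iconv('CP1252', 'UTF-8', ...) instead.

```php
<?php
// Illustration only: convert an ISO-8859-1 byte string to UTF-8 by hand.
// Each Latin-1 byte value equals its Unicode code point (U+0000..U+00FF).
function latin1_to_utf8(string $latin1): string
{
    $out = '';
    $len = strlen($latin1);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($latin1[$i]);
        if ($b < 0x80) {
            // U+0000..U+007F: a single byte, identical to ASCII
            $out .= chr($b);
        } else {
            // U+0080..U+00FF: two bytes, 110xxxxx 10xxxxxx
            $out .= chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
        }
    }
    return $out;
}
```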

Update

It looks like you don't know which charset your data is using. In some cases you can work it out from the user's country and language (e.g., Spain/Spanish) by checking the default encoding Microsoft Windows uses in that territory.
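A rough sketch of that lookup: a hypothetical, deliberately partial table mapping a language code to the "ANSI" code page Windows typically uses for it, followed by the iconv() call. The names `WINDOWS_ANSI_CODEPAGES`, `guess_ansi_charset` and `ansi_to_utf8_for_language` are made up for illustration, and the result is only a guess: a file can still have been saved in some other encoding.

```php
<?php
// Hypothetical, partial table: language code => default Windows "ANSI" code page.
const WINDOWS_ANSI_CODEPAGES = [
    'en' => 'CP1252', // Western European
    'es' => 'CP1252',
    'fr' => 'CP1252',
    'de' => 'CP1252',
    'pl' => 'CP1250', // Central European
    'cs' => 'CP1250',
    'ru' => 'CP1251', // Cyrillic
    'el' => 'CP1253', // Greek
    'tr' => 'CP1254', // Turkish
    'he' => 'CP1255', // Hebrew
    'ar' => 'CP1256', // Arabic
];

// Guess the source charset from the user's language, defaulting to CP1252.
function guess_ansi_charset(string $lang): string
{
    return WINDOWS_ANSI_CODEPAGES[strtolower($lang)] ?? 'CP1252';
}

// Convert data written on that user's Windows machine to UTF-8.
function ansi_to_utf8_for_language(string $data, string $lang): string
{
    $utf8 = @iconv(guess_ansi_charset($lang), 'UTF-8', $data);
    if ($utf8 === false) {
        throw new RuntimeException('Data does not match the guessed charset');
    }
    return $utf8;
}
```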

Answered Sep 23 '22 by Álvaro González