
PHP: Converting UTF-8 string to Ansi?

Tags: php, ansi

I build a csv string from values I have in my DB. The final string is stored in my $csv variable.

Now I offer this string for download, like this:

header("Content-type: text/csv");
header("Content-Disposition: attachment; filename=whatever.csv");
header("Pragma: no-cache");
header("Expires: 0");

echo $csv;

When I open the downloaded file in Notepad++, for example, the encoding is shown as "ANSI as UTF-8". How can I change that to plain ANSI?

I tried:

$csv = iconv("ISO-8859-1", "WINDOWS-1252", $csv);

That did not change anything.

Thanks!

Solution: $csv = iconv("UTF-8", "WINDOWS-1252", $csv);

user1856596 asked Feb 18 '13 10:02



3 Answers

Try:

$csv = iconv("UTF-8", "Windows-1252", $csv);

Be aware that this can lose data: Windows-1252 ("ANSI") can represent only a small subset of the characters that UTF-8 can encode. Unless you have a very strong reason not to, serve your files UTF-8 encoded.
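As a rough illustration of the data-loss caveat, the following sketch converts a string only when every character survives the trip. The helper name toWindows1252OrNull and the sample CSV are hypothetical and not part of the question or this answer; the round-trip comparison is one possible way to detect loss, not the only one.

```php
<?php
/**
 * Hypothetical helper: convert a UTF-8 string to Windows-1252 and
 * return null when the conversion would lose information.
 */
function toWindows1252OrNull(string $utf8): ?string
{
    // Without //TRANSLIT or //IGNORE, iconv may return false (and raise a
    // notice) for characters Windows-1252 cannot represent; @ hides the notice.
    $converted = @iconv('UTF-8', 'Windows-1252', $utf8);
    if ($converted === false) {
        return null;
    }

    // Some iconv implementations substitute a placeholder instead of failing,
    // so confirm that the result converts back to exactly the original text.
    $roundTrip = @iconv('Windows-1252', 'UTF-8', $converted);

    return $roundTrip === $utf8 ? $converted : null;
}

// Sample data (an assumption for this sketch): "Café" and a euro sign.
$csv  = "name;price\nCaf\u{00E9};5 \u{20AC}\n";
$ansi = toWindows1252OrNull($csv);
```

If the helper returns null, one option is to fall back to serving the file as UTF-8. Whichever encoding is sent, it can be declared in the header, for example "Content-Type: text/csv; charset=Windows-1252".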

Fabian Schmengler answered Oct 19 '22 03:10


Since the question mixes up ISO-8859-1, Windows-1252 and ANSI, one important thing to note here is:

The so-called Windows character set (WinLatin1, or Windows code page 1252, to be exact) uses some of those positions for printable characters. Thus, the Windows character set is NOT identical with ISO 8859-1. The Windows character set is often called "ANSI character set", but this is SERIOUSLY MISLEADING. It has NOT been approved by ANSI.

Historical background: Microsoft based the design of the set on a draft for an ANSI standard. A glossary by Microsoft explicitly admits this.

Some more resources: here and here.

So just FYI for other people that end up in this question.

Here's MS's exact explanation on this:

The term “ANSI” as used to signify Windows code pages is a historical reference, but is nowadays a misnomer that continues to persist in the Windows community. The source of this comes from the fact that the Windows code page 1252 was originally based on an ANSI draft—which became International Organization for Standardization (ISO) Standard 8859-1. “ANSI applications” are usually a reference to non-Unicode or code page–based applications.
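A small sketch of the difference described above, assuming PHP 7 or later with the iconv extension. The variable names are made up for illustration. It decodes the same byte under both encodings and then replays the conversion attempted in the question.

```php
<?php
// Byte 0x80 is one of the positions where the two encodings differ.
$byte = "\x80";

// Windows-1252 assigns 0x80 to the euro sign (U+20AC).
$fromCp1252 = iconv('Windows-1252', 'UTF-8', $byte);

// ISO-8859-1 maps 0x80 to the C1 control character U+0080.
$fromLatin1 = iconv('ISO-8859-1', 'UTF-8', $byte);

echo bin2hex($fromCp1252), "\n"; // e282ac
echo bin2hex($fromLatin1), "\n"; // c280

// The attempt from the question: UTF-8 bytes declared as ISO-8859-1.
$utf8 = "\u{00E9}";              // "é", stored as the two bytes C3 A9
$unchanged = iconv('ISO-8859-1', 'Windows-1252', $utf8);

echo bin2hex($unchanged), "\n";  // c3a9
```

The last conversion returns the input bytes untouched because C3 and A9 mean the same characters in both single-byte encodings, which would explain why the attempt in the question "did not change anything".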

Borislav Sabev answered Oct 19 '22 03:10


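To reduce data loss when converting characters that Windows-1252 cannot represent, set a locale and ask iconv to transliterate. //TRANSLIT approximates such characters and //IGNORE drops whatever is left; support for these suffixes varies between iconv implementations: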

setlocale(LC_CTYPE, "fr_FR.UTF-8"); //set your own locale
$csv = iconv("UTF-8", "WINDOWS-1252//TRANSLIT//IGNORE", $csv);

Zebx answered Oct 19 '22 04:10