JavaScript: Download CSV file encoded in ISO-8859-1 / Latin1 / Windows-1252

I have hacked together a small tool to extract shipping data from Amazon CSV order data. It works so far; here is a simple version as a JS Bin: http://output.jsbin.com/jarako

For printing stamps/shipping labels, I need a file to upload to Deutsche Post and other parcel services. I used a small function saveTextAsFile which I found on Stack Overflow. Everything is good so far: no wrongly displayed special characters (äöüß...) in the output textarea or in the downloaded files.

All these German postal/parcel service sites accept only Latin1 / ISO-8859-1 encoded files for upload, but my downloaded file is always UTF-8. If I upload it, all special characters (äöüß...) come out wrong.

How can I change this? I have already searched a lot and tried, for example:

Setting the charset of the tool to iso-8859-1:

<META http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />

But the result is: now I have wrong special characters in the output textarea as well as in the downloaded file, and if I upload it to the post site I get even more wrong characters. Also, if I check the encoding in the Coda editor, it still says the downloaded file is UTF-8.

The saveTextAsFile function uses var textFileAsBlob = new Blob([textToWrite], {type:'text/plain'});. Maybe there is a way to set the charset for the download there!?

function saveTextAsFile()
{
    var textToWrite = $('#dataOutput').val();
    var textFileAsBlob = new Blob([textToWrite], {type:'text/plain'});
    var fileNameToSaveAs = "Brief.txt";

    var downloadLink = document.createElement("a");
    downloadLink.download = fileNameToSaveAs;
    downloadLink.innerHTML = "Download File";
    if (window.webkitURL != null)
    {
        // Chrome allows the link to be clicked
        // without actually adding it to the DOM.
        downloadLink.href = window.webkitURL.createObjectURL(textFileAsBlob);
    }
    else
    {
        // Firefox requires the link to be added to the DOM
        // before it can be clicked.
        downloadLink.href = window.URL.createObjectURL(textFileAsBlob);
        downloadLink.onclick = destroyClickedElement;
        downloadLink.style.display = "none";
        document.body.appendChild(downloadLink);
    }

    downloadLink.click();
}
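
(Note: the snippet references a destroyClickedElement handler that is not shown above. A minimal version, reconstructed here as an assumption from how it is used, could look like this:)

// Assumed helper, not part of the original snippet: removes the temporary
// download link from the DOM again after it has been clicked.
function destroyClickedElement(event)
{
    document.body.removeChild(event.target);
}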

Anyhow, there has to be a way to download files in a different encoding than the one the site itself uses. The Amazon site, where I download the CSV file from, is UTF-8 encoded, but the CSV file downloaded from there is Latin1 (ISO-8859-1) when I check it in Coda...

asked Jul 27 '15 by Lutz


3 Answers

SCROLL DOWN TO THE UPDATE for the real solution!

Because I got no answer, I searched more and more. It looks like there is NO SOLUTION in pure JavaScript. Every test download I made that was generated in JavaScript was UTF-8 encoded. It seems JavaScript is only made for Unicode / UTF-8, or another encoding would (possibly) only apply if the data were transported again over HTTP. But for JavaScript that runs on the client, no additional HTTP transport happens, because the data stays on the client.

For now I have helped myself by building a small PHP script on my server, to which I send the data via a GET or POST request. It converts the encoding to Latin1 / ISO-8859-1 and serves it as a file download. The result is an ISO-8859-1 file with correctly encoded special characters, which I can upload to the mentioned postal and parcel service sites, and everything looks good.

latin-download.php (it is VERY IMPORTANT to also save the PHP file itself in ISO-8859-1 to make it work!):

<?php
// PHP has already URL-decoded the request parameter, so an extra urldecode() would decode twice
$text_utf8 = $_REQUEST["a"];
$converted_to_latin = mb_convert_encoding($text_utf8, 'ISO-8859-1', 'UTF-8');
$filename = $_REQUEST["filename"];
// The charset belongs in its own Content-Type header, not inside Content-Disposition
header('Content-Type: text/plain; charset=iso-8859-1');
header('Content-Disposition: attachment; filename="'.$filename.'"');
echo $converted_to_latin;
?>

In my JavaScript code I use:

<a id="downloadlink">Download File</a>

<script>
var mydata = "this is testdata containing äöüß";

document.getElementById("downloadlink").addEventListener("click", function() {
    var mydataToSend = encodeURIComponent(mydata);
    window.open("latin-download.php?a=" + mydataToSend + "&filename=letter-max.csv");
}, false);
</script>

For bigger amounts of data you have to switch from GET to POST (see the sketch below)...
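
A rough sketch of the POST variant, not part of the original answer: it creates a hidden form on the fly and submits it to the same latin-download.php, which reads the fields "a" and "filename" from $_REQUEST just as in the GET case.

// Sketch: send larger data to latin-download.php via POST instead of GET.
// Assumes the PHP script above, which reads "a" and "filename" from $_REQUEST.
function postDownload(mydata, filename) {
    var form = document.createElement("form");
    form.method = "POST";
    form.action = "latin-download.php";

    var dataField = document.createElement("input");
    dataField.type = "hidden";
    dataField.name = "a";
    dataField.value = mydata;          // the browser form-encodes this automatically
    form.appendChild(dataField);

    var nameField = document.createElement("input");
    nameField.type = "hidden";
    nameField.name = "filename";
    nameField.value = filename;
    form.appendChild(nameField);

    document.body.appendChild(form);   // the form must be in the DOM before submit()
    form.submit();
    document.body.removeChild(form);
}

// Usage, e.g. instead of the window.open() call above:
// postDownload(mydata, "letter-max.csv");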

UPDATE 08-Feb-2016

Half a year later I have found a solution in PURE JAVASCRIPT, using inexorabletash/text-encoding. This is a polyfill for the Encoding Living Standard. The standard includes decoding of legacy encodings like Latin1 ("windows-1252"), but it forbids encoding into these legacy encodings. So the browser's built-in window.TextEncoder only offers UTF-8 encoding. BUT the polyfill offers a non-standard legacy mode, which DOES allow encoding into legacy encodings like Latin1.

I use it like this:

<!DOCTYPE html>
<script>
// 'Copy' the browser's built-in TextEncoder function to TextEncoderOrg (it can NOT encode
// windows-1252, but this way you can still use it as TextEncoderOrg())
var TextEncoderOrg = window.TextEncoder;
// ... and deactivate it, to make sure only the polyfill encoder script loaded below will be used
window.TextEncoder = null;
</script>
<!-- encoding-indexes.js is needed to support encoding into legacy encoding types -->
<script src="lib/encoding-indexes.js"></script>
<!-- the encoding polyfill itself -->
<script src="lib/encoding.js"></script>

<script>

function download(content, filename, contentType) {
    if (!contentType) contentType = 'application/octet-stream';
    var a = document.createElement('a');
    var blob = new Blob([content], {'type': contentType});
    a.href = window.URL.createObjectURL(blob);
    a.download = filename;
    a.click();
}

var text = "Es wird ein schöner Tag!";

// Do the encoding
var encoded = new TextEncoder("windows-1252",{ NONSTANDARD_allowLegacyEncoding: true }).encode(text);

// Download 2 files to see the difference
download(encoded,"windows-1252-encoded-text.txt");
download(text,"utf-8-original-text.txt");

</script>

The encoding-indexes.js file is about 500 kB, because it contains all the encoding tables. Since I only need the windows-1252 encoding, I have deleted the other encodings from this file for my use, so only 632 bytes are left.
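
(Aside, not part of the original answer: if, as here, only characters that actually exist in Latin1 are needed, a rough dependency-free sketch is also possible, because ISO-8859-1 maps code points below 256 directly to bytes; windows-1252 differs from it only in the 0x80-0x9F range.)

// Rough sketch, assuming the text only contains Latin1 characters (äöüß etc.):
// map each code point below 256 directly to one byte, replace everything else with '?'.
function encodeLatin1(text) {
    var bytes = new Uint8Array(text.length);
    for (var i = 0; i < text.length; i++) {
        var code = text.charCodeAt(i);
        bytes[i] = code <= 255 ? code : 0x3F; // 0x3F = '?'
    }
    return bytes;
}

// Usage with the download() helper from above:
// download(encodeLatin1("Es wird ein schöner Tag!"), "latin1-text.txt", "text/plain");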

answered by Lutz (the asker)


The problem is not the encoding itself, but the fact that the special characters are displayed incorrectly in some applications, e.g. Microsoft Excel. UTF-8 is perfectly able to represent all German special characters. You can fix the problem by adding a byte order mark (BOM) in front of the CSV.

const BOM = "\uFEFF";
csvData = BOM + csvData;  // prepend the BOM to the existing CSV string
const blob = new Blob([csvData], { type: "text/csv;charset=utf-8" });

Solution based on this GitHub post.
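
(For completeness, a short sketch of triggering the download of this Blob, assuming csvData already holds the CSV text; the file name is only an example:)

const link = document.createElement("a");
link.href = URL.createObjectURL(blob);
link.download = "export.csv";   // example file name
link.click();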

answered by Jonathan


You cannot force a web server to send you data in a given encoding, only ask it politely. Your approach of simply converting to the format you need is the right way to go.

If you wanted to avoid the PHP script, you may have luck specifying the encoding as a parameter when creating your Blob:

var textFileAsBlob = new Blob([textToWrite], {
  type: 'text/plain;charset=ISO-8859-1',
  encoding: "ISO-8859-1"
});

See Specifying blob encoding in Google Chrome for more details.

answered by Jacob