sep=";" statement breaks utf8 BOM in CSV file which is generated by XSL

Tags: csv, excel, xslt

I'm currently developing a CSV export with XSLT. In my case the CSV file will be opened with Excel 99% of the time, so I have to take Excel's behavior into account.

My first problem was German special characters in the CSV. Even though the CSV is encoded as UTF-8, Excel does not open it properly: the special characters show up as weird symbols. I found a solution for this problem: I added 3 extra bytes (EF BB BF, a.k.a. the BOM header) at the beginning of the content bytes. The UTF-8 BOM is a way of telling Excel 'hey dude, this is UTF-8, open it properly'. Problem solved!
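To illustrate the symptom, here is a minimal Java sketch of what happens when UTF-8 bytes are decoded with a single-byte legacy charset, plus a check for the three BOM bytes. The class and method names are hypothetical, and ISO-8859-1 only stands in for whatever fallback code page Excel actually picks on a given platform and locale.

```java
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // Decode UTF-8 bytes as ISO-8859-1. This is an assumption standing in
    // for a reader that does not know the data is UTF-8.
    static String misread(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Returns true when the data starts with the UTF-8 BOM (EF BB BF).
    static boolean hasUtf8Bom(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
    }

    public static void main(String[] args) {
        // U+00FC (u with umlaut) becomes two unrelated characters when misread.
        System.out.println(misread("\u00FC"));
        // Encoding U+FEFF as UTF-8 yields exactly the three BOM bytes.
        byte[] withBom = "\uFEFFGr\u00FC\u00DFe".getBytes(StandardCharsets.UTF_8);
        System.out.println(hasUtf8Bom(withBom));
    }
}
```

Each non-ASCII character occupies two or more bytes in UTF-8, so a single-byte decoder turns it into several unrelated symbols, which matches the kind of garbling described above.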

My second problem was the separator. The default separator can be a comma or a semicolon depending on the region. I think it is a semicolon in Germany and a comma in the UK. So, to avoid this problem, I had to add one of the lines below:

<xsl:text>sep=;</xsl:text>

<xsl:text>sep=,</xsl:text>

(The separator is not actually hard-coded.)

But the problem I cannot find any solution for is this: if you add "sep=;" or "sep=," at the beginning of the file while the CSV file is generated with a UTF-8 BOM, the BOM no longer helps Excel show the special characters properly! And I'm sure that the BOM bytes are always at the beginning of the byte array. This screenshot is from MS Excel on Mac OS X:

[screenshot]

The first 3 symbols belong to the BOM header.

Have you ever had a problem like this, or do you have any suggestions? Thank you.

Edit:

Here are the screenshots.

a. With BOM and <xsl:text>sep=;</xsl:text>

[screenshot]

b. Just with BOM

[screenshot]

The Java code:

// Write the bytes
ServletOutputStream out = resp.getOutputStream();
if (contentType.toString().equals("CSV")) {
    // The additional bytes below are a prefix (the UTF-8 BOM) indicating that the content is in UTF-8.
    out.write(239); // 0xEF
    out.write(187); // 0xBB
    out.write(191); // 0xBF
}
out.write(bytes); // Content bytes, in this case the XSL output

The XSL code:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text" version="1.0" encoding="UTF-8" indent="yes" />

    <xsl:template match="/">
        <xsl:text>sep=;</xsl:text>
        <table>
            ...
        </table>
    </xsl:template>
</xsl:stylesheet>


asked Dec 05 '13 09:12

Adem İlhan

2 Answers

You are right, there is no way in Excel 2007 to get it to load both the encoding and the separator correctly across different locales when someone double-clicks a CSV file.

It seems like when you specify sep= after the BOM it forgets the BOM has told it that it is UTF-8.

You have to specify sep= because in certain locales Excel does not detect the separator. For instance, in Danish the default separator is ;. If you output tab- or comma-separated text then it does not detect the separator, and in other locales, if you separate with semicolons, it doesn't load. You can test this by changing the locale format in the Windows settings; Excel then picks this up.

From this question: Is it possible to force Excel recognize UTF-8 CSV files automatically?

and its answers, it seems the only way is to use UTF-16LE encoding with a BOM.

Note also that, as per http://wiki.scn.sap.com/wiki/display/ABAP/CSV+tests+of+encoding+and+column+separator?original_fqdn=wiki.sdn.sap.com, it seems that if you use UTF-16LE with tab separators then it works.

I've wondered whether Excel reads sep=; and then re-calls the method that gets the CSV text, losing the BOM in the process. I've tried giving it incorrect text, and I can't find any workaround that tells Excel to honor both the sep and the encoding.


answered Oct 13 '22 02:10

Luke Page

This is the result of my testing with Excel 2013.

If you're stuck with UTF-8, there is a workaround which consists of BOM + data + sep=;

Input (written with UTF8 encoding)

\ufeffSome;Header;Columns
Wîth;Fàncÿ;Stûff
sep=;

Output

|Some|Header|Columns|
|Wîth|Fàncÿ |Stûff  |
|sep=|      |       |

The issue with this solution is that while Excel interprets sep=; properly, it displays sep= (yes, it swallows the ;) in the first column of the last row.
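A minimal Java sketch of this UTF-8 layout (BOM, then the data rows, then a trailing sep= line) could look like the following. The class and method names are hypothetical, the CRLF line endings are an assumption, and fields are not quoted or escaped.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

class Utf8CsvWithTrailingSep {
    // Builds BOM + rows + a final "sep=<separator>" line, encoded as UTF-8.
    // Assumption: rows end with CRLF and fields need no quoting.
    static byte[] build(List<String[]> rows, char separator) {
        StringBuilder sb = new StringBuilder();
        sb.append('\uFEFF'); // encoded as EF BB BF in UTF-8
        for (String[] row : rows) {
            sb.append(String.join(String.valueOf(separator), row)).append("\r\n");
        }
        // The separator hint goes last, after the data, instead of first.
        sb.append("sep=").append(separator);
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] {"Some", "Header", "Columns"},
                new String[] {"W\u00EEth", "F\u00E0nc\u00FF", "St\u00FBff"});
        byte[] csv = build(rows, ';');
        System.out.println(csv.length + " bytes");
    }
}
```

The resulting byte array can be written to the response stream as-is, without writing the three BOM bytes separately.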

However, if you can write the file as UTF16-LE, then there is an actual solution. Use the \t delimiter without specifying sep and Excel will play ball.

Input (written with UTF16-LE encoding)

\ufeffSome\tHeader\tColumns
Wîth\tFàncÿ\tStûff

Output

|Some|Header|Columns|
|Wîth|Fàncÿ |Stûff  |
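For the UTF-16LE variant, a minimal Java sketch might look like this. The class and method names are hypothetical, the CRLF line endings are an assumption, and fields containing tabs or line breaks are not handled.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

class Utf16LeTsv {
    // Builds BOM + tab-separated rows, encoded as UTF-16LE, with no sep= line.
    // Assumption: rows end with CRLF and fields contain no tabs or newlines.
    static byte[] build(List<String[]> rows) {
        StringBuilder sb = new StringBuilder();
        // StandardCharsets.UTF_16LE does not emit a BOM by itself, so U+FEFF
        // is added explicitly; it is encoded as the bytes FF FE.
        sb.append('\uFEFF');
        for (String[] row : rows) {
            sb.append(String.join("\t", row)).append("\r\n");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] {"Some", "Header", "Columns"},
                new String[] {"W\u00EEth", "F\u00E0nc\u00FF", "St\u00FBff"});
        byte[] tsv = build(rows);
        System.out.println(tsv.length + " bytes");
    }
}
```

In a servlet like the one in the question, these bytes would be written directly, and the three UTF-8 BOM bytes would no longer be written in front of them.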

answered Oct 13 '22 02:10

Pier-Luc Gendreau

