How to HTML encode or transliterate "high" characters in Excel?

In Excel, how can I convert the contents of a cell that includes accented characters, curly quotes, etc. into either HTML entities for the same characters, OR a transliterated plaintext version?

We have an XLS document which contains some "high" characters. The data has been pulled in via a DB connection, and it appears that Excel is correctly handling individual cells (or rows) being in different codepages.

When we export this data to a CSV, some high characters are not correctly rendered. It appears that Excel uses a single encoding for the whole document (of course), but writes the byte value of each character from its original codepage (which may or may not be consistent with other values in the same document).

As Excel renders the text correctly before export, I believe we should be able to encode the high characters to their HTML equivalents at this point, then export to CSV, thus ensuring that the CSV is ASCII-only.

(Alternatively we could transliterate down to plain ASCII, but that seems like a poor approach and probably no easier ...)

Chris Burgess, asked Aug 11 '11 02:08


1 Answer

There is a function by pgc01 that seems to do the trick here: http://www.mrexcel.com/forum/showpost.php?p=2091183&postcount=7

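Hopefully it's OK for me to quote their code, here with comments and a correction for AscW's signed return value added:

' Returns the UTF-16 code of the first character of s,
' as a 4-digit hex string (default) or as a decimal number.
Function CodeUni(s As String, Optional bHex As Boolean = True) As Variant
    Dim code As Long
    code = AscW(Left(s, 1))
    ' AscW returns a signed Integer, so codes above &H7FFF come back negative
    If code < 0 Then code = code + 65536
    If bHex Then
        CodeUni = Right("0000" & Hex(code), 4)
    Else
        CodeUni = code
    End If
End Function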

In case you're not sure how to get that into your Excel workbook, this guide is pretty useful: http://office.microsoft.com/en-us/excel-help/create-custom-functions-in-excel-2007-HA010218996.aspx

To summarise:

  1. Alt+F11 to bring up VBA editor
  2. Insert > Module
  3. Paste above code in
  4. Use function in your worksheet!
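Before relying on the function in a worksheet, a quick check in the VBA editor can confirm the module was pasted correctly. This is only a sketch: TestCodeUni is a made-up name, and the character is built with ChrW so the code does not depend on how the editor stores non-ASCII text. Run it with F5 and look at the Immediate window (Ctrl+G).

```vba
' Hypothetical sanity check for CodeUni; not part of pgc01's code.
Sub TestCodeUni()
    Dim s As String
    s = ChrW(&HE9)                   ' e with acute accent, U+00E9
    Debug.Print CodeUni(s)           ' hex form: 00E9
    Debug.Print CodeUni(s, False)    ' decimal form: 233
End Sub
```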

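To get it as a proper HTML-encoded Unicode entity, use the formula below. The x after &# is required, because HTML treats the number as decimal without it:

="&#x"&CodeUni(C1, TRUE)&";"

In my test case, I had ﻼ in C1, and in E1 the formula displays as &#xFEFC;

Note that CodeUni only looks at the first character of the cell.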
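Since CodeUni handles a single character, encoding a whole cell needs a loop over the string. The function below is a minimal sketch of one way to do that, not something from the linked thread: HtmlEncodeHigh is a hypothetical name, it leaves 7-bit ASCII untouched, and it writes everything else as a hexadecimal character reference. It assumes the cell does not already contain literal entities or bare & characters that would need escaping first.

```vba
' Hypothetical helper: replaces every non-ASCII character in s with an
' HTML hexadecimal character reference, leaving ASCII as it is.
Function HtmlEncodeHigh(s As String) As String
    Dim i As Long, code As Long, trail As Long, result As String
    i = 1
    Do While i <= Len(s)
        code = AscW(Mid$(s, i, 1))
        ' AscW returns a signed Integer; shift negatives into 0..65535
        If code < 0 Then code = code + 65536
        ' Combine a UTF-16 surrogate pair into a single code point
        If code >= 55296 And code <= 56319 And i < Len(s) Then
            trail = AscW(Mid$(s, i + 1, 1))
            If trail < 0 Then trail = trail + 65536
            If trail >= 56320 And trail <= 57343 Then
                code = 65536 + (code - 55296) * 1024 + (trail - 56320)
                i = i + 1
            End If
        End If
        If code > 127 Then
            result = result & "&#x" & Hex$(code) & ";"
        Else
            result = result & ChrW$(code)
        End If
        i = i + 1
    Loop
    HtmlEncodeHigh = result
End Function
```

Used as =HtmlEncodeHigh(C1) in a helper column, a cell containing café would come out as caf&#xE9;, and exporting the helper column instead of the original should give an ASCII-only CSV.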

Rikki, answered Sep 22 '22 18:09