MATLAB: how to display UTF-8-encoded text read from file?

The gist of my question is this:

How can I display Unicode characters in Matlab's GUI (OS X) so that they are properly rendered?

Details:

I have a table of strings stored in a file, and some of these strings contain UTF-8-encoded Unicode characters. I have tried many different ways (too many to list here) to display the contents of this file in the MATLAB GUI, without success. For example:

    >> fid = fopen('/Users/kj/mytable.txt', 'r', 'n', 'UTF-8');
    >> [x, x, x, enc] = fopen(fid); enc

    enc =

    UTF-8

    >> tbl = textscan(fid, '%s', 35, 'delimiter', ',');
    >> tbl{1}{1}

    ans =

    ÎÎÎÎÎΠΣΦΩαβγδεζηθικλμνξÏÏÏÏÏÏÏÏÏÏ
    >>

As it happens, if I paste the string directly into the MATLAB GUI, it is displayed properly. This shows that the GUI is not fundamentally incapable of displaying these characters, but once MATLAB reads the string in from the file, it no longer displays it correctly. For example:

    >> pasted = 'ΓΔΘΛΞΠΣΦΩαβγδεζηθικλμνξπρςστυφχψω'

    pasted =

    ΓΔΘΛΞΠΣΦΩαβγδεζηθικλμνξπρςστυφχψω

    >>

Thanks!

Asked by kjo, Jul 28 '11 17:07



1 Answer

I present below my findings after doing some digging... Consider these test files:

a.txt

ΓΔΘΛΞΠΣΦΩαβγδεζηθικλμνξπρςστυφχψω 

b.txt

தமிழ் 

First, we read a file as raw bytes and decode them as UTF-8:

    %# open file in binary mode, and read a list of bytes
    fid = fopen('a.txt', 'rb');
    b = fread(fid, '*uint8')';             %# read bytes
    fclose(fid);

    %# decode as unicode string
    str = native2unicode(b,'UTF-8');
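As a possible alternative to decoding the bytes by hand, the encoding can be passed to fopen (as the question already does) and the characters read with fread. The following is a minimal sketch rather than a tested recipe for every MATLAB release; the temporary file and the names fname and expected are hypothetical and exist only to keep the example self-contained.

```matlab
% Hypothetical temporary file holding three UTF-8 encoded Greek letters
fname = [tempname '.txt'];
expected = char([915 916 920]);          % Gamma, Delta, Theta

% Write the raw UTF-8 bytes so the file content is known exactly
fid = fopen(fname, 'w');
fwrite(fid, unicode2native(expected, 'UTF-8'), 'uint8');
fclose(fid);

% Reopen with an explicit encoding; fread with '*char' then decodes on read
fid = fopen(fname, 'r', 'n', 'UTF-8');
str = fread(fid, '*char')';              % transpose the column into a row
fclose(fid);
delete(fname);

% Compare code points rather than relying on how the string is displayed
disp(isequal(double(str), double(expected)));
```

Comparing code points sidesteps the display problem entirely, so it tells you whether the read step worked even when the Command Window shows garbage.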

If you try to print the string, you get a bunch of nonsense:

    >> str
    str =

Nonetheless, str does hold the correct string. We can check the Unicode code point of each character; as you can see, they lie outside the ASCII range (the last two are the non-printable CR-LF line endings):

    >> double(str)
    ans =
      Columns 1 through 13
       915   916   920   923   926   928   931   934   937   945   946   947   948
      Columns 14 through 26
       949   950   951   952   953   954   955   956   957   958   960   961   962
      Columns 27 through 35
       963   964   965   966   967   968   969    13    10
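One plausible explanation for garbled output of this kind is that the UTF-8 bytes are treated as single-byte characters somewhere along the way. The sketch below only illustrates that mechanism with native2unicode and unicode2native; it is not a diagnosis of any particular MATLAB version, and the variable names are made up for the example.

```matlab
% Two Greek capitals, U+0393 and U+0394, built from their code points
original = char([915 916]);

% Encode to UTF-8: each of these characters takes two bytes
bytes = unicode2native(original, 'UTF-8');

% Decoding with a single-byte charset yields one character per byte
garbled = native2unicode(bytes, 'ISO-8859-1');

% Decoding with the matching charset recovers the original string
recovered = native2unicode(bytes, 'UTF-8');

fprintf('%d bytes, %d garbled chars, %d recovered chars\n', ...
    numel(bytes), numel(garbled), numel(recovered));
```

Each Greek letter turns into two unrelated characters under the wrong charset, which is consistent with the doubled-up garbage shown in the question.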

Unfortunately, MATLAB seems unable to display this Unicode string in a GUI on its own. For example, all these fail:

    figure
    text(0.1, 0.5, str, 'FontName','Arial Unicode MS')
    title(str)
    xlabel(str)

One trick I found is to use the embedded Java capability:

    %# Java Swing
    label = javax.swing.JLabel();
    label.setFont( java.awt.Font('Arial Unicode MS',java.awt.Font.PLAIN, 30) );
    label.setText(str);
    f = javax.swing.JFrame('frame');
    f.getContentPane().add(label);
    f.pack();
    f.setVisible(true);

[screenshot]


As I was preparing to write the above, I found an alternative solution. We can use the undocumented DefaultCharacterSet feature to set the charset to UTF-8 (on my machine, it is ISO-8859-1 by default):

feature('DefaultCharacterSet','UTF-8'); 

Now with a proper font (you can change the font used in the Command Window from Preferences > Font), we can print the string in the prompt (note that DISP is still incapable of printing Unicode):

    >> str
    str =
    ΓΔΘΛΞΠΣΦΩαβγδεζηθικλμνξπρςστυφχψω

    >> disp(str)
    ΓΔΘΛΞΠΣΦΩαβγδεζηθικλμνξπÏςστυφχψω

And to display it in a GUI, UICONTROL should work (under the hood, I think it is really a Java Swing component):

    uicontrol('Style','text', 'String',str, ...
        'Units','normalized', 'Position',[0 0 1 1], ...
        'FontName','Arial Unicode MS', 'FontSize',30)

[screenshot]

Unfortunately, TEXT, TITLE, XLABEL, etc. still show garbage:

[screenshot]


As a side note: It is difficult to work with m-file sources containing Unicode characters in the MATLAB editor. I was using Notepad++, with files encoded as UTF-8 without BOM.

Answered by Amro, Oct 08 '22 09:10