The gist of my question is this:
How can I display Unicode characters in MATLAB's GUI (OS X) so that they are properly rendered?
Details:
I have a table of strings stored in a file, and some of these strings contain UTF-8-encoded Unicode characters. I have tried many different ways (too many to list here) to display the contents of this file in the MATLAB GUI, without success. For example:
```matlab
>> fid = fopen('/Users/kj/mytable.txt', 'r', 'n', 'UTF-8');
>> [x, x, x, enc] = fopen(fid); enc
enc =
UTF-8
>> tbl = textscan(fid, '%s', 35, 'delimiter', ',');
>> tbl{1}{1}
ans =
ÎÎÎÎÎΠΣΦΩαβγδεζηθικλμνξÏÏÏÏÏÏÏÏÏÏ
>>
```
As it happens, if I paste the string directly into the MATLAB GUI, it is displayed properly. This shows that the GUI is not fundamentally incapable of displaying these characters, but once MATLAB reads the string from the file, it no longer displays it correctly. For example:
```matlab
>> pasted = 'ΓΔΘΛΞΠΣΦΩαβγδεζηθικλμνξπρςστυφχψω'
pasted =
ΓΔΘΛΞΠΣΦΩαβγδεζηθικλμνξπρςστυφχψω
>>
```
Thanks!
To check a file's encoding on Windows, open it in Notepad and click 'Save As...'. The 'Encoding:' combo box shows the file's current encoding.
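Notepad is Windows-only. As a rough cross-platform alternative, here is a small Python sketch (an illustration, not something from the original thread) that classifies a file's raw bytes as plain ASCII, valid UTF-8, or neither. The helper name classify_bytes is hypothetical, and a successful strict UTF-8 decode only suggests, rather than proves, that the file was written as UTF-8.

```python
def classify_bytes(data: bytes) -> str:
    """Heuristically classify raw file bytes (hypothetical helper)."""
    # Pure 7-bit data decodes identically as ASCII, UTF-8 and ISO-8859-1.
    if all(byte < 0x80 for byte in data):
        return "ascii"
    try:
        # A strict decode raises on byte sequences that are not valid UTF-8.
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return "not utf-8"
    return "utf-8"


if __name__ == "__main__":
    print(classify_bytes("hello".encode("ascii")))     # ascii
    print(classify_bytes("ΓΔΘ".encode("utf-8")))       # utf-8
    print(classify_bytes("ΓΔΘ".encode("iso-8859-7")))  # not utf-8
```

In practice you would pass it the result of reading the file in binary mode, e.g. `classify_bytes(open("mytable.txt", "rb").read())`.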
UTF-8 is not a character set but an encoding of Unicode. It is also backward compatible with ASCII: code points 0-127 are encoded as the same single bytes, and every byte of a multibyte sequence lies in the range 0x80-0xFF, which ASCII never uses.
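To make the ASCII-compatibility point concrete, this illustrative Python snippet prints the UTF-8 bytes of a few characters taken from the test strings in this thread. An ASCII character maps to the same single byte, while a Greek letter takes two bytes and a Tamil character three, all with the high bit set. The helper name utf8_bytes is hypothetical.

```python
def utf8_bytes(ch: str) -> list[int]:
    """Return the UTF-8 encoding of one character as byte values (hypothetical helper)."""
    return list(ch.encode("utf-8"))


for ch in ["A", "Γ", "ω", "த"]:
    encoded = utf8_bytes(ch)
    hex_bytes = " ".join(f"{b:02X}" for b in encoded)
    # An ASCII character (code point < 128) encodes to exactly that byte;
    # every byte of a multibyte sequence lies in 0x80-0xFF.
    print(f"U+{ord(ch):04X} {ch!r}: {hex_bytes}")
```

This prints `41` for A, `CE 93` for Γ, `CF 89` for ω and `E0 AE A4` for த.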
Below are my findings after some digging. Consider these two test files:
ΓΔΘΛΞΠΣΦΩαβγδεζηθικλμνξπρςστυφχψω
தமிழ்
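To reproduce the experiment, the Python sketch below writes each test string as UTF-8 without a BOM, terminated by a Windows CR-LF line ending. The CR-LF choice is an assumption, consistent with the trailing 13 10 in the double(str) output later in the thread. The name a.txt matches the MATLAB code below; b.txt for the Tamil file and both helper names are hypothetical.

```python
from pathlib import Path

GREEK = "ΓΔΘΛΞΠΣΦΩαβγδεζηθικλμνξπρςστυφχψω"
TAMIL = "தமிழ்"


def encode_test_line(text: str) -> bytes:
    """Encode one line as UTF-8 (no BOM) ending in CR-LF (hypothetical helper)."""
    return (text + "\r\n").encode("utf-8")


def write_test_file(path: Path, text: str) -> int:
    """Write the encoded line in binary mode and return the byte count (hypothetical helper)."""
    data = encode_test_line(text)
    path.write_bytes(data)  # binary write, so no newline translation happens
    return len(data)


if __name__ == "__main__":
    print(write_test_file(Path("a.txt"), GREEK))  # 68: 33 letters x 2 bytes + CR LF
    print(write_test_file(Path("b.txt"), TAMIL))  # 17: 5 code points x 3 bytes + CR LF (b.txt is hypothetical)
```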
First, we read the file:
```matlab
%# open file in binary mode, and read a list of bytes
fid = fopen('a.txt', 'rb');
b = fread(fid, '*uint8')';     %# read bytes, transposed to a row vector
fclose(fid);

%# decode as unicode string
str = native2unicode(b,'UTF-8');
```
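The same two-step approach (read raw bytes in binary mode, then decode them explicitly as UTF-8) looks like this in Python. It is only an analogue for illustration, not a MATLAB API; the point is that decoding explicitly makes the result independent of any platform default charset. The helper name read_utf8 is hypothetical.

```python
import tempfile
from pathlib import Path


def read_utf8(path: Path) -> str:
    """Read raw bytes, then decode them explicitly as UTF-8 (hypothetical helper)."""
    raw = path.read_bytes()     # analogous to fread(fid, '*uint8'): no decoding yet
    return raw.decode("utf-8")  # analogous to native2unicode(b, 'UTF-8')


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "a.txt"
        sample.write_bytes("ΓΔΘ\r\n".encode("utf-8"))
        print(repr(read_utf8(sample)))  # 'ΓΔΘ\r\n'
```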
If you try to print the string, you get a bunch of nonsense:
```matlab
>> str
str =
```
Nonetheless, str does hold the correct string. We can check the Unicode code point of each character; as you can see, they all lie outside the ASCII range, except the last two, which are the non-printable CR-LF line ending:
```matlab
>> double(str)
ans =
  Columns 1 through 13
   915   916   920   923   926   928   931   934   937   945   946   947   948
  Columns 14 through 26
   949   950   951   952   953   954   955   956   957   958   960   961   962
  Columns 27 through 35
   963   964   965   966   967   968   969    13    10
```
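The same check can be sketched in Python with ord(), which plays the role of double(str) here. Assuming the file holds the Greek test line followed by CR-LF, it yields the 35 values listed above, with 13 and 10 at the end. The helper name code_points is hypothetical.

```python
GREEK = "ΓΔΘΛΞΠΣΦΩαβγδεζηθικλμνξπρςστυφχψω"


def code_points(text: str) -> list[int]:
    """Return each character's Unicode code point, like double(str) (hypothetical helper)."""
    return [ord(ch) for ch in text]


points = code_points(GREEK + "\r\n")
non_ascii = [p for p in points if p > 127]
print(len(points), len(non_ascii))  # 35 33
print(points[:9])   # [915, 916, 920, 923, 926, 928, 931, 934, 937]
print(points[-2:])  # [13, 10]
```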
Unfortunately, MATLAB seems unable to display this Unicode string in a GUI on its own. For example, all these fail:
```matlab
figure
text(0.1, 0.5, str, 'FontName','Arial Unicode MS')
title(str)
xlabel(str)
```
One trick I found is to use the embedded Java capability:
```matlab
%# Java Swing
label = javax.swing.JLabel();
label.setFont( java.awt.Font('Arial Unicode MS',java.awt.Font.PLAIN, 30) );
label.setText(str);
f = javax.swing.JFrame('frame');
f.getContentPane().add(label);
f.pack();
f.setVisible(true);
```
While preparing the above, I found an alternative solution: the undocumented DefaultCharacterSet feature, which lets us set the character set to UTF-8 (on my machine it is ISO-8859-1 by default):
```matlab
feature('DefaultCharacterSet','UTF-8');
```
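An ISO-8859-1 default is consistent with the kind of garbage shown in the question. Each Greek letter is two UTF-8 bytes, and a single-byte charset turns those into two separate characters: the lead byte 0xCE or 0xCF becomes Î or Ï, and the second byte becomes another Latin-1 character, often an invisible C1 control code. The Python sketch below (an illustration of the mechanism, not MATLAB's internals; both helper names are hypothetical) reproduces this and shows that, as long as no bytes were lost, re-encoding as ISO-8859-1 and decoding as UTF-8 reverses it.

```python
def as_latin1_mojibake(text: str) -> str:
    """Decode UTF-8 bytes with the wrong single-byte charset (hypothetical helper)."""
    return text.encode("utf-8").decode("iso-8859-1")


def repair_mojibake(garbled: str) -> str:
    """Recover the original bytes, then decode them as UTF-8 (hypothetical helper)."""
    return garbled.encode("iso-8859-1").decode("utf-8")


garbled = as_latin1_mojibake("ΓΔα")
print(repr(garbled))             # 'Î\x93Î\x94Î±'
print(repair_mojibake(garbled))  # ΓΔα
```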
Now, with a suitable font (you can change the Command Window font under Preferences > Font), we can print the string at the prompt (note that DISP still cannot print Unicode):
```matlab
>> str
str =
ΓΔΘΛΞΠΣΦΩαβγδεζηθικλμνξπρςστυφχψω
>> disp(str)
ΓΔΘΛΞΠΣΦΩαβγδεζηθικλμνξπÏςστυφχψω
```
And to display it in a GUI, UICONTROL should work (under the hood, I think it is really a Java Swing component):
```matlab
uicontrol('Style','text', 'String',str, ...
    'Units','normalized', 'Position',[0 0 1 1], ...
    'FontName','Arial Unicode MS', 'FontSize',30)
```
Unfortunately, TEXT, TITLE, XLABEL, etc. still show garbage.
As a side note: it is difficult to work with M-file sources containing Unicode characters in the MATLAB editor. I was using Notepad++ instead, with the files encoded as UTF-8 without BOM.
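For reference, the byte order mark (BOM) that "UTF-8 without BOM" leaves out is the three-byte prefix EF BB BF. If it is present and the file is decoded as plain UTF-8, it shows up as an extra invisible character U+FEFF at the start of the string. A minimal Python sketch (illustrative only; the helper names are hypothetical) shows the difference and how the utf-8-sig codec strips the prefix when it is there:

```python
import codecs

SAMPLE = "ΓΔΘ"


def encode_with_bom(text: str) -> bytes:
    """UTF-8 with a leading byte order mark EF BB BF (hypothetical helper)."""
    return text.encode("utf-8-sig")


def decode_either(data: bytes) -> str:
    """Decode UTF-8 with or without a BOM (hypothetical helper)."""
    # 'utf-8-sig' drops a leading BOM if present and otherwise behaves like 'utf-8'.
    return data.decode("utf-8-sig")


with_bom = encode_with_bom(SAMPLE)
without_bom = SAMPLE.encode("utf-8")
print(with_bom[:3] == codecs.BOM_UTF8)                          # True
print(decode_either(with_bom) == decode_either(without_bom))  # True
print(repr(with_bom.decode("utf-8")[0]))                       # '\ufeff'
```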