My files sometimes contain multibyte characters, either in the file path or in the text inside the file, and I cannot read the correct strings from them.
I know the best approach is to avoid multibyte characters altogether...
Unfortunately, the system encoding of my PC is Shift-JIS, not UTF-8.
This is the default setting for Windows in my country.
Python, on the other hand, defaults to UTF-8.
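To make the mismatch concrete, here is a minimal sketch in plain Python (no DigitalMicrograph involved). The sample string is made up for illustration. It shows that the same text produces different bytes in the two encodings, and that decoding with the wrong codec does not give the text back.

```python
# Hypothetical sample text; any Japanese string would do.
text = "日本語"

# The same characters become different byte sequences.
sjis_bytes = text.encode("shift_jis")
utf8_bytes = text.encode("utf-8")
print(len(sjis_bytes), len(utf8_bytes))

# Decoding with the matching codec recovers the original text.
sjis_roundtrip = sjis_bytes.decode("shift_jis")
utf8_roundtrip = utf8_bytes.decode("utf-8")

# Decoding UTF-8 bytes as Shift-JIS gives garbled text instead.
# errors="replace" keeps this from raising on invalid byte sequences.
wrong = utf8_bytes.decode("shift_jis", errors="replace")
print(wrong == text)
```

A reader that assumes the system encoding will therefore only get correct strings from files that were actually saved in that encoding.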
Here are examples of reading multibyte characters from dm-script and Python. "shift-jis.txt" and "utf-8.txt" contain the same strings but in different encodings.
// Read every line of a text file into a tag list, using the given encoding value.
TagGroup txtTg(string filepath, number encoding)
{
    string text
    TagGroup TextInFileTg = NewTagList()
    number fileID = OpenFileForReading(filepath)
    while (ReadFileLine(fileID, encoding, text))
    {
        TextInFileTg.TagGroupInsertTagAsString(infinity(), text)
    }
    CloseFile(fileID)
    return TextInFileTg
}

string dirpath = "C:\\Users\\arksa\\Documents\\scripts"
TagGroup flistTg = GetFilesInDirectory(dirpath, 1)
string ext = "txt"
string fname, fpath, text
number fileID
TagGroup fTg = NewTagGroup()
number encoding
for (number i = 0; i < flistTg.TagGroupCountTags(); i++)
{
    flistTg.TagGroupGetTagAsString("[" + i + "]:Name", fname)
    fpath = PathConcatenate(dirpath, fname)
    if (fname.PathExtractExtension(0) == ext)
    {
        // The Shift-JIS file is read with encoding 0.
        if (fname == "shift-jis.txt")
        {
            encoding = 0
            fTg.TagGroupSetTagAsTagGroup("DM can read " + fname, txtTg(fpath, encoding))
        }
        // The UTF-8 file is tried with encoding values 0, 1 and 2.
        if (fname == "utf-8.txt")
        {
            for (number j = 0; j < 3; j++)
            {
                encoding = j
                fTg.TagGroupSetTagAsTagGroup("DM cannot read " + fname + ":" + "encoding=" + encoding, txtTg(fpath, encoding))
            }
        }
    }
}
// Show all results in a tag browser window.
fTg.TagGroupOpenBrowserWindow("text", 0)

import DigitalMicrograph as DM
import os

dirpath = r'C:\Users\arksa\Documents\scripts'
files = ['shift-jis.txt', 'utf-8.txt']
encodings = ['shift-jis', 'utf-8']
tag = DM.NewTagGroup()
for i in range(len(files)):
    path = os.path.join(dirpath, files[i])
    ftag = DM.NewTagList()
    # Each file is opened with its own encoding, so fline is decoded correctly here.
    with open(path, encoding=encodings[i]) as f:
        for fline in f:
            # Attempt to pass the line to the tag list as Shift-JIS bytes reinterpreted as UTF-8.
            ftag.InsertTagAsString(-1, str(fline).encode('shift-jis').decode('utf-8', 'backslashreplace'))
            #print(fline)
    tag.SetTagAsTagGroup("writing strings in " + files[i] + " to tag group from python fails.", ftag)
tag.OpenBrowserWindow(True)

DM-script can read the Shift-JIS text file correctly, but not the UTF-8 file. Python can read both the Shift-JIS and the UTF-8 file, but writing the strings to a TagGroup fails. If the encoding of the multibyte characters is identical to the Windows system encoding, only dm-script handles the strings correctly. I cannot find a function that writes a string to a TagGroup with an arbitrary encoding.
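Since dm-script reads the file that matches the system encoding, one possible workaround is to convert the UTF-8 file to that encoding with plain Python before reading it in dm-script. The following is only a sketch and has not been tried inside DigitalMicrograph. The function name `transcode_file` and the output file name are made up, and `cp932` (Microsoft's variant of Shift-JIS) is an assumption about what the Windows system encoding is on such a PC.

```python
import os
import tempfile


def transcode_file(src_path, dst_path, src_encoding="utf-8", dst_encoding="cp932"):
    """Rewrite a text file in another encoding (hypothetical helper).

    Characters that do not exist in the target encoding are written as '?'.
    """
    # newline="" keeps the original line endings untouched.
    with open(src_path, encoding=src_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding=dst_encoding, errors="replace", newline="") as dst:
        dst.write(text)
    return dst_path


# Small self-contained demonstration using a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "utf-8.txt")
    dst = os.path.join(tmp, "utf-8_as_sjis.txt")  # made-up output name
    with open(src, "w", encoding="utf-8", newline="") as f:
        f.write("日本語\n")
    transcode_file(src, dst)
    with open(dst, "rb") as f:
        converted = f.read()
```

The converted file could then be read in dm-script the same way as "shift-jis.txt". This does not solve the other half of the problem, writing strings to a TagGroup from Python.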
Not an answer, but I found something potentially useful:
// ENCODE parameter values
SYSTEM_MULTIBYTE = 0x00000000
GATAN = 0x00000001
UNICODE = 0x00000002
ROMAN = 0x01000000
JAPANESE = 0x01000001
CHINESE_TRAD = 0x01000002
KOREAN = 0x01000003
ARABIC = 0x01000004
HEBREW = 0x01000005
GREEK = 0x01000006
CYRILLIC = 0x01000007
DEVANAGARI = 0x01000009
GURMUKHI = 0x0100000A
GUJARATI = 0x0100000B
THAI = 0x01000015
CHINESE_SIMP = 0x01000019
EASTEUROPE = 0x0100001D
TURKISH = 0x01000023
BALTIC = 0x01000100
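For looking these values up from Python, they can be collected in an enum. This is a convenience sketch only: the class name `EncodeParam` is made up, the numbers are copied from the list above, and whether any function in the DigitalMicrograph Python module accepts them is not confirmed here.

```python
from enum import IntEnum


class EncodeParam(IntEnum):
    """Values copied from the list above; the class name is hypothetical."""
    SYSTEM_MULTIBYTE = 0x00000000
    GATAN = 0x00000001
    UNICODE = 0x00000002
    ROMAN = 0x01000000
    JAPANESE = 0x01000001
    CHINESE_TRAD = 0x01000002
    KOREAN = 0x01000003
    ARABIC = 0x01000004
    HEBREW = 0x01000005
    GREEK = 0x01000006
    CYRILLIC = 0x01000007
    DEVANAGARI = 0x01000009
    GURMUKHI = 0x0100000A
    GUJARATI = 0x0100000B
    THAI = 0x01000015
    CHINESE_SIMP = 0x01000019
    EASTEUROPE = 0x0100001D
    TURKISH = 0x01000023
    BALTIC = 0x01000100


# Look up a name from a number, and a number from a name.
name_for_zero = EncodeParam(0).name
japanese_value = int(EncodeParam.JAPANESE)
print(name_for_zero, hex(japanese_value))
```

The values 0, 1 and 2 match the three encoding values tried in the dm-script loop in the question. Trying `JAPANESE` or the other values in `ReadFileLine` might be worth a test, though that is a guess.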