I have a text file that I am pretty sure is encoded in UTF-16, but I don't know how to load it in Julia. Do I have to load it as bytes and then convert with UTF16String?
The simplest way is to read it as bytes and then convert:
s = open(filename, "r") do f
    utf16(readbytes(f))
end
Note that utf16 also checks for a byte-order mark (BOM), so it will deal with endianness issues and won't include the BOM in the resulting s.
If you really want to avoid making a copy of the data, and you know it is native-endian, this is possible too, but you have to explicitly write a NUL terminator (since Julia UTF-16 string data internally has a NUL codepoint at the end for passing to C routines that expect NUL-terminated data):
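s = open(filename, "r") do f
    b = readbytes(f)
    resize!(b, length(b)+2)              # make room for one extra 16-bit code unit
    b[end] = b[end-1] = 0                # write the NUL terminator
    UTF16String(reinterpret(UInt16, b))  # wrap the bytes without copying
end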
However, typical UTF-16 text files will start with a BOM, and in this case the string s will include the BOM as its first character, which may not be what you want.
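For readers on Julia 1.x, where, as far as I know, utf16, readbytes and UTF16String are no longer in Base, the BOM check can be done by hand. The following is only a sketch under assumptions, not how utf16 is implemented: decode_utf16 is a hypothetical helper name, and data without a BOM is assumed to be little-endian. It relies on Base's read and transcode.

```julia
# Hypothetical helper (not part of Base): decode UTF-16 bytes into a String,
# honoring and dropping a leading byte-order mark if one is present.
function decode_utf16(bytes::Vector{UInt8})
    iseven(length(bytes)) ||
        throw(ArgumentError("UTF-16 data must have an even number of bytes"))
    # FE FF marks big-endian data, FF FE marks little-endian data.
    bigendian = length(bytes) >= 2 && bytes[1] == 0xFE && bytes[2] == 0xFF
    littlebom = length(bytes) >= 2 && bytes[1] == 0xFF && bytes[2] == 0xFE
    # Skip the two BOM bytes when present.
    start = (bigendian || littlebom) ? 3 : 1
    units = Vector{UInt16}(undef, div(length(bytes) - start + 1, 2))
    for i in eachindex(units)
        a = UInt16(bytes[start + 2i - 2])
        b = UInt16(bytes[start + 2i - 1])
        # Assumption: data without a BOM is treated as little-endian.
        units[i] = bigendian ? ((a << 8) | b) : ((b << 8) | a)
    end
    return transcode(String, units)
end

# Usage with a file would be: s = decode_utf16(read(filename))
s = decode_utf16(UInt8[0xFF, 0xFE, 0x68, 0x00, 0x69, 0x00])  # s == "hi"
```

Unlike the zero-copy approach, this copies the data, but the result is an ordinary String with no BOM as its first character.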