Reading One Line from a File in UTF-16 Format

Tags: java, utf-8

I have some files, generated by a script, that provide information about various computers. The .txt files are in UTF-8; however, there is one line that is in UTF-16. How should I go about reading this line from the file?

P.S. I'm trying to write a program to parse out all of these files and recompile them into one collective .csv file.

I have tried reading the file with a BufferedReader and a Scanner; however, this one line is the only one I am having trouble with. Most of the code I have found online for reading UTF-16 assumes the entire file is UTF-16, which is not the case here.

//How the line looks when opened in Notepad.

S e r i a l N u m b e r     5 C G 8 X X X X X X

//How the line looks when opened in Notepad++ with "nul" values in between each character.

S e r i a l N u m b e r     

 5 C G 8 X X X X X X

My code can pick up parts of the string, but the string is split across multiple lines, and Java doesn't recognize the characters in between each letter or number.

Ben Combs asked Jun 07 '19 15:06


People also ask

What is the difference between UTF-8 and UTF-16?

These methods differ in the number of bytes they need to store a character. UTF-8 encodes a character into a binary string of one, two, three, or four bytes. UTF-16 encodes a Unicode character into a string of either two or four bytes.
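To make the byte counts concrete, here is a minimal Java sketch that measures how many bytes a few sample characters take in each encoding. The class name `EncodingSizes` and its helper methods are hypothetical, chosen only for this illustration; UTF-16 is measured as little-endian without a byte-order mark.

```java
import java.nio.charset.StandardCharsets;

public class EncodingSizes {
    // Number of bytes the text occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes when encoded as UTF-16 little-endian (no byte-order mark).
    static int utf16Length(String s) {
        return s.getBytes(StandardCharsets.UTF_16LE).length;
    }

    public static void main(String[] args) {
        // U+0041 (A), U+00E9, U+20AC, and U+1F600 (written as a surrogate pair).
        String[] samples = {"A", "\u00E9", "\u20AC", "\uD83D\uDE00"};
        for (String s : samples) {
            System.out.println("U+" + Integer.toHexString(s.codePointAt(0)).toUpperCase()
                    + " utf8=" + utf8Length(s) + " utf16=" + utf16Length(s));
        }
    }
}
```

For plain ASCII text such as a serial number, UTF-16 takes two bytes per character where UTF-8 takes one, which is consistent with the NUL bytes visible between the letters in the question.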

Should I use UTF-8 or UTF-16?

UTF-16 is, obviously, more efficient for A) characters for which UTF-16 requires fewer bytes to encode than does UTF-8. UTF-8 is, obviously, more efficient for B) characters for which UTF-8 requires fewer bytes to encode than does UTF-16.

Is UTF-16 fixed?

UTF-16 isn't really fixed width; some Unicode code points are one 16-bit code unit, others require two 16-bit code units. Likewise, UTF-8 isn't fixed width; some Unicode code points require one 8-bit code unit, others require two, three or even four 8-bit code units (but not five or six, despite the comment from ...
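Because Java strings are themselves sequences of UTF-16 code units, the variable width can be observed directly. The sketch below (the class name `CodeUnits` and its methods are hypothetical) compares the number of 16-bit code units with the number of Unicode code points.

```java
public class CodeUnits {
    // String.length() counts UTF-16 code units, not characters.
    static int utf16CodeUnits(String s) {
        return s.length();
    }

    // codePointCount counts Unicode code points; a surrogate pair counts once.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String ascii = "A";             // U+0041: one code unit
        String emoji = "\uD83D\uDE00";  // U+1F600: two code units (a surrogate pair)
        System.out.println(utf16CodeUnits(ascii) + " unit(s), " + codePoints(ascii) + " code point(s)");
        System.out.println(utf16CodeUnits(emoji) + " unit(s), " + codePoints(emoji) + " code point(s)");
    }
}
```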


1 Answer

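You can try something like this, which decodes the entire stream as UTF-16.

// Requires java.io.* and java.nio.charset.StandardCharsets.
File infile = new File("/someFileInutf16.txt");
// try-with-resources closes the reader and the underlying stream.
try (Reader in = new InputStreamReader(new FileInputStream(infile), StandardCharsets.UTF_16)) {
    // read characters from `in` here
}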
Sambit answered Oct 13 '22 00:10
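Since only one line of the file is UTF-16, decoding the whole file as UTF-16 would garble the UTF-8 lines. One possible workaround, sketched below, is to read the raw bytes, drop every NUL (0x00) byte, and decode what remains as UTF-8. This is a sketch under a loud assumption: the UTF-16 line contains only ASCII characters (as the serial-number line in the question appears to), so each of its characters is one ASCII byte plus one NUL byte. It would corrupt any non-ASCII UTF-16 text. The class name `MixedEncodingReader` and the method `decodeDroppingNuls` are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class MixedEncodingReader {
    // Removes every 0x00 byte, then decodes the remaining bytes as UTF-8.
    // ASSUMPTION: the UTF-16 portion is ASCII-only, so its high bytes are all 0x00.
    static String decodeDroppingNuls(byte[] raw) {
        ByteArrayOutputStream cleaned = new ByteArrayOutputStream(raw.length);
        for (byte b : raw) {
            if (b != 0) {
                cleaned.write(b);
            }
        }
        return new String(cleaned.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java MixedEncodingReader <file>");
            return;
        }
        // Read the file as bytes so that no decoder touches it before the NULs are gone.
        byte[] raw = Files.readAllBytes(Paths.get(args[0]));
        System.out.print(decodeDroppingNuls(raw));
    }
}
```

The cleaned text can then be split into lines and parsed like the rest of the file. If the UTF-16 line could contain non-ASCII characters, a more careful approach would be needed, such as locating the UTF-16 byte range and decoding just that range with `StandardCharsets.UTF_16LE`.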