I open Notepad (Windows) and write
Some lines with special characters
Special: Žđšćč
and go to Save As... "someFile.txt" with Encoding set to UTF-8.
In Java I have
FileInputStream fis = new FileInputStream(new File("someFile.txt"));
InputStreamReader isr = new InputStreamReader(fis, "UTF-8");
BufferedReader in = new BufferedReader(isr);
String line;
while((line = in.readLine()) != null) {
    printLine(line);
}
in.close();
But I get question marks and similar "special" characters. Why?
EDIT: I have this input (one line in .txt file)
665,Žđšćč
and this code
FileInputStream fis = new FileInputStream(new File(fileName));
InputStreamReader isr = new InputStreamReader(fis, "UTF-8");
BufferedReader in = new BufferedReader(isr);
String line;
while((line = in.readLine()) != null) {
    Toast.makeText(mContext, line, Toast.LENGTH_LONG).show();
    Pattern p = Pattern.compile(",");
    String[] article = p.split(line);
    Toast.makeText(mContext, article[0], Toast.LENGTH_LONG).show();
    Toast.makeText(mContext, Integer.parseInt(article[0]), Toast.LENGTH_LONG).show();
}
in.close();
The Toast output is fine (for those who aren't familiar with Android, Toast is just a way to show a pop-up on screen with particular text in it). The console shows "weird characters" (probably because of the encoding in the console window). But parsing the integer fails, according to the console output, even though the Toast output looks fine. Problem?
It seems the String contains some "weird" characters that Toast doesn't show/render, but when I try to parse it, it crashes. Suggestions?
If I choose ANSI in Notepad, integer parsing works and the weird characters from the console output are gone, but of course my special characters no longer work.
In Notepad++, go to the View menu > Show Symbol > Show All Characters. This displays all hidden characters in the open file.
txt) file is saved in an appropriate Unicode formatting. Saving a plain text document file as Unicode will allow you to use the text across multiple platforms and systems with minimal formatting changes.
"Unicode"-encoded Microsoft Windows text files contain text in UTF-16 (a Unicode Transformation Format). Such files normally begin with a byte order mark (BOM), which communicates the endianness of the file content.
ASCII files are plain text files. They can have extensions like .txt or no extension at all. Binary files are programs or other non-text files, saved either in the file format of the application that created them or in an archive or compressed file format.
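The byte order mark mentioned above can be checked by looking at the first few bytes of a file. The following is a minimal sketch, not a full encoding detector: the class name `BomSniffer` and the method `detect` are hypothetical, and it only recognizes the UTF-8 and UTF-16 marks (it ignores UTF-32, whose little-endian mark starts with the same two bytes as UTF-16LE).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BomSniffer {

    /**
     * Returns the charset implied by a leading BOM, or null if the
     * bytes do not start with a UTF-8 or UTF-16 BOM.
     * (Hypothetical helper for illustration only.)
     */
    public static Charset detect(byte[] head) {
        // UTF-8 BOM: EF BB BF
        if (head.length >= 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF) {
            return StandardCharsets.UTF_8;
        }
        // UTF-16 big-endian BOM: FE FF
        if (head.length >= 2
                && (head[0] & 0xFF) == 0xFE
                && (head[1] & 0xFF) == 0xFF) {
            return StandardCharsets.UTF_16BE;
        }
        // UTF-16 little-endian BOM: FF FE
        if (head.length >= 2
                && (head[0] & 0xFF) == 0xFF
                && (head[1] & 0xFF) == 0xFE) {
            return StandardCharsets.UTF_16LE;
        }
        return null;
    }

    public static void main(String[] args) {
        // First bytes of a UTF-8 file that starts with a BOM followed by "665".
        byte[] head = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 0x36, 0x36, 0x35};
        System.out.println(detect(head));
    }
}
```

In practice the `head` array would come from reading the first three bytes of the file through a `FileInputStream` before wrapping it in a reader.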
Notepad does not save special symbols correctly. I had a similar problem, so I used Notepad++ instead and selected UTF-8 encoding there. After that, my program no longer crashed when applying String methods to the text, unlike when I had created the text file in Notepad.
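A likely explanation, assuming the file was saved by a Notepad version that writes a byte order mark for UTF-8: the file starts with the bytes EF BB BF, which Java's UTF-8 decoder passes through as the invisible character U+FEFF. The first field then becomes "\uFEFF665", which renders like "665" but makes `Integer.parseInt` throw a `NumberFormatException`. The sketch below strips that character before parsing. The class name `BomStrip` and the helper `stripBom` are hypothetical, and a `StringReader` stands in for the real file so the example is self-contained.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class BomStrip {

    // U+FEFF is what the UTF-8 byte sequence EF BB BF decodes to.
    private static final char BOM = '\uFEFF';

    /** Returns the line without a leading BOM character, if one is present. */
    public static String stripBom(String line) {
        if (line != null && !line.isEmpty() && line.charAt(0) == BOM) {
            return line.substring(1);
        }
        return line;
    }

    public static void main(String[] args) throws IOException {
        // Simulates the decoded content of the one-line file from the question:
        // a BOM, then "665,", then the special characters (written as escapes
        // so this source file stays ASCII-only).
        String content = "\uFEFF665,\u017D\u0111\u0161\u0107\u010D\n";
        BufferedReader in = new BufferedReader(new StringReader(content));

        String line;
        boolean firstLine = true;
        while ((line = in.readLine()) != null) {
            if (firstLine) {
                // Only the very first line of a file can carry the BOM.
                line = stripBom(line);
                firstLine = false;
            }
            String[] article = line.split(",");
            int id = Integer.parseInt(article[0]); // no longer throws
            System.out.println(id);
        }
        in.close();
    }
}
```

With a real file, the same `stripBom` call would be applied to the first line returned by the `BufferedReader` built on `new InputStreamReader(fis, "UTF-8")`. Saving the file as UTF-8 without a BOM in an editor that offers that option avoids the problem at the source.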