Check line for unprintable characters while reading text file

Tags: java, file, file-io

My program must read text files line by line. The files are in UTF-8. I am not sure the files are correct: they may contain unprintable characters. Is it possible to check for this without going down to the byte level? Thanks.

user710818 asked Sep 14 '11 09:09



2 Answers

Open the file with a FileInputStream, then use an InputStreamReader with the UTF-8 Charset to read characters from the stream, and use a BufferedReader to read lines, e.g. via BufferedReader#readLine, which will give you a string. Once you have the string, you can check for characters that aren't what you consider to be printable.

E.g. (without error checking), using try-with-resources (available in Java 7 and later):

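    // Needs imports from java.io (BufferedReader, FileInputStream, InputStream,
    // InputStreamReader) and java.nio.charset.Charset.
    String line;
    try (
        InputStream fis = new FileInputStream("the_file_name");
        InputStreamReader isr = new InputStreamReader(fis, Charset.forName("UTF-8"));
        BufferedReader br = new BufferedReader(isr);
    ) {
        // readLine() returns null at end of stream
        while ((line = br.readLine()) != null) {
            // Deal with the line
        }
    }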
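
The "deal with the line" step is where the actual check would go. As a rough sketch only, and assuming that "unprintable" means control, format, surrogate, private-use and unassigned code points (with tab allowed), the check could look like the following. The class and method names (`PrintableCheck`, `firstUnprintable`, `isPrintable`) are made up for illustration. The U+FFFD test relies on the fact that a reader typically substitutes that character for malformed byte sequences, so seeing it usually means the bytes were not valid UTF-8.

```java
final class PrintableCheck {

    /**
     * Returns the index (in chars) of the first code point that is not
     * considered printable, or -1 if the whole line looks fine.
     */
    static int firstUnprintable(String line) {
        for (int i = 0; i < line.length(); ) {
            int cp = line.codePointAt(i);
            if (!isPrintable(cp)) {
                return i;
            }
            // Step over one or two chars, depending on the code point
            i += Character.charCount(cp);
        }
        return -1;
    }

    /** Assumed definition of "printable"; adjust it to your own needs. */
    static boolean isPrintable(int cp) {
        if (cp == '\t') {
            return true;  // assumption: tabs are acceptable
        }
        if (cp == 0xFFFD) {
            return false; // replacement character, likely from malformed input
        }
        switch (Character.getType(cp)) {
            case Character.CONTROL:
            case Character.FORMAT:
            case Character.SURROGATE:
            case Character.PRIVATE_USE:
            case Character.UNASSIGNED:
                return false;
            default:
                return true;
        }
    }
}
```

Inside the loop you would call `PrintableCheck.firstUnprintable(line)` and treat any result other than -1 as a problem. Note that under this definition a leading byte order mark (U+FEFF) is flagged too, because it is a format character.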
T.J. Crowder answered Oct 14 '22 22:10


While it's not hard to do this manually using BufferedReader and InputStreamReader, I'd use Guava:

List<String> lines = Files.readLines(file, Charsets.UTF_8); 

You can then do whatever you like with those lines.

EDIT: Note that this will read the whole file into memory in one go. In most cases that's actually fine - and it's certainly simpler than reading it line by line, processing each line as you read it. If it's an enormous file, you may need to do it that way as per T.J. Crowder's answer.
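
Once the lines are in a list (for instance the one returned by Guava's `com.google.common.io.Files.readLines`), one possible way to look for problems is a regular expression. This is only a sketch: `LineScanner` and `suspectLineNumbers` are hypothetical names, and the pattern assumes that control characters other than tab, plus the replacement character U+FFFD, are what you want to flag.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class LineScanner {

    // Assumption: any control character except tab, or U+FFFD, is suspect.
    private static final Pattern SUSPECT =
            Pattern.compile("[\\p{Cc}&&[^\\t]]|\\uFFFD");

    /** Returns the 1-based numbers of the lines that contain a suspect character. */
    static List<Integer> suspectLineNumbers(List<String> lines) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            if (SUSPECT.matcher(lines.get(i)).find()) {
                result.add(i + 1);
            }
        }
        return result;
    }
}
```

Calling `LineScanner.suspectLineNumbers(lines)` gives you the line numbers to report or skip; widen or narrow the pattern if your idea of "printable" differs.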

Jon Skeet answered Oct 14 '22 21:10