How to count all the human readable files in Bash?

I'm taking an intro course to UNIX and have a homework question that follows:

How many files in the previous question are text files? A text file is any file containing human-readable content. (TRICK QUESTION. Run the file command on a file to see whether the file is a text file or a binary data file! If you simply count the number of files with the .txt extension you will get no points for this question.)

The previous question simply asked how many regular files there were, which was easy to figure out by doing find . -type f | wc -l.

I'm just having trouble determining what "human readable content" is, since I'm assuming it means anything besides binary/assembly, but I thought that's what -type f displays. Maybe that's what the professor meant by saying "trick question"?

This question has a follow-up later that also asks "What text files contain the string "csc" in any mix of upper and lower case?". Obviously "text" is referring to more than just .txt files, but I need to figure out the first question to determine this!

asked Sep 29 '12 by Rekson

1 Answer

Quotes added for clarity:

Run the "file" command on a file to see whether the file is a text file or a binary data file!

The file command will inspect files and tell you what kind of file they appear to be. The word "text" will (almost) always be in the description for text files.

For example:

desktop.ini:   Little-endian UTF-16 Unicode text, with CRLF, CR line terminators
tw2-wasteland.jpg: JPEG image data, JFIF standard 1.02

So the first part is asking you to run the file command and parse its output.
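For instance (a minimal sketch; the exact descriptions vary between systems and versions of file, so treat the output here as illustrative), you can hand file one or more names directly:

    $ file /etc/passwd /bin/ls
    /etc/passwd: ASCII text
    /bin/ls:     ELF 64-bit LSB executable, x86-64, dynamically linked, stripped

Descriptions that mention "text" are the ones the assignment is after.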

I'm just having trouble determining what "human readable content" is, since I'm assuming it means anything besides binary/assembly, but I thought that's what -type f displays.

find -type f finds files. It filters out other filesystem objects like directories, symlinks, and sockets. It will match any type of file, though: binary files, text files, anything.

Maybe that's what the professor meant by saying "trick question"?

It sounds like he's just saying don't do find -name '*.txt' or some such command to find text files. Don't assume a particular file extension. File extensions have much less meaning in UNIX than they do in Windows. Lots of files don't even have file extensions!


I'm thinking the professor wants us to run the file command on all of the files and count the ones whose description contains 'text'.

How about a multi-part answer? I'll give the straightforward solution in #1, which is probably what your professor is looking for. And if you are interested I'll explain its shortcomings and how you can improve upon it.

  1. One way is to use xargs, if you've learned about that. xargs runs another command, using the data from stdin as that command's arguments.

    $ find . -type f | xargs file
    ./netbeans-6.7.1.desktop: ASCII text
    ./VMWare.desktop:         a /usr/bin/env xdg-open script text executable
    ./VMWare:                 cannot open `./VMWare' (No such file or directory)
    (copy).desktop:           cannot open `(copy).desktop' (No such file or directory)
    ./Eclipse.desktop:        a /usr/bin/env xdg-open script text executable
    
  2. That works. Sort of. It'd be good enough for a homework assignment. But not good enough for a real-world script.

    Notice how it broke on the file VMWare (copy).desktop because it has a space in it. This is due to xargs's default behavior of splitting its input on whitespace. We can fix that by having find separate file names with NUL characters (-print0) and telling xargs to split on NULs instead of whitespace (-0). File names can't contain NUL characters, so this combination can handle any name safely.

    $ find . -type f -print0 | xargs -0 file
    ./netbeans-6.7.1.desktop: ASCII text
    ./VMWare.desktop:         a /usr/bin/env xdg-open script text executable
    ./VMWare (copy).desktop:  a /usr/bin/env xdg-open script text executable
    ./Eclipse.desktop:        a /usr/bin/env xdg-open script text executable
    
  3. This is good enough for a production script, and is something you'll encounter a lot. But I personally prefer an alternative syntax which doesn't require a pipe, and so is slightly more efficient.

    $ find . -type f -exec file {} \;
    ./netbeans-6.7.1.desktop: ASCII text
    ./VMWare.desktop:         a /usr/bin/env xdg-open script text executable
    ./VMWare (copy).desktop:  a /usr/bin/env xdg-open script text executable
    ./Eclipse.desktop:        a /usr/bin/env xdg-open script text executable
    

    To break that down: -exec runs file once for each file that find locates, substituting each file name for {}. The escaped semicolon \; marks the end of the command given to -exec; it is escaped so the shell passes it through to find rather than treating it as a command separator.
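Putting it together for the actual homework count, one possible sketch (this assumes that any description containing the word "text" counts as human-readable, which is the usual heuristic but can produce the occasional false positive, e.g. a file name that happens to contain "text"):

    $ find . -type f -exec file {} \; | grep -c -i text
    4

The later "csc" follow-up can build on the same idea, for example by running a case-insensitive grep -il csc over the files identified as text.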

answered by John Kugelman