I want to strip the HTML out of a few hundred files.
Here's the command I've started with:
find -name *.html -exec w3m {} > w3m {}.html.out \;
The problem I've run into is that it created one single large .html.out file (named {}.html.out). I want each output file to be named after its original, with .out appended.
For instance, I have
2002/filename.html
I want to run it through w3m, and get 2002/filename.html.out
Any suggestions? I'm open to other solutions that don't use bash.
I'm using cygwin.
Using Bash, there's also ${file%.*} to get the filename without the extension and ${file##*.} to get the extension alone. That is, file="thisfile.
You need to utilize the “-L” option and the path and “-name” option in your command. The “*” in the name specification is used for searching “all” the bash files with “.sh” extensions. It returns a total of 4 records on our screen.
In Linux, the basename command prints the last element of a file path. This is especially useful in bash scripts where the file name needs to be extracted from a long file line. The “basename” takes a filename and prints the filename's last portion. It can also delete any following suffix if needed.
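As a quick illustration of the parameter expansions and the basename command described above, here is a small sketch. The sample path is only an assumption borrowed from the question's 2002/filename.html layout, and the variable names are made up for the example.

```shell
# Hypothetical sample path, matching the layout used in the question.
file="2002/filename.html"

stem="${file%.*}"                 # remove the shortest trailing ".*"  -> 2002/filename
ext="${file##*.}"                 # remove the longest leading "*."    -> html
name="$(basename "$file")"        # last path component                -> filename.html
bare="$(basename "$file" .html)"  # last component without the suffix  -> filename

printf '%s\n' "$stem" "$ext" "$name" "$bare"
```

Note that ${file%.*} keeps the directory part, while basename drops it, so which one fits depends on whether the output should sit next to the original file.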
You can use the find command to search for a file or directory on your file system. By using the -exec flag ( find -exec ), matches, which can be files, directories, symbolic links, system devices, etc., can be found and immediately processed within the same command.
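The redirection happens outside of find: the calling shell sets up > once, before find ever runs, so every w3m invocation writes to the same file. Invoke a subshell so that the redirection is evaluated once per file.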
find . -name '*.html' -exec bash -c 'w3m "$1" > "$1.out"' w3mout {} \;
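If this needs to be rerun or adapted, the same subshell idea can be wrapped in a small function. This is only a sketch under a few assumptions: the function name html_to_out and the HTML_TO_TEXT variable are invented for this example, the default converter is taken to be w3m -dump (which prints the rendered page to stdout), and the find in use supports -exec ... {} + (GNU find, as shipped with Cygwin, does).

```shell
# Convert every *.html file under a directory, writing each result next to
# its source as <name>.html.out.
# html_to_out and HTML_TO_TEXT are hypothetical names used only in this sketch.
html_to_out() {
    root=${1:-.}
    # The pattern is quoted so the calling shell does not expand it before
    # find sees it. The redirection lives inside the inner bash script, so
    # it is evaluated once per file instead of once by the calling shell.
    find "$root" -type f -name '*.html' -exec bash -c '
        for f in "$@"; do
            # Left unquoted on purpose so "w3m -dump" splits into command + flag.
            ${HTML_TO_TEXT:-w3m -dump} "$f" > "$f.out"
        done
    ' _ {} +
}
```

Calling html_to_out . from the top of the tree should turn 2002/filename.html into 2002/filename.html.out. Because {} + passes files in batches, bash is started far fewer times than with \;, which may matter on Cygwin where process creation is slow.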