I am trying to run the following to extract the text from all the PDFs:
find *.pdf | awk '{system("pdftotext "$0)}'
But dang it, some crazy person put spaces in the file names. How can I deal with this smoothly?
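What is awk's role in this? Perhaps you should let find execute things itself. With -exec, find hands each path to pdftotext as a single argument, so no shell ever gets a chance to split it on spaces: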
find . -name \*.pdf -exec /path/to/pdftotext {} \;
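If the PDFs all sit in the current directory (the only place find *.pdf was looking, since the shell expands the glob before find even runs), a plain shell loop is another option. Each quoted "$f" reaches pdftotext as one argument, spaces and all. A minimal sketch in POSIX sh; the function name extract_pdfs_here is made up for illustration:

```shell
# Hypothetical helper: run pdftotext on every *.pdf in the current directory.
extract_pdfs_here() {
    # The ./ prefix keeps a name starting with "-" from looking like an option.
    for f in ./*.pdf; do
        # If nothing matches, the glob stays literal ("./*.pdf"); skip it.
        [ -e "$f" ] || continue
        # Quoting "$f" passes a name containing spaces as a single argument.
        pdftotext "$f"
    done
}
```

Run it from the folder that holds the PDFs, e.g. cd /path/to/pdfs && extract_pdfs_here. pdftotext writes each name.txt next to its name.pdf. Unlike find . -name \*.pdf, this loop does not descend into subdirectories.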
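Or, if you're really, really stuck with assuming that filenames are safe to pass along as find's standard output (which you've proven they are not, simply by asking this question), then put the filenames in quotes. This handles spaces, though it will still break on names that contain a double quote, a backtick, a dollar sign, or a newline: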
find . -name \*.pdf -print | awk '{cmd=sprintf("pdftotext \"%s\"", $0);system(cmd);}'
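Because awk's system() hands the whole command string to /bin/sh for re-parsing, some sufficiently odd filename can always defeat a quoting scheme like this. A sturdier sketch, assuming your find and xargs support -print0 and -0 (GNU and BSD versions do), never lets a shell re-read the names at all. The function name extract_pdfs_recursive is hypothetical:

```shell
# Hypothetical helper: convert every PDF under a directory (default: .).
extract_pdfs_recursive() {
    # -print0 ends each path with a NUL byte, which cannot occur in a filename,
    # and xargs -0 splits only on NUL, so spaces, quotes, $ and newlines survive.
    # -n 1 runs pdftotext once per file, because a second argument would be
    # taken as the output filename. -r stops GNU xargs from running pdftotext
    # once with no arguments when nothing matches. -type f skips directories.
    find "${1:-.}" -type f -name '*.pdf' -print0 | xargs -0 -r -n 1 pdftotext
}
```

For example, extract_pdfs_recursive /path/to/pdfs converts everything below that folder, including subdirectories, just like the -exec version.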