I have a directory structure full of MS word files and I have to search the directory for particular string. Until now I was using the following command to search files for in a directory
find . -exec grep -li 'search_string' {} \;
find . -name '*' -print | xargs grep 'search_string'
But, this search doesn't work for MS word files.
Is it possible to do string search in MS word files in Linux?
Grep is a Linux / Unix command-line tool used to search for a string of characters in a specified file. The text search pattern is called a regular expression. When it finds a match, it prints the line with the result. The grep command is handy when searching through large log files.
A simple way to work this out is by using grep pattern searching tool, is a powerful, efficient, reliable and most popular command-line utility for finding patterns and words from files or directories on Unix-like systems.
To open the Find pane from the Edit View, press Ctrl+F, or click Home > Find. Find text by typing it in the Search the document for… box. Word Web App starts searching as soon as you start typing.
I'm a translator and know next to nothing about scripting but I was so pissed off about grep not being able to scan inside Word .doc files that I worked out how to make this little shell script to use catdoc and grep to search a directory of .doc files for a given input string.
You need to install catdoc
and docx2txt
packages
#!/bin/bash
echo -e "\n
Welcome to scandocs. This will search .doc AND .docx files in this directory for a given string. \n
Type in the text string you want to find... \n"
read response
find . -name "*.doc" |
while read i; do catdoc "$i" |
grep --color=auto -iH --label="$i" "$response"; done
find . -name "*.docx" |
while read i; do docx2txt < "$i" |
grep --color=auto -iH --label="$i" "$response"; done
All improvements and suggestions welcome!
Here's a way to use "unzip" to print the entire contents to standard output, then pipe to "grep -q" to detect whether the desired string is present in the output. It works for docx format files.
#!/bin/bash
PROG=`basename $0`
if [ $# -eq 0 ]
then
echo "Usage: $PROG string file.docx [file.docx...]"
exit 1
fi
findme="$1"
shift
for file in $@
do
unzip -p "$file" | grep -q "$findme"
[ $? -eq 0 ] && echo "$file"
done
Save the script as "inword" and search for "wombat" in three files with:
$ ./inword wombat file1.docx file2.docx file3.docx
file2.docx
Now you know file2.docx contains "wombat". You can get fancier by adding support for other grep options. Have fun.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With