I'm trying to create a dictionary of words from a collection of files. Is there a simple way to print all the words in a file, one per line?

extract words from a file

2 Answers

You could use grep:

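-E '\w+' searches for words (-E selects extended regular expressions, and \w+ matches a run of letters, digits, and underscores)
-o prints only the parts of each line that match, each on its own line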

% cat temp
Some examples use "The quick brown fox jumped over the lazy dog,"
rather than "Lorem ipsum dolor sit amet, consectetur adipiscing elit"
for example text.
# if you don't care whether words repeat
% grep -o -E '\w+' temp
Some
examples
use
The
quick
brown
fox
jumped
over
the
lazy
dog
rather
than
Lorem
ipsum
dolor
sit
amet
consectetur
adipiscing
elit
for
example
text
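\w is not part of POSIX extended regular expressions; GNU grep documents it as a synonym for [_[:alnum:]]. On a grep that lacks \w, a minimal sketch is to spell out the bracket expression instead. It assumes the same sample text, which is recreated in temp so the snippet runs on its own:

```shell
# Recreate the sample text from this answer in a file named temp
printf '%s\n' \
  'Some examples use "The quick brown fox jumped over the lazy dog,"' \
  'rather than "Lorem ipsum dolor sit amet, consectetur adipiscing elit"' \
  'for example text.' > temp

# [_[:alnum:]] matches letters, digits, and underscore, the same set as GNU's \w
grep -o -E '[_[:alnum:]]+' temp
```

This should print the same 25 words, in the same order, as the \w version.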

If you want to print each word only once, disregarding case, you can pipe the output through sort:

-u prints each word only once
-f tells sort to ignore case when comparing words, so The and the count as the same word

# if you only want each word once
% grep -o -E '\w+' temp | sort -u -f
adipiscing
amet
brown
consectetur
dog
dolor
elit
example
examples
for
fox
ipsum
jumped
lazy
Lorem
over
quick
rather
sit
Some
text
than
The
use
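Since the question is about building a dictionary from a collection of files, the same pipeline can read several files at once. When grep searches more than one file it prefixes each match with the file name, and -h suppresses that prefix. The sketch below also lowercases every word with tr, so each entry has one consistent spelling instead of whichever of The/the sort -f happens to keep. The file names a.txt, b.txt, and dictionary.txt are hypothetical:

```shell
# Two hypothetical input files standing in for the collection of files
printf 'The quick brown fox\n' > a.txt
printf 'jumped over the lazy dog.\n' > b.txt

# -h: omit file-name prefixes when grep searches several files
# tr: lowercase every word so "The" and "the" become one entry
# sort -u: keep each distinct word once
grep -o -h -E '\w+' a.txt b.txt | tr '[:upper:]' '[:lower:]' | sort -u > dictionary.txt
cat dictionary.txt
```

In practice you would pass a glob such as *.txt instead of listing the files by name.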

answered Oct 17 '22 04:10

rampion

A good start is to use sed to replace all spaces with newlines, strip out the empty lines that runs of consecutive spaces leave behind (again with sed), then sort with the -u (uniquify) flag to remove duplicates, as in this example:

$ echo "the quick brown dog and fox jumped
over the lazy   dog" | sed 's/ /\n/g' | sed '/^$/d' | sort -u

and
brown
dog
fox
jumped
lazy
over
quick
the

Then you can start worrying about punctuation and the like.
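The sed command above relies on GNU sed turning \n in the replacement into a newline; BSD and macOS sed do not. One way to handle both portability and punctuation is tr: -c complements a character set and -s squeezes repeats, so every run of non-letters (spaces, punctuation, newlines) becomes a single newline. A minimal sketch, assuming words are runs of letters only (digits and apostrophes act as separators, so "don't" would split into "don" and "t"):

```shell
# -c: replace every character that is NOT a letter
# -s: squeeze each run of replacements into a single newline
# grep -v '^$' drops the empty line produced if the input starts with a non-letter
printf '%s\n' 'the quick brown dog, and fox jumped' 'over the "lazy"   dog.' |
  tr -cs '[:alpha:]' '\n' | grep -v '^$' | sort -u
```

Despite the comma, quotes, full stop, and extra spaces, this should print the same nine words as the sed version.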

answered Oct 17 '22 05:10

paxdiablo

Tags: shell, unix, scripting

Andrew Prock
