I'm running some quick tests for a naive boolean information retrieval system, and I would like to use awk, grep, egrep, sed, or something similar with pipes to split a text file into words and save them to another file, one word per line. For example, my file contains:
Hola mundo, hablo español y no sé si escribí bien la pregunta, ojalá me puedan entender y ayudar Adiós.
The output file should contain:
Hola
mundo
hablo
español
...
Thanks!
To control the size of the pieces that split produces, use the -l (a lowercase L) option followed by the number of lines you'd like in each of the smaller files (the default is 1,000), or the -b option followed by the number of bytes you'd like in each of the smaller files.
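For example, assuming a placeholder input file named bigfile.txt:

split -l 500 bigfile.txt   # pieces of 500 lines each
split -b 10k bigfile.txt   # pieces of 10 kilobytes each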
To split a file into pieces, you simply use the split command. By default, the split command uses a very simple naming scheme. The file chunks will be named xaa, xab, xac, etc., and, presumably, if you break up a file that is sufficiently large, you might even get chunks named xza and xzz.
On Linux, the split command is what breaks large files into small pieces. By default it generates fixed-size output files of 1,000 lines each, named with the prefix 'x'.
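Run with no options (a minimal sketch; bigfile.txt is again a placeholder name):

split bigfile.txt   # produces xaa, xab, xac, ... with 1,000 lines each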
Using tr:
tr -s '[:punct:][:space:]' '\n' < file
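Applied to the example file above (assuming GNU tr; the -s flag squeezes runs of delimiters into a single newline, so no blank lines appear), the output begins:

Hola
mundo
hablo
español
...

Since the question also mentions grep, an alternative one-liner that gives the same one-word-per-line output (assuming GNU grep in a UTF-8 locale, so [[:alpha:]] matches accented letters) is:

grep -oE '[[:alpha:]]+' file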