
How do I split a file into words on the Unix command line?

I'm running some quick tests for a naive boolean information retrieval system, and I would like to use awk, grep, egrep, sed or something similar, combined with pipes, to split a text file into words and save them to another file with one word per line. For example, my file contains:

Hola mundo, hablo español y no sé si escribí bien la pregunta, ojalá me puedan entender y ayudar Adiós. 

The output file should contain:

Hola
mundo
hablo
español
...

Thanks!

jaundavid asked Mar 19 '13 14:03




1 Answer

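Using tr, translate every punctuation and whitespace character to a newline, and squeeze runs of newlines into one with -s: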

tr -s '[[:punct:][:space:]]' '\n' < file 
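A minimal sketch of how the same idea could be wrapped for reuse. The function name split_words is hypothetical and not part of the answer above; the sketch only assumes a POSIX-style tr. Note that because apostrophes and hyphens count as punctuation, a token such as "well-known" would come out as two words.

```shell
# split_words: hypothetical helper name (an assumption, not from the answer).
# Reads text on stdin and writes one word per line on stdout.
split_words() {
    # Turn every punctuation or whitespace character into a newline;
    # -s squeezes consecutive newlines so no empty lines appear between words.
    tr -s '[:punct:][:space:]' '\n'
}

# Demo on a short sample sentence, printing one word per line.
printf 'Hola mundo, hablo bien.\n' | split_words
```

To save the result to another file, as the question asks, redirect both ends, for example `split_words < file > words.txt`, where words.txt is just a placeholder name.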
Guru answered Sep 19 '22 15:09