I'm running some quick tests for a naive boolean information retrieval system, and I would like to use awk, grep, egrep, sed, or something similar with pipes to split a text file into words and save them to another file, one word per line. For example, my file contains:
Hola mundo, hablo español y no sé si escribí bien la pregunta, ojalá me puedan entender y ayudar Adiós.
The output file should contain:
Hola
mundo
hablo
español
...
Thanks!
To control the size of the pieces that split produces, use the -l (a lowercase L) option followed by the number of lines you'd like in each of the smaller files (the default is 1,000), or the -b option followed by the number of bytes you'd like in each of the smaller files.
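For example, assuming a placeholder input file named bigfile.txt:

split -l 500 bigfile.txt   # pieces of 500 lines each
split -b 10k bigfile.txt   # pieces of 10 kilobytes each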
To split a file into pieces, you simply use the split command. By default, the split command uses a very simple naming scheme. The file chunks will be named xaa, xab, xac, etc., and, presumably, if you break up a file that is sufficiently large, you might even get chunks named xza and xzz.
On Linux, the split command is what breaks large files into small pieces. By default it generates fixed-size output files of 1,000 lines each, named with the prefix 'x'.
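Run with no options (a minimal sketch; bigfile.txt is again a placeholder name):

split bigfile.txt   # produces xaa, xab, xac, ... with 1,000 lines each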
Using tr:
tr -s '[:punct:][:space:]' '\n' < file
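Applied to the example file above (assuming GNU tr; the -s flag squeezes runs of delimiters into a single newline, so no blank lines appear), the output begins:

Hola
mundo
hablo
español
...

Since the question also mentions grep, an alternative one-liner that gives the same one-word-per-line output (assuming GNU grep in a UTF-8 locale, so [[:alpha:]] matches accented letters) is:

grep -oE '[[:alpha:]]+' file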