Is there any way to print the first N words of a file? I've tried cut but it reads a document line-by-line. The only solution I came up with is:
sed ':a;N;$!ba;s/\n/δ/g' file | cut -d " " -f -20 | sed 's/δ/\n/g'
Essentially, this replaces newlines with a character that does not exist in the file, applies "cut" with a space as the delimiter, and then restores the newlines.
Is there any better solution?
The head command is used to display the first lines of a file.
Use the head command to write to standard output the first few lines of each of the specified files or of the standard input. If no flag is specified with the head command, the first 10 lines are displayed by default.
To look at the first few lines of a file, type head filename, where filename is the name of the file you want to look at, and then press <Enter>. By default, head shows you the first 10 lines of a file. You can change this by typing head -number filename, where number is the number of lines you want to see.
You could use awk to print the first n words:
$ awk 'NR<=8{print;next}{exit}' RS='[[:blank:]]+|\n' file
This would print the first 8 words, each on a separate line. Are you looking to keep the original format of the file?
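A regular expression in RS is an extension (gawk and mawk accept it); POSIX leaves a multi-character RS unspecified. If that matters, here is a minimal sketch that loops over the fields instead and should work in any POSIX awk. The function name first_words_awk is made up for illustration.

```shell
# first_words_awk N FILE  (hypothetical helper name)
# Print the first N whitespace-separated words of FILE, one per line.
first_words_awk() {
    awk -v n="$1" '
        BEGIN { if (n <= 0) exit }          # nothing to print for N <= 0
        {
            # Default field splitting already treats runs of blanks/tabs as one separator.
            for (i = 1; i <= NF; i++) {
                print $i
                if (++c >= n) exit          # stop as soon as N words are out
            }
        }
    ' "$2"
}
```

Usage would be something like: first_words_awk 8 file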
Edit:
The following will preserve the original format of the file:
awk -v n=8 'n==c{exit}n-c>=NF{print;c+=NF;next}{for(i=1;i<=n-c;i++)printf "%s ",$i;print x;exit}' file
Demo:
$ cat file
one two
thre four five six
seven 8 9
10
$ awk -v n=8 'n==c{exit}n-c>=NF{print;c+=NF;next}{for(i=1;i<=n-c;i++)printf "%s ",$i;print x;exit}' file
one two
thre four five six
seven 8
A small caveat: if the last line printed uses anything other than a single space as its separator, that line will lose its formatting, because its words are re-joined with single spaces.
$ cat file
one two
thre four five six
seven 8 9
10
# the 8th word fell on 3rd line: this line will be formatted with single spaces
$ awk -v n=8 'n==c{exit}n-c>=NF{print;c+=NF;next}{for(i=1;i<=n-c;i++)printf "%s ",$i;print x;exit}' file
one two
thre four five six
seven 8
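If the spacing on that last line matters, one possible workaround is to cut the line right after its last wanted word instead of re-joining the fields. This is only a sketch under the assumption that words are separated by spaces and tabs; the function name first_words_keep is made up for illustration, and it is a different technique from the one-liner above.

```shell
# first_words_keep N FILE  (hypothetical helper name)
# Print the first N words of FILE, keeping the original spacing on every line.
first_words_keep() {
    awk -v n="$1" '
        n <= 0 { exit }
        # Whole line fits with room to spare: print it untouched.
        c + NF < n { print; c += NF; next }
        {
            # This line contains the Nth word: copy it up to the end of that word.
            k = n - c; rest = $0; out = ""
            while (k-- > 0 && match(rest, /[^ \t]+/)) {
                out = out substr(rest, 1, RSTART + RLENGTH - 1)
                rest = substr(rest, RSTART + RLENGTH)
            }
            print out
            exit
        }
    ' "$2"
}
```

Anything after the Nth word on that line, including trailing white space, is dropped.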
Assuming words are runs of non-white-space characters separated by white space, you can use tr to convert the document to one-word-per-line format and then take the first N lines:
tr -s ' \011' '\012' < file | head -n $N
where N=20 or whatever value you want for the number of words. Note that tr is a pure filter; it only reads from standard input and only writes to standard output. The -s option 'squeezes' out duplicate replacements, so you get one newline per sequence of blanks, tabs, or newlines in the input. (If there is leading white space in the file, you get an initial blank line. There are various ways to deal with that, such as grabbing the first N+1 lines of output instead, or filtering out all blank lines.)
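As a rough sketch of the "filter out all blank lines" variant, the pipeline can be wrapped in a small function. The name first_words is made up for illustration, and sed '/^$/d' is just one of several ways to drop the empty line.

```shell
# first_words N FILE  (hypothetical helper name)
# Print the first N whitespace-separated words of FILE, one per line.
first_words() {
    # 1. tr turns each blank or tab into a newline and squeezes repeated newlines.
    # 2. sed deletes the single empty line that leading white space leaves behind.
    # 3. head keeps the first N lines, i.e. the first N words.
    tr -s ' \011' '\012' < "$2" | sed '/^$/d' | head -n "$1"
}
```

For example, first_words 20 file prints the first 20 words of file.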
Using GNU awk so we can set the RS to a regexp and access the matching string using RT:
$ cat file
the quick
brown fox jumped over
the
lazy
dog's back
$ gawk -v c=3 -v RS='[[:space:]]+' 'NR<=c{ORS=(NR<c?RT:"\n");print}' file
the quick
brown
$ gawk -v c=6 -v RS='[[:space:]]+' 'NR<=c{ORS=(NR<c?RT:"\n");print}' file
the quick
brown fox jumped over
$ gawk -v c=9 -v RS='[[:space:]]+' 'NR<=c{ORS=(NR<c?RT:"\n");print}' file
the quick
brown fox jumped over
the
lazy
dog's