
Print first N words of a file

Is there any way to print the first N words of a file? I've tried cut but it reads a document line-by-line. The only solution I came up with is:

sed ':a;N;$!ba;s/\n/δ/g' file | cut -d " " -f -20 | sed 's/δ/\n/g'

Essentially, this replaces newlines with a character that does not exist in the file, applies cut with a space as the delimiter, and then restores the newlines.

Is there any better solution?

Nick, asked Mar 25 '13 10:03



3 Answers

You could use awk to print the first n words:

$ awk 'NR<=8{print;next}{exit}' RS='[[:blank:]]+|\n' file

This prints the first 8 words, one per line. Are you looking to keep the original format of the file?
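
The one-liner above relies on a regular-expression RS. POSIX leaves multi-character RS values unspecified, although GNU awk and mawk support them. If that matters, a minimal portable sketch is to loop over the fields of each line instead. The function name first_words_portable is a hypothetical helper for illustration, not something from the original answer:

```shell
# first_words_portable N: read text on stdin and print the first N words,
# one per line, using only default field splitting (no regex RS needed).
# The function name is a hypothetical helper, not part of the original answer.
first_words_portable() {
    awk -v n="$1" '
        BEGIN { if (n < 1) exit }          # nothing to print for N < 1
        {
            for (i = 1; i <= NF; i++) {
                print $i                   # one word per output line
                if (++c >= n) exit         # stop as soon as N words are out
            }
        }'
}

printf 'one two\nthre four five six\nseven 8 9 \n10\n' | first_words_portable 3
# prints:
# one
# two
# thre
```

Because awk's default splitting ignores leading and trailing blanks, this version never emits empty "words".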

Edit:

The following will preserve the original format of the file:

awk -v n=8 'n==c{exit}n-c>=NF{print;c+=NF;next}{for(i=1;i<=n-c;i++)printf "%s ",$i;print x;exit}' file

Demo:

$ cat file
one two
thre four five six
seven 8 9 
10

$ awk -v n=8 'n==c{exit}n-c>=NF{print;c+=NF;next}{for(i=1;i<=n-c;i++)printf "%s ",$i;print x;exit}' file
one two
thre four five six
seven 8 

A small caveat: if the last line printed doesn't use a single space as its separator, that line will lose its formatting.

$ cat file 
one     two
thre     four five six
seven        8 9 
10

# the 8th word fell on 3rd line: this line will be formatted with single spaces
$ awk -v n=8 'n==c{exit}n-c>=NF{print;c+=NF;next}{for(i=1;i<=n-c;i++)printf "%s ",$i;print x;exit}' file
one     two
thre     four five six
seven 8
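
One possible way around this caveat without GNU extensions is to walk through the last line with match() and copy each word together with the white space in front of it. This is only a sketch, not part of the original answer. It assumes words are separated by spaces and tabs, and the function name first_words_keep_format is hypothetical:

```shell
# first_words_keep_format N: print the first N words of stdin, keeping the
# original spacing, including on the line where the Nth word falls.
# Hypothetical helper name; assumes words are separated by spaces and tabs.
first_words_keep_format() {
    awk -v n="$1" '
        c >= n + 0 { exit }                       # already printed N words
        c + NF <= n { print; c += NF; next }      # whole line fits: print as is
        {
            rest = $0; out = ""
            # Copy word by word, each with the white space that precedes it.
            while (c < n && match(rest, /[^ \t]+/)) {
                out = out substr(rest, 1, RSTART + RLENGTH - 1)
                rest = substr(rest, RSTART + RLENGTH)
                c++
            }
            print out
            exit
        }'
}

printf 'one     two\nthre     four five six\nseven        8 9 \n10\n' |
    first_words_keep_format 8
# prints:
# one     two
# thre     four five six
# seven        8
```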
Chris Seymour, answered Nov 08 '22 21:11


Assuming words are runs of non-white-space characters separated by white space, you can use tr to convert the document to one-word-per-line format and then take the first N lines:

tr -s ' \011' '\012' < file | head -n "$N"

where N=20 or whatever number of words you want. Note that tr is a pure filter: it reads only from standard input and writes only to standard output. The -s option 'squeezes' out duplicate replacements, so you get one newline per sequence of blanks or tabs in the input. (If there is leading white space in the file, you get an initial blank line. There are various ways to deal with that, such as taking the first N+1 lines of output or filtering out all blank lines.)
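
As a sketch of the "filter out all blank lines" option, a sed stage can be inserted between tr and head. The function name first_words is a hypothetical wrapper used here for illustration only:

```shell
# first_words N: read text on stdin, print the first N words, one per line.
# Hypothetical wrapper name; the pipeline is the tr | head idea from above
# with one extra stage to drop the blank line caused by leading white space.
first_words() {
    tr -s ' \011' '\012' |   # runs of spaces/tabs (and newlines) -> one newline
        sed '/^$/d' |        # delete the empty first line, if any
        head -n "$1"         # keep the first N words
}

printf '  one two\nthree\tfour five\n' | first_words 3
# prints:
# one
# two
# three
```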

Jonathan Leffler, answered Nov 08 '22 23:11


With GNU awk you can set RS to a regexp and read the string that matched it from RT, which lets you reproduce the original white space between words:

$ cat file
the quick
brown     fox     jumped over
the
lazy
dog's back

$ gawk -v c=3 -v RS='[[:space:]]+' 'NR<=c{ORS=(NR<c?RT:"\n");print}' file
the quick
brown

$ gawk -v c=6 -v RS='[[:space:]]+' 'NR<=c{ORS=(NR<c?RT:"\n");print}' file
the quick
brown     fox     jumped over

$ gawk -v c=9 -v RS='[[:space:]]+' 'NR<=c{ORS=(NR<c?RT:"\n");print}' file
the quick
brown     fox     jumped over
the
lazy
dog's
Ed Morton, answered Nov 08 '22 23:11