Print first N words of a file

Tags: linux, unix, scripting, awk

Is there any way to print the first N words of a file? I've tried cut but it reads a document line-by-line. The only solution I came up with is:

sed ':a;N;$!ba;s/\n/δ/g' file | cut -d " " -f -20 | sed 's/δ/\n/g'

Essentially, this replaces newlines with a character that does not exist in the file, applies "cut" with space as the delimiter, and then restores the newlines.

Is there any better solution?

asked Mar 25 '13 10:03

Nick

3 Answers

You could use awk to print the first n words:

$ awk 'NR<=8{print;next}{exit}' RS='[[:blank:]]+|\n' file

This prints the first 8 words, one word per line. Are you looking to keep the original format of the file?
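A regular expression in RS is an extension (gawk and mawk support it; POSIX leaves a multi-character RS unspecified), and with the RS above a blank immediately before a newline yields an empty record that counts as a word. As a rough sketch of the same one-word-per-line idea that relies only on awk's default field splitting, assuming a hypothetical helper name `first_words_per_line`:

```shell
# first_words_per_line N FILE
# Print the first N whitespace-separated words of FILE, one per line.
# The function name is made up for this example.
first_words_per_line() {
    awk -v n="$1" '
        {
            # Default field splitting ignores leading, trailing, and
            # repeated blanks, so no empty "words" are ever counted.
            for (i = 1; i <= NF; i++) {
                if (c++ >= n) exit    # stop once N words have been printed
                print $i
            }
        }
    ' "$2"
}
```

For example, `first_words_per_line 8 file` should give the same eight lines as the command above on the demo file.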

Edit:

The following will preserve the original format of the file:

awk -v n=8 'n==c{exit}n-c>=NF{print;c+=NF;next}{for(i=1;i<=n-c;i++)printf "%s ",$i;print x;exit}' file

Demo:

$ cat file
one two
thre four five six
seven 8 9 
10

$ awk -v n=8 'n==c{exit}n-c>=NF{print;c+=NF;next}{for(i=1;i<=n-c;i++)printf "%s ",$i;print x;exit}' file
one two
thre four five six
seven 8

A small caveat: if the last line printed doesn't use a single space as a separator, that line will lose its formatting.

$ cat file 
one     two
thre     four five six
seven        8 9 
10

# the 8th word fell on 3rd line: this line will be formatted with single spaces
$ awk -v n=8 'n==c{exit}n-c>=NF{print;c+=NF;next}{for(i=1;i<=n-c;i++)printf "%s ",$i;print x;exit}' file
one     two
thre     four five six
seven 8
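If the spacing on that last, partially printed line matters, one possible variation is to cut the line at the end of the last wanted word instead of rebuilding it from fields. The following is only a sketch under that assumption, not part of the answer above; the function name `first_words_keep_format` is hypothetical, and words are taken to be runs of characters other than space and tab:

```shell
# first_words_keep_format N FILE
# Print the first N words of FILE, keeping the original spacing everywhere,
# including on the last (partial) line. The function name is made up here.
first_words_keep_format() {
    awk -v n="$1" '
        c >= n      { exit }                      # already printed N words
        c + NF <= n { print; c += NF; next }      # the whole line fits
        {
            # Only part of this line fits: walk through it word by word
            # and copy everything up to the end of the last wanted word.
            rest = $0
            out = ""
            for (k = n - c; k > 0; k--) {
                match(rest, /[^ \t]+/)            # next word in the remainder
                out = out substr(rest, 1, RSTART + RLENGTH - 1)
                rest = substr(rest, RSTART + RLENGTH)
            }
            print out
            exit
        }
    ' "$2"
}
```

With the second demo file, `first_words_keep_format 8 file` should print the third line as `seven        8`, with the original run of spaces intact.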

answered Nov 08 '22 21:11

Chris Seymour

Assuming words are runs of non-white-space characters separated by white space, you can use tr to convert the document to one-word-per-line format and then take the first N lines:

tr -s ' \011' '\012' < file | head -n "$N"

where N=20 or whatever value you want for the number of words. Note that tr is a pure filter; it only reads from standard input and only writes to standard output. The -s option 'squeezes' out duplicate replacements, so you get one newline per sequence of blanks or tabs in the input. (If there is leading white space in the file, you get an initial blank line. There are various ways to deal with that, such as grabbing the first N+1 lines of output instead, or filtering out all blank lines.)
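The "filter out all blank lines" option could look roughly like this. It is a sketch rather than a definitive version: the function name `first_words` is hypothetical, and `grep -v '^$'` is just one way to drop the blank line that leading white space produces.

```shell
# first_words N FILE
# Print the first N words of FILE, one per line.
# The function name is made up for this example.
first_words() {
    # 1. Turn each space or tab into a newline and squeeze runs of newlines.
    # 2. Drop the empty first line that leading white space would leave.
    # 3. Keep the first N lines, i.e. the first N words.
    tr -s ' \t' '\n\n' < "$2" | grep -v '^$' | head -n "$1"
}
```

Usage would be `first_words 20 file`. The second tr argument repeats the newline so that both sets have the same length, which avoids depending on how a given tr pads a shorter second set.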

answered Nov 08 '22 23:11

Jonathan Leffler

With GNU awk we can set RS to a regexp and read the text that matched it from RT, so each word is printed followed by its original separator, and the last word by a newline:

$ cat file
the quick
brown     fox     jumped over
the
lazy
dog's back

$ gawk -v c=3 -v RS='[[:space:]]+' 'NR<=c{ORS=(NR<c?RT:"\n");print}' file
the quick
brown

$ gawk -v c=6 -v RS='[[:space:]]+' 'NR<=c{ORS=(NR<c?RT:"\n");print}' file
the quick
brown     fox     jumped over

$ gawk -v c=9 -v RS='[[:space:]]+' 'NR<=c{ORS=(NR<c?RT:"\n");print}' file
the quick
brown     fox     jumped over
the
lazy
dog's

answered Nov 08 '22 23:11

Ed Morton
