How do I limit (or truncate) text file by number of lines?

I would like to use a terminal/shell to truncate or otherwise limit a text file to a certain number of lines.

I have a whole directory of text files, for each of which only the first ~50k lines are useful.

How do I delete all lines over 50000?

sjmurphy asked Sep 26 '13 01:09


1 Answer

In-place truncation

To truncate the file in-place with sed, you can do the following:

sed -i '50001,$ d' filename 
  • -i means in place.
  • d means delete.
  • 50001,$ means the lines from 50001 to the end.

You can make a backup of the file by adding an extension argument to -i, for example, .backup or .bak:

sed -i.backup '50001,$ d' filename 

On macOS (OS X) or FreeBSD, -i requires an argument, so to truncate without making a backup, pass an empty string:

sed -i '' '50001,$ d' filename 

The long argument name version is as follows, with and without the backup argument:

sed --in-place '50001,$ d' filename
sed --in-place=.backup '50001,$ d' filename

New File

To create a new truncated file, just redirect from head to the new file:

head -n50000 oldfilename > newfilename 
  • -n50000 sets the number of lines to output; head otherwise defaults to 10.
  • > means to redirect into, overwriting anything else that might be there.
  • Substitute >> for > if you mean to append into the new file.

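Unfortunately, you cannot redirect into the same file: the shell truncates the output file before head gets a chance to read it, so you would end up with an empty file. That is why sed is recommended for in-place truncation.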
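If you want the behaviour of head but with the result ending up under the original name, one workaround is to write to a temporary file and then rename it over the original. Below is a minimal Python sketch of that idea; head_in_place is a hypothetical helper name, not something from sed or head, and it assumes the temporary file can be created in the same directory as the target.

```python
import os
import tempfile
from itertools import islice


def head_in_place(path, max_lines):
    """Keep only the first max_lines lines of path (hypothetical helper)."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the target so the final rename
    # stays on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        # Binary mode copies the bytes untouched, whatever the encoding.
        with os.fdopen(fd, "wb") as dst, open(path, "rb") as src:
            dst.writelines(islice(src, max_lines))
        # Swap the truncated copy into place under the original name.
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise
```

Note that the replacement file gets the permissions of the temporary file rather than those of the original, so this sketch may need an extra step if the file mode matters.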

No sed? Try Python!

This is a bit more typing than sed. The name sed is short for "stream editor", after all, which is another reason to use it: this is exactly the kind of job the tool is designed for.

This was tested on Linux and Windows with Python 3:

from collections import deque
from itertools import islice

def truncate(filename, lines):
    with open(filename, 'r+') as f:
        blackhole = deque((), 0).extend
        file_iterator = iter(f.readline, '')
        blackhole(islice(file_iterator, lines))
        f.truncate(f.tell())

To explain the Python:

The blackhole works like /dev/null. It's a bound extend method on a deque with maxlen=0, which is the fastest way to exhaust an iterator in Python (that I'm aware of).
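To see the blackhole idea in isolation, here is a small sketch on a plain range iterator instead of a file. The variable names are only illustrative; the point is that a deque with maxlen=0 pulls items from the iterator and stores none of them.

```python
from collections import deque
from itertools import islice

numbers = iter(range(10))

# Advance the iterator by 4 items, discarding them without storing anything.
deque(islice(numbers, 4), maxlen=0)

# Whatever was not consumed is still available.
remaining = list(numbers)
print(remaining)  # [4, 5, 6, 7, 8, 9]
```

In the truncate function the same trick skips the first N lines of the file, leaving the file position just after line N.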

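We can't simply loop over the file object, because in Python 3 iterating over a text file disables its tell method (calling it raises an OSError). Calling readline directly does not have that side effect, hence the iter(f.readline, '') trick.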
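The difference is easy to check on a throwaway file. This sketch assumes Python 3 and a file opened in text mode; the file contents and variable names are made up for the demonstration.

```python
import os
import tempfile

# Build a small throwaway file to experiment on.
fd, path = tempfile.mkstemp(text=True)
with os.fdopen(fd, "w") as f:
    f.write("one\ntwo\nthree\n")

with open(path) as f:
    for line in f:   # a for loop drives the file through next()
        break        # stop after the first line, mid-file
    try:
        f.tell()
        tell_after_loop = "ok"
    except OSError as exc:
        tell_after_loop = f"OSError: {exc}"

with open(path) as f:
    first = next(iter(f.readline, ""))  # goes through readline() instead
    tell_after_readline = f.tell()      # still allowed here

os.remove(path)
print(tell_after_loop)
print(tell_after_readline)
```

The first print should show the OSError raised by tell, while the second shows a usable file position.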

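This function uses a context manager (the with block), which guarantees the file is closed as soon as the truncation is done. That is arguably a bit superfluous here, since CPython would normally close the file anyway when the function returns, but it costs nothing to be explicit. Usage is simply: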

>>> truncate('filename', 50000) 
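Since the question mentions a whole directory of files, the function can be wrapped in a loop. The following is only a sketch: truncate_directory and the "*.txt" pattern are assumptions for illustration, and truncate is restated in a compact form so the block runs on its own.

```python
from collections import deque
from itertools import islice
from pathlib import Path


def truncate(filename, lines):
    # Same approach as above: skip `lines` lines via readline, then cut the rest.
    with open(filename, 'r+') as f:
        deque(islice(iter(f.readline, ''), lines), maxlen=0)
        f.truncate(f.tell())


def truncate_directory(directory, lines, pattern="*.txt"):
    """Truncate every matching regular file in directory (hypothetical helper).

    Returns the list of paths that were processed.
    """
    touched = []
    for path in sorted(Path(directory).glob(pattern)):
        if path.is_file():
            truncate(path, lines)
            touched.append(path)
    return touched
```

For example, truncate_directory('.', 50000) would process every .txt file in the current directory. Files that already have fewer lines are left unchanged, and nothing is backed up, so try it on copies first.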
Russia Must Remove Putin answered Sep 23 '22 04:09