Fastest way to shuffle lines in a file in Linux

Tags: linux, bash, unix

I want to shuffle a large file with millions of lines of strings in Linux. I tried sort -R, but it is very slow (it takes about 50 minutes for a 16M file). Is there a faster utility I can use instead?

alpha_cod asked Feb 06 '13 10:02


People also ask

How do you shuffle a line in a file in Linux?

The shuf command writes a random permutation of its input lines to standard output. Given a file or a series of files, it shuffles their lines and writes the result to standard output. It can also limit the number of lines returned, which makes it useful for selecting random lines from a file or random items from a list.
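A minimal sketch of the behaviour described above, assuming GNU coreutils shuf. The file names lines.txt, shuffled.txt and sample.txt are made up for illustration:

```shell
# Make a small sample file with 10 numbered lines (illustrative data only)
seq 1 10 > lines.txt

# Write all lines of lines.txt in a random order
shuf lines.txt > shuffled.txt

# Select just 3 random lines (-n / --head-count limits the output)
shuf -n 3 lines.txt > sample.txt
```

Without -n every input line appears exactly once in the output; with -n 3 you get three distinct lines picked at random.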

How do I sort lines in Linux?

To sort the lines of text files in Linux, use the sort command. It prints the lines of its input, or of the concatenation of all files listed as arguments, in sorted order. Sorting is based on one or more sort keys extracted from each input line.
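A small illustration of whole-line sorting versus sorting on a key, assuming GNU or POSIX sort. scores.txt and its contents are hypothetical sample data:

```shell
# Sample data: a name and a score separated by a space (illustrative only)
printf 'carol 88\nalice 72\nbob 95\n' > scores.txt

# Default: compare whole lines, so this orders by name
sort scores.txt > by_name.txt

# Use only the second field as the key (-k2,2), compare numerically (n), descending (r)
sort -k2,2nr scores.txt > by_score.txt
```

The ",2" in -k2,2 ends the key at the second field; without it the key would run to the end of the line.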

What is shuf in Linux?

The shuf command in Linux writes a random permutation of its input lines to standard output. It pseudo-randomly reorders its input, much like shuffling a deck of cards.
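To make the card-shuffling comparison concrete, here is a sketch using two other GNU shuf options: -e (shuffle the command-line arguments instead of reading a file) and -i (shuffle an integer range). The card names and output file names are illustrative only:

```shell
# Shuffle the arguments themselves instead of reading a file (-e / --echo)
shuf -e AS KH QD JC > hand.txt

# Draw 5 distinct "cards" numbered 1..52 without any input file (-i / --input-range)
shuf -i 1-52 -n 5 > draw.txt
```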




1 Answer

Use shuf instead of sort -R (see the shuf man page).
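A sketch of the swap, assuming GNU coreutils. big.txt is a hypothetical stand-in for the questioner's file, generated here with seq:

```shell
# Sample input standing in for the large file in the question (illustrative only)
seq 1 100000 > big.txt

# Before: sort -R, the slow approach from the question
sort -R big.txt > shuffled_sort.txt

# After: shuf writes a random permutation of the lines directly
shuf big.txt > shuffled_shuf.txt

# GNU shuf reads all of its input before opening the -o file,
# so the file can be shuffled in place this way
shuf -o big.txt < big.txt
```

To compare speed on your own data, prefix the two middle commands with bash's time keyword. Because shuf reads the whole input before writing anything, expect it to need enough memory to hold the file.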

The slowness of sort -R is probably due to it hashing every line and then sorting by those hashes. shuf just generates a random permutation of the lines, so it doesn't have that overhead.
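One way to see that the two commands really do different things, assuming GNU sort and shuf: GNU sort -R orders lines by a random hash of each key, so identical lines always come out next to each other, while shuf permutes line positions and can scatter duplicates. dups.txt is made-up sample data:

```shell
# Input with repeated lines (illustrative only)
printf 'x\ny\nx\ny\nx\ny\n' > dups.txt

# sort -R sorts by a random hash of each line, so equal lines end up adjacent
sort -R dups.txt > hashed.txt

# shuf produces a permutation of positions, so equal lines may be interleaved
shuf dups.txt > permuted.txt
```

Here uniq hashed.txt always prints exactly two lines, one per distinct value, whereas uniq permuted.txt usually prints more.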

(This was suggested in a comment but for some reason not written as an answer by anyone)

dshepherd answered Sep 22 '22 14:09