
Removing repeated pairs from a very big text file

Tags: bash, awk, perl

I have a very big text file (a few GB) in the following format:

1 2
3 4
3 5
3 6
3 7
3 8
3 9

The file is already sorted and duplicate lines have been removed. It still contains pairs repeated in reverse order, such as '2 1' and '4 3', which I want to remove. Does anybody have a solution that works in a very resource-limited environment, using Bash, AWK, Perl, or a similar language? I cannot load the whole file into memory and loop over the values.

platoali asked Dec 16 '22 18:12


1 Answer

You want to remove lines where the second number is less than the first?

perl -i~ -lane'print if $F[0] < $F[1]' file
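
The one-liner above keeps only lines whose first number is smaller than the second, so a reversed pair whose forward counterpart is missing from the file would be dropped entirely, as would a line with two equal numbers. If that cannot be ruled out, one possible alternative (a sketch, not part of the original answer) is to rewrite every line with the smaller number first and let sort remove the duplicates. The file names pairs.txt and pairs.dedup.txt are hypothetical, and the sketch assumes whitespace-separated numeric fields and that numerically sorted output is acceptable.

```shell
# Hypothetical sample input; the real input would be the multi-GB file.
printf '%s\n' '1 2' '2 1' '3 4' '4 3' '3 5' > pairs.txt

# Put the smaller number first on every line, then sort numerically on
# both fields and keep one copy of each pair (-u).
# awk handles one line at a time; sort spills to temporary files on disk,
# so neither step needs to hold the whole file in memory.
awk '{ if ($1 > $2) print $2, $1; else print $1, $2 }' pairs.txt |
  sort -k1,1n -k2,2n -u > pairs.dedup.txt

cat pairs.dedup.txt
```

Unlike the in-place Perl edit, this writes a second file, so it needs free disk space for the output and for sort's temporary files.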
ikegami answered Dec 18 '22 06:12