
How to use sed to extract lines in a specified order?

Tags: linux, bash, unix, sed

I have a file which is ~50,000 lines long, and I need to retrieve specific lines. I have tried the following command:

sed -n 'Np;Np;Np' inputFile.txt > outputFile.txt

('N' being the number of each line I want to extract)

This works fine, but the command extracts the lines in the order they occur in the file (i.e. it re-orders my request). For example, if I try:

sed -n '200p;33p;40000p' inputFile.txt > outputFile.txt

I get a text file with the lines ordered as 33, 200, 40000 (which doesn't work for my purpose). Is there a way to keep the lines in the order in which they appear in the command?

JazFlo asked Oct 04 '16 10:10




3 Answers

You have to hold on to line 33 until after you've seen line 200:

sed -n '33h; 200{p; g; p}; 40000p' file

See the manual for further explanation: https://www.gnu.org/software/sed/manual/html_node/Other-Commands.html
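To make the hold-space trick easier to follow, here is the same sed script spread over several lines with comments. This is only a sketch: the function name hold_demo and the seq-generated input are assumptions for illustration, chosen so that every line's content equals its line number.

```shell
# hold_demo is a hypothetical name; the input is generated with seq so that
# each line contains its own line number.
hold_demo() {
    seq 50000 | sed -n '
        # line 33: copy it into the hold space, print nothing yet
        33h
        200{
            # line 200: print it first
            p
            # then overwrite the pattern space with the held line 33
            g
            # and print that
            p
        }
        # line 40000: print as usual
        40000p
    '
}

hold_demo
```

This approach needs a hand-written script for each ordering, so it suits a small, fixed set of lines.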

awk might be more readable:

awk '
    NR == 33    {line33 = $0} 
    NR == 200   {print; print line33} 
    NR == 40000 {print}
' file 

If you have an arbitrary number of lines to print in a specific order, you can generalize this:

awk -v line_order="11 3 5 1" '
    BEGIN {
        n = split(line_order, inorder)
        for (i=1; i<=n; i++) linenums[inorder[i]]
    }
    NR in linenums {cache[NR]=$0}
    END {for (i=1; i<=n; i++) print cache[inorder[i]]}
' file
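If this is needed often, the generalized awk script could be wrapped in a small shell function. The following is a sketch, not part of the original answer: the name lines_in_order is made up, and unlike the script above it skips line numbers that are not present in the file instead of printing a blank line for them.

```shell
# lines_in_order FILE N1 N2 ...: print the given lines of FILE in the order
# the numbers are listed. (lines_in_order is a hypothetical helper name.)
lines_in_order() {
    file=$1
    shift
    awk -v line_order="$*" '
        BEGIN {
            n = split(line_order, inorder)
            for (i = 1; i <= n; i++) wanted[inorder[i]]
        }
        # remember only the lines that were asked for
        NR in wanted { cache[NR] = $0 }
        END {
            # print in the requested order, skipping numbers past the end of the file
            for (i = 1; i <= n; i++)
                if (inorder[i] in cache) print cache[inorder[i]]
        }
    ' "$file"
}

# Usage on a throwaway 20-line file; 99 is out of range and is skipped.
demo=$(mktemp)
seq 20 > "$demo"
lines_in_order "$demo" 11 3 5 1 99
rm -f "$demo"
```

Only the requested lines are cached, so memory use depends on how many lines are asked for rather than on the size of the file.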
glenn jackman answered Oct 17 '22 17:10


With perl, you can save the input lines in a hash variable with the line number as the key:

$ seq 12 20 | perl -nle '
@l = (5,2,3,1);
$a{$.} = $_ if( grep { $_ == $. } @l );
END { print $a{$_} foreach @l } '
16
13
14
12
  • $. is the current line number, and grep { $_ == $. } @l checks whether that line number is in the array @l, which lists the desired lines in the required order


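As a one-liner, with the @l declaration moved into BEGIN so that it is not re-initialized on every iteration, and with a check so that no blank line is printed if a line number is out of range (note that this truthiness check also skips lines that are empty or consist of just 0):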

$ seq 50000 > inputFile.txt
$ perl -nle 'BEGIN{@l=(200,33,40000)} $a{$.}=$_ if(grep {$_ == $.} @l); END { $a{$_} and print $a{$_} foreach (@l) }' inputFile.txt > outputFile.txt
$ cat outputFile.txt
200
33
40000

For small enough input, you can save all the lines in an array and print the required indexes. Note the adjustment ($l[0]=0) made because array indexes start at 0:

$ seq 50000 | perl -e '$l[0]=0; push @l,<>; print @l[200,33,40000]'
200
33
40000


A solution combining head and tail:

$ for i in 200 33 40000; do head -"${i}" inputFile.txt | tail -1 ; done
200
33
40000
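One caveat with the head and tail combination: if a requested number is past the end of the file, head passes the whole file through and tail -1 prints the last line, so the wrong line is returned silently. A guarded variant might look like the sketch below. The helper name nth_line is made up for illustration, and it assumes the file ends with a newline so that wc -l gives the real line count.

```shell
# nth_line N FILE: print line N of FILE, or nothing if FILE has fewer lines.
# (nth_line is a hypothetical helper name, not part of the original answer.)
nth_line() {
    # arithmetic expansion strips any padding some wc implementations add
    total=$(($(wc -l < "$2")))
    if [ "$1" -le "$total" ]; then
        head -n "$1" "$2" | tail -n 1
    fi
}

# Usage on a throwaway 50-line file: prints 20 and 3, and nothing for 400.
demo=$(mktemp)
seq 50 > "$demo"
for i in 20 3 400; do nth_line "$i" "$demo"; done
rm -f "$demo"
```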


Performance comparison, using an input file created with seq 50000 > inputFile.txt:

$ time perl -nle 'BEGIN{@l=(200,33,40000)} $a{$.}=$_ if(grep {$_ == $.} @l); END { $a{$_} and print $a{$_} foreach (@l) }' inputFile.txt > outputFile.txt

real    0m0.044s
user    0m0.036s
sys     0m0.000s

$ time awk -v line_order="200 33 40000" '
    BEGIN {
        n = split(line_order, inorder)
        for (i=1; i<=n; i++) linenums[inorder[i]]
    }
    NR in linenums {cache[NR]=$0}
    END {for (i=1; i<=n; i++) print cache[inorder[i]]}
' inputFile.txt > outputFile.txt

real    0m0.019s
user    0m0.016s
sys     0m0.000s

$ time for i in 200 33 40000; do sed -n "${i}{p;q}" inputFile.txt ; done > outputFile.txt

real    0m0.011s
user    0m0.004s
sys     0m0.000s

$ time sed -n '33h; 200{p; g; p}; 40000p' inputFile.txt > outputFile.txt

real    0m0.009s
user    0m0.008s
sys     0m0.000s

$ time for i in 200 33 40000; do head -"${i}" inputFile.txt | tail -1 ; done > outputFile.txt

real    0m0.007s
user    0m0.000s
sys     0m0.000s
Sundeep answered Oct 17 '22 18:10


Can you also use other bash commands? If so, this works:

for i in 200 33 40000; do 
    sed -n "${i}p" inputFile.txt
done > outputFile.txt

This is probably slower than doing everything in a single sed invocation, since the file is read once per requested line, but it is more practical.
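Some of the cost of the loop can likely be avoided by telling sed to quit as soon as it has printed the requested line, so that it does not read the rest of the file on each pass. A minimal sketch, where the function name extract_in_order is an assumption for illustration:

```shell
# extract_in_order FILE N1 N2 ...: print the given lines in the listed order.
# (extract_in_order is a hypothetical helper name.)
extract_in_order() {
    file=$1
    shift
    for n in "$@"; do
        # p prints line n, q then stops sed before it reads any further
        sed -n "${n}{p;q;}" "$file"
    done
}

# Usage on a throwaway 50000-line file: prints 200, 33, 40000.
demo=$(mktemp)
seq 50000 > "$demo"
extract_in_order "$demo" 200 33 40000
rm -f "$demo"
```

A line number past the end of the file simply produces no output here.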

Riccardo Petraglia answered Oct 17 '22 17:10