
Reorder lines near the beginning of a huge text file (>20G)

Tags: bash, vim, dd

I am a vim user and know some basic awk and bash commands. I have a text (VCF) file that is larger than 20 GB, and I want to move line 69 to just below line 66:

$ less huge.vcf
...
    66 ##contig=<ID=9,length=124595110>
    67 ##contig=<ID=X,length=171031299>
    68 ##contig=<ID=Y,length=91744698>
    69 ##contig=<ID=MT,length=16299>
...

What I want is:

...
    66 ##contig=<ID=9,length=124595110>
    67 ##contig=<ID=MT,length=16299>
    68 ##contig=<ID=X,length=171031299>
    69 ##contig=<ID=Y,length=91744698>
...

I tried to open and edit the file in vim (with the LargeFile plugin installed), but it still does not work well.

David Z asked Dec 05 '22 14:12



1 Answer

The easy approach is to copy the section you want to edit out of the file, modify that copy, then write it back over the start of the original file.

# extract the first hundred lines
head -n 100 huge.txt >start.txt

# modify that extracted subset
vim start.txt

# write that section back over the beginning of the larger file
# (conv=notrunc stops dd from truncating the rest of the file)
dd if=start.txt of=huge.txt conv=notrunc

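Note that this only works if your edits don't change the size of the section being modified, because dd overwrites the first bytes of huge.txt in place and leaves everything after them untouched. In other words, make sure that start.txt is exactly the same size in bytes after editing as it was before. Moving a whole line to a different position within start.txt keeps the byte count the same, provided the editor changes nothing else.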
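As a rough sketch of the same workflow done non-interactively, the script below builds a tiny stand-in file, moves one line with awk instead of editing in vim, and refuses to write back unless the byte count is unchanged. The demo file contents, the temporary directory, and the variable names (`n`, `from`, `dest`) are assumptions made for illustration; for the file in the question you would presumably use `n=100`, `from=69`, `dest=66` and point `big` at the real file.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical small file standing in for the 20 GB one.
workdir=$(mktemp -d)   # remove this directory when you are done
big="$workdir/huge.vcf"
printf '%s\n' \
  '##contig=<ID=9,length=124595110>' \
  '##contig=<ID=X,length=171031299>' \
  '##contig=<ID=Y,length=91744698>' \
  '##contig=<ID=MT,length=16299>' \
  '#CHROM POS' \
  '1 100' > "$big"

n=4      # how many leading lines to extract
from=4   # line to move
dest=1   # line after which it should land

start="$workdir/start.txt"
edited="$workdir/start.edited.txt"
size_before=$(( $(wc -c < "$big") ))

# Extract the leading section.
head -n "$n" "$big" > "$start"

# Reorder within the small extract: buffer its lines, then print them
# with line `from` emitted right after line `dest`.
awk -v from="$from" -v dest="$dest" '
  { line[NR] = $0 }
  END {
    for (i = 1; i <= NR; i++) {
      if (i == from) continue
      print line[i]
      if (i == dest) print line[from]
    }
  }' "$start" > "$edited"

# Only write back if the section has exactly the same size in bytes.
if [ "$(( $(wc -c < "$start") ))" -ne "$(( $(wc -c < "$edited") ))" ]; then
  echo "size changed; not writing back" >&2
  exit 1
fi

# Overwrite the first bytes of the big file without truncating the rest.
dd if="$edited" of="$big" conv=notrunc 2>/dev/null

head -n "$n" "$big"
```

Because only the extracted section is read and rewritten, the cost does not depend on the total size of the file. It is still worth trying this on a copy or a small sample first, since dd overwrites the original directly.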

Charles Duffy answered Jan 04 '23 23:01
