how do I extract specific paragraphs from a text using vim?

Tags: text, vim, extract

I am trying to extract text from a huge file that contains records in the following format, repeated many times:

CL blahblahblah  
SP blahblahblah blahblahblah blahblahblah  
DE blahblahblahblahblahblah blahblahblah blahblahblah   
   blahblahblah blahblahblah blahblahblah blahblahblah  
AB blahblahblah blahblahblah blahblahblah 
   blahblahblahblahblahblah blahblahblah blahblahblah
   blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah    
   blahblahblah blahblahblah blahblahblah   
C1 blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah 
   blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah   
   blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah 
   lahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah 
RP blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah
   blahblahblah blahblahblah  
EM blahblahblah blahblahblah blahblahblah blahblahblah  
NR blahblahblah blahblahblah blahblahblah blahblahblah  
TC blahblahblah blahblahblah blahblahblah blahblahblah 
   blahblahblah blahblahblah blahblahblah blahblahblah  
Z9 blahblahblah blahblahblah blahblahblah blahblahblah  
PU blahblahblah blahblahblah blahblahblah blahblahblah  
PI blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah  

I am only interested in the entries beginning with C1, AB, or TI. These entries sometimes span multiple lines, and the tag lines that follow them are not always the same. Is there an easy way to keep only these entries? The remaining text should look like this:

TI blahblahblah  
AB blahblahblah b lah blahblah blah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah     
C1 blahblahblah blahblahblah blahblahblah blahblahblah  
   blahblahblah blahblahblah blahblahblah blahblahblah  
   blahblahblah blahblahblah blahblahblah blahblahblah 
TI blah blah blah blah blah blah  
AB blahblahblah blahblahblah blahblahblah blahblahblahblahblahblah blahblahblah blahblahblah blahblahblah blahblahblahblah blahblahblah blahblahblah blahblahblah   
   blahblahblah blahblahblah blahblahblah blahblahblah  blahblahblah blahblahblah blahblahblah blahblahblah 
   blahblahblah blahblahblah blahblahblah blahblahblah 
C1 blahblahblah blahblahblah blahblahblah blahblahblahblahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah 

and so forth.

Thanks very much!

sfranky asked Dec 21 '22 02:12


2 Answers

I would do:

:$put='X' | 1,$-1g/^\(\s\|C1\|AB\|TI\)\@!/   ,/^\S/-d
:$d

This will do the following:

  • Append a line containing “X” at the end of the buffer, so that the search for the next tag line also succeeds for the last entry
  • For each line except the last one (1,$-1) that does not start with whitespace and does not start with C1, AB or TI (g/pattern/), delete (d) from that line up to, but not including, the next line that starts with a non-space character (,/^\S/- where - is short for -1)
  • Remove the “X” line at the end (:$d)
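
For comparison, the same filtering rule can be sketched outside Vim. The Python below is only an illustration of the logic and is not part of the Vim solution: the function name keep_entries and the sample lines are made up for this example, and it assumes that continuation lines always begin with whitespace, as in the question.

```python
# Tags to keep, taken from the question.
KEEP_TAGS = ("C1", "AB", "TI")


def keep_entries(lines, keep_tags=KEEP_TAGS):
    """Return the entries whose tag is in keep_tags, with their continuation lines.

    `lines` is expected to hold lines without trailing newlines
    (for example the result of str.splitlines()).
    """
    keep_tags = tuple(keep_tags)  # str.startswith needs a tuple, not a list
    kept = []
    keeping = False  # True while we are inside an entry we want to keep
    for line in lines:
        if line[:1].isspace():
            # Continuation line: it belongs to the most recent tag line.
            if keeping:
                kept.append(line)
        else:
            # A new tag line (or an empty line) starts here.
            keeping = line.startswith(keep_tags)
            if keeping:
                kept.append(line)
    return kept


# Hypothetical sample data in the shape shown in the question.
sample = [
    "CL blah",
    "AB blah",
    "   blah blah",
    "C1 blah",
    "RP blah",
    "   blah",
    "TI blah",
]
print("\n".join(keep_entries(sample)))
```

Like the :g command, this drops every tag line that is not wanted together with the indented lines that follow it.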

To try this out in gVim:

  • copy the two commands above to your clipboard
  • in gVim, run :@+ (which executes the contents of the + register, i.e. the system clipboard, as Ex commands).

What I got:

AB blahblahblah blahblahblah blahblahblah 
   blahblahblahblahblahblah blahblahblah blahblahblah
   blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah    
   blahblahblah blahblahblah blahblahblah   
C1 blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah 
   blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah   
   blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah 
   lahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah 
Benoit answered Mar 24 '23 03:03


This should work:

:let @a="" | g/^\v<(C1|AB|TI)>/norm! "Ay/^\S^M

EDIT (Windows-specific): the ^M at the end of that line is a literal carriage return, not a caret followed by M. Type it as C-q Enter (or C-v Enter if you aren't using Windows or your vimrc doesn't set behave mswin).

This collects the matching entries in register a. To replace the buffer contents with those lines:

:%d | put a

Or, put it into a new buffer:

:new | put a
sehe answered Mar 24 '23 01:03