I am attempting to reformat a hierarchical (xml) file to a "per line" file using vim.
Here is a simplified example. The real file is large (500k lines), and the number of groups and of entries per group is arbitrary.
input file:
<group key="abc">
<entry val="1"/>
<entry val="2"/>
<entry val="3"/>
</group>
<group key="xyz">
<entry val="1"/>
<entry val="2"/>
<entry val="3"/>
<entry val="4"/>
<entry val="5"/>
</group>
output result:
abc,1
abc,2
abc,3
xyz,1
xyz,2
xyz,3
xyz,4
xyz,5
Please note that I don't need a single magic expression that does all of this (although that'd be swell). The part I am struggling with is getting the key associated with each of the entries. I'm sure there is a good idiom for handling this. Thanks in advance.
One thing I tried, which may be useful to others, is:
:g/key="\(.*\)"/.;/<\/group/s/<entry /\1,<entry /g
This does not work because the \1 captured by the :g pattern is not carried over to the substitution: inside :s, \1 refers to a group in the :s pattern, and that pattern has none. The expression looks for pat1, builds a range from that line to the next line matching pat2, and then substitutes pat3 with pat4 only within that range (inclusive). In general form:
:g/pat1/.;/pat2/s/pat3/pat4/g
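One possible way to carry the key into the substitution is sketched below. It is an untested assumption, not something taken from the answers. It stores the key in a variable inside the :g command (the name k is arbitrary) and uses it through a \= expression in the replacement. The e flag is only there to suppress the "pattern not found" error for a group with no entries.

```vim
:g/<group /let k = matchstr(getline('.'), 'key="\zs[^"]*') | .;/<\/group/s/<entry /\=k . ',<entry '/e
```

If it behaves as intended, each entry line becomes something like abc,<entry val="1"/>, which is what the \1 attempt was aiming for.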
Solution
The best solution below solved it by looking for the entry and then backwards for the key, as opposed to what I was trying to do above by building a range and multiple substitutions. What finally worked required some minor modifications, so they are provided here for others. The commands that do the heavy lifting are:
:g/entry/?key?,\?t.
:g/entry/norm ddpkJ
:v/entry/d
Breakdown:
Search for all the entry lines:
:g/entry/
From there, search backwards for the line that has the key and copy it below the entry line.
?key?,\?t.
Search for all entry lines again, and switch to normal mode editing
:g/entry/norm
Swap the two lines (delete the entry line and paste it below the copied key line). Move up to the key line and join the two lines.
ddpkJ
Once all keys are mapped, search for any lines that do NOT have an entry and delete them.
:v/entry/d
If you have multiple levels of hierarchy, as I do, you can run the first two commands multiple times. Once everything is on a single line, it is fairly straightforward to clean it up into whatever final format is needed. Another major benefit is that this solution can easily be put in a script and rerun with
vim -S script.vim data.file
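For reference, a minimal sketch of what script.vim might contain. The first three commands are the ones listed above, written without the leading colon. Two things are assumptions added here: normal! replaces norm so that user mappings are ignored, and the final :s cleanup is written for the simplified example only.

```vim
" script.vim: the commands from above, without the leading ':'
" Copy the nearest preceding key line to just below each entry line.
g/entry/?key?,\?t.
" Put each entry below its copied key line, then join the pair.
g/entry/normal! ddpkJ
" Drop every line that does not contain an entry.
v/entry/d
" Assumed cleanup for the simplified example: keep only "key,val".
%s/.*key="\([^"]*\)".*val="\([^"]*\)".*/\1,\2/
```

The script only edits the buffer. Nothing is written to data.file until you save, so the result can be inspected first.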
The following would work:
:g/entry/?<group?,?<group?t.
:%norm J
:g/<\//d
:%norm df"f"df"i,<C-v><Esc>f"d$
Breakdown
For each line containing entry, search backwards for <group and copy that line to just below the entry line
:g/entry/?<group?,?<group?t.
<group key="abc">
<entry val="1"/>
<group key="abc">
<entry val="2"/>
<group key="abc">
<entry val="3"/>
<group key="abc">
</group>
<group key="xyz">
<entry val="1"/>
<group key="xyz">
<entry val="2"/>
<group key="xyz">
<entry val="3"/>
<group key="xyz">
<entry val="4"/>
<group key="xyz">
<entry val="5"/>
<group key="xyz">
</group>
Join each line with the line below it
:%norm J
<group key="abc"> <entry val="1"/>
<group key="abc"> <entry val="2"/>
<group key="abc"> <entry val="3"/>
<group key="abc"> </group>
<group key="xyz"> <entry val="1"/>
<group key="xyz"> <entry val="2"/>
<group key="xyz"> <entry val="3"/>
<group key="xyz"> <entry val="4"/>
<group key="xyz"> <entry val="5"/>
<group key="xyz"> </group>
Delete the lines that contain a closing tag
:g/<\//d
<group key="abc"> <entry val="1"/>
<group key="abc"> <entry val="2"/>
<group key="abc"> <entry val="3"/>
<group key="xyz"> <entry val="1"/>
<group key="xyz"> <entry val="2"/>
<group key="xyz"> <entry val="3"/>
<group key="xyz"> <entry val="4"/>
<group key="xyz"> <entry val="5"/>
Fix up the remaining text by searching for the quotes and deleting up to and from them. Note that <C-v><Esc> is the key sequence that inserts a literal Escape into your command.
:%norm df"f"df"i,<C-v><Esc>f"d$
abc,1
abc,2
abc,3
xyz,1
xyz,2
xyz,3
xyz,4
xyz,5