
How to get a flat XML so that external entities are merged to the top level

I know it is borderline whether this belongs on Stack Overflow or Super User, but since there seem to be quite a few 'editing code' questions here, I am posting it on SO.

I have a pile of XML files that someone in their infinite wisdom has decided to split into multiple files using <!ENTITY> declarations, which makes debugging and editing them a huge P-i-t-A. Therefore I am looking for:

  1. A way in VIM to open them in a single buffer (preferably so that the changes are saved in correct external entity files), OR;
  2. A way to expand the files in VIM so that the external entities are read and replaced in the buffer, OR;
  3. An easy bash/sed/python way of doing this on the command line (or in .vimrc)

The files included at the top level might themselves include further files, to who knows how many levels, so this needs to be recursive...

Here's a mock-up of what the top-level file looks like:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foobar PUBLIC "foobar:dtd" "foobar.dtd" [

        <!ENTITY foo SYSTEM "foo.xml">

        <!ENTITY bar SYSTEM "bar.xml">
]>
<foo>
        <params>
                &foo;
        </params>
        <bar>
                &bar;
        </bar>
</foo>

EDIT: The list is in order of preference - if no 1. or 2. solutions are available, the bounty goes for the best #3...

EDIT 2: Looks like @Gaby's answer works, but unfortunately only partially, unless I am doing something wrong - I'll write some sort of tool using his answer and post it here for improvements. Of course, a #1 or #2 solution would be appreciated... :)

EDIT 3: OK, the best non-Emacs answer will get the bounty ;)

Conclusion: Thanks to @hcayless I now have a working #2 solution. I added:

autocmd BufReadPost,FileReadPost *.xml silent %!xmllint --noent - 2> /dev/null

to my .vimrc and everything is hunky dory.

Kimvais asked Jan 07 '10 09:01



2 Answers

If you have libxml2 installed, then xmllint will probably do this for you. Depending on your setup, you might need more params, but for your example,

xmllint --noent foobar.xml

will print your file to stdout with all entities resolved. Should be easy enough to wrap some bash scripting around it to do what you need.
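For batch use, the same call can be wrapped in a few lines of Python instead of bash. This is only a sketch: the helper names `flatten_command` and `flatten` are made up for illustration, and it assumes `xmllint` is on your PATH. The only flag used is the `--noent` shown above.

```python
import shutil
import subprocess
from pathlib import Path


def flatten_command(source):
    """Build the xmllint invocation shown above for one file."""
    return ["xmllint", "--noent", str(source)]


def flatten(source, target):
    """Write a copy of `source` with its entities resolved to `target`."""
    if shutil.which("xmllint") is None:
        raise RuntimeError("xmllint not found on PATH (it ships with libxml2)")
    # check=True raises CalledProcessError if xmllint reports a failure.
    result = subprocess.run(
        flatten_command(source), capture_output=True, check=True
    )
    Path(target).write_bytes(result.stdout)
```

Calling `flatten("foobar.xml", "foobar.flat.xml")` would leave the original untouched and put the expanded copy next to it (the output file name is just an example).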

hcayless answered Oct 02 '22 05:10


For option #3, take a look at pxdom. Its documentation ("pxdom 1.5: A Python DOM implementation") says the following:

DOMConfiguration parameters

The result of the parse operation depends on the parameters set on the LSParser.domConfig mapping. By default, in accordance with the DOM specification, all CDATA sections will be replaced with plain text nodes and all bound entity references will be replaced by the contents of the entity referred to. This includes external entity references and the external subset.

It also includes a serializer for saving the document to a file.
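If installing an extra DOM library is not an option, the substitution itself can be roughly approximated with the Python standard library. To be clear, this is not pxdom's API and not a real XML parser: it is a regex-based sketch that assumes entities are declared as `<!ENTITY name SYSTEM "file">` with double quotes, that the included files are plain fragments, and that no `&name;` look-alikes sit inside comments or CDATA sections. The function name `expand_file` is hypothetical.

```python
import re
from pathlib import Path

# <!ENTITY name SYSTEM "file"> declarations (double-quoted system IDs only).
ENTITY_DECL = re.compile(r'<!ENTITY\s+([\w.-]+)\s+SYSTEM\s+"([^"]+)"\s*>')
# General entity references such as &foo;
ENTITY_REF = re.compile(r'&([\w.-]+);')
# A leading <?xml ...?> line, which must not end up in the middle of a document.
TEXT_DECL = re.compile(r'^\s*<\?xml[^>]*\?>\s*')


def expand_file(path, encoding="iso-8859-1", entities=None, active=()):
    """Return the text of `path` with external entity references inlined."""
    path = Path(path).resolve()
    if path in active:
        raise ValueError(f"circular entity reference via {path}")
    text = path.read_text(encoding=encoding)

    # Declarations in this file extend those inherited from the including file.
    table = dict(entities or {})
    for name, system_id in ENTITY_DECL.findall(text):
        table[name] = path.parent / system_id  # relative to the declaring file

    def substitute(match):
        target = table.get(match.group(1))
        if target is None:
            return match.group(0)  # leave &lt;, &amp; and unknown names alone
        body = expand_file(target, encoding, table, active + (path,))
        return TEXT_DECL.sub("", body)

    return ENTITY_REF.sub(substitute, text)
```

Something like `print(expand_file("foobar.xml"))` would then write the flattened document to stdout. The DOCTYPE with its entity declarations stays in the output, and the recursion handles included files that in turn reference other declared entities.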

Gabriele Petrioli answered Oct 02 '22 06:10