I know it is borderline whether this belongs on Stack Overflow or Super User, but since there seem to be quite a few 'editing code' questions here, I am posting it on SO.
I have a pile of XML files that someone, in their infinite wisdom, has decided to split into multiple files using the tags, which makes debugging and editing them a huge P-i-t-A. Therefore I am looking for:
The top-level files may include other files, which may in turn include more files, nested who knows how many levels deep, so this needs to be recursive.
Here's a mockup sample of what the top-level file looks like:
```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foobar PUBLIC "foobar:dtd" "foobar.dtd" [
  <!ENTITY foo SYSTEM "foo.xml">
  <!ENTITY bar SYSTEM "bar.xml">
]>
<foo>
  <params>
    &foo;
  </params>
  <bar>
    &bar;
  </bar>
</foo>
```
EDIT: The list is in order of preference: if no #1 or #2 solution is available, the bounty goes to the best #3.
EDIT 2: It looks like @Gaby's answer works, but unfortunately only partially, unless I am doing something wrong. I'll write some sort of tool based on his answer and post it here for improvements. Of course, a #1 or #2 solution would still be appreciated... :)
EDIT 3: OK, the best non-Emacs answer will get the bounty ;)
Conclusion: Thanks to @hcayless, I now have a working #2 solution. I added

```vim
autocmd BufReadPost,FileReadPost *.xml silent %!xmllint --noent - 2> /dev/null
```

to my .vimrc, and everything is hunky-dory.
What are XML entities? XML entities are a way of representing an item of data within an XML document instead of using the data itself. Various entities are built into the XML specification; for example, the entities &lt; and &gt; represent the characters < and >.
In general, there are three types of entities: internal entities, external entities, and parameter entities.
Internal entities: an internal entity is defined locally, with its replacement text written directly in the entity declaration. Its basic purpose is to avoid duplication by letting the same entity reference be used multiple times. External entities: unlike an internal entity, an external entity's content lives in a separate file referenced by a SYSTEM identifier, as with foo.xml and bar.xml in the sample above.
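To make the internal-entity case concrete, here is a small sketch using Python's standard-library ElementTree (the choice of Python and the entity name `co` are illustrative assumptions, not from the question). It declares one internal entity in the DTD's internal subset, mixes it with the predefined `&lt;`/`&gt;` entities, and shows that the parser substitutes all of them when reading the text.

```python
import xml.etree.ElementTree as ET

# "co" is a hypothetical internal entity: its replacement text is written
# directly in the DTD's internal subset. &lt; and &gt; are predefined and
# need no declaration.
DOC = """<!DOCTYPE note [
  <!ENTITY co "ACME Corp">
]>
<note>&co; ships &lt;widgets&gt; and &co; ships gadgets</note>"""

root = ET.fromstring(DOC)
# Each &co; is replaced by "ACME Corp", and the predefined entities by the
# literal characters they stand for.
print(root.text)  # -> ACME Corp ships <widgets> and ACME Corp ships gadgets
```

The same reference (`&co;`) is used twice but defined once, which is the de-duplication benefit described above.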
If you have libxml2 installed, then xmllint will probably do this for you. Depending on your setup, you might need more params, but for your example,
```sh
xmllint --noent foobar.xml
```
will print your file to stdout with all entities resolved. It should be easy enough to wrap some bash scripting around it to do what you need.
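The answer suggests bash; the same wrapping can be sketched in Python with `subprocess`, assuming `xmllint` is on your PATH. The helper name `expand_with_xmllint` and the `*.expanded.xml` output naming are hypothetical choices for illustration. Any extra xmllint options your setup needs (as the answer notes, you might need more) would go into the argument list.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def expand_with_xmllint(path: Path) -> bytes:
    """Run `xmllint --noent` on `path` and return its stdout.

    `expand_with_xmllint` is a hypothetical helper name.
    """
    # --noent asks xmllint to substitute entity references with their content.
    # stdout is kept as raw bytes so whatever encoding xmllint writes is
    # passed through unchanged.
    result = subprocess.run(
        ["xmllint", "--noent", str(path)],
        stdout=subprocess.PIPE,
        check=True,  # raise CalledProcessError if xmllint exits non-zero
    )
    return result.stdout


def main(paths: list[str]) -> None:
    if shutil.which("xmllint") is None:
        sys.exit("xmllint not found on PATH")
    for name in paths:
        src = Path(name)
        # Hypothetical naming scheme: foobar.xml -> foobar.expanded.xml
        dst = src.with_name(src.stem + ".expanded.xml")
        dst.write_bytes(expand_with_xmllint(src))
        print(f"{src} -> {dst}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Usage would be something like `python expand_xml.py foobar.xml other.xml` (script name hypothetical), leaving the originals untouched next to their expanded copies.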
For option #3, you can take a look at pxdom and its documentation (pxdom 1.5: A Python DOM implementation), in particular the section on DOMConfiguration parameters:
The result of the parse operation depends on the parameters set on the LSParser.domConfig mapping. By default, in accordance with the DOM specification, all CDATA sections will be replaced with plain text nodes and all bound entity references will be replaced by the contents of the entity referred to. This includes external entity references and the external subset.
It also includes a serializer to save the document to a file.
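pxdom is a third-party module. If you would rather stay within the standard library for option #3, a swapped-in technique (not pxdom) is to let `xml.sax` resolve the SYSTEM entities and re-serialise the result with `XMLGenerator`. This is a minimal sketch under stated assumptions: Python 3.7.1 or later, where loading external general entities is off by default and must be switched on; the DTD named in the DOCTYPE (foobar.dtd in the sample) and every entity file exist as local files next to the top-level document, because the parser will try to open them; and `expand_entities` is a hypothetical helper name. Comments and the DOCTYPE are not reported to a ContentHandler, so they do not appear in the output.

```python
import io
import sys
import xml.sax
from xml.sax.handler import feature_external_ges
from xml.sax.saxutils import XMLGenerator


def expand_entities(path: str, encoding: str = "utf-8") -> str:
    """Return the document at `path` with external entities inlined.

    `expand_entities` is a hypothetical helper, not part of any library.
    """
    out = io.StringIO()
    parser = xml.sax.make_parser()
    # Off by default since Python 3.7.1. When on, the parser opens each SYSTEM
    # entity file (foo.xml, bar.xml, ...) and parses it in place, including
    # entity references that appear inside those files.
    # It also fetches the external DTD named in the DOCTYPE, so that file
    # must exist as well.
    parser.setFeature(feature_external_ges, True)
    # XMLGenerator writes the SAX events back out as XML. With a StringIO
    # target, `encoding` only sets the value in the XML declaration.
    parser.setContentHandler(XMLGenerator(out, encoding=encoding))
    parser.parse(path)
    return out.getvalue()


if __name__ == "__main__":
    sys.stdout.write(expand_entities(sys.argv[1]))
```

Run it as, for example, `python expand_entities.py foobar.xml > foobar.expanded.xml` (script name hypothetical).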