I need some help with a couple of questions, using bash tools.
First question:
<CreateOfficeCode>
<OperatorId>ve</OperatorId>
<OfficeCode>1234</OfficeCode>
<CountryCodeLength>0</CountryCodeLength>
<AreaCodeLength>3</AreaCodeLength>
<Attributes></Attributes>
<ChargeArea></ChargeArea>
</CreateOfficeCode>
to become:
<CreateOfficeCode>
<OperatorId>ve</OperatorId>
<OfficeCode>1234</OfficeCode>
<CountryCodeLength>0</CountryCodeLength>
<AreaCodeLength>3</AreaCodeLength>
</CreateOfficeCode>
I have done this with the following command:
sed -i '/><\//d' file
but it is not very strict; it's more of a trick (it deletes any line that contains "></"). Something more appropriate would be to find each <pattern></pattern> pair and remove it. Any suggestions?
Second question:
<CreateOfficeGroup>
<CreateOfficeName>John</CreateOfficeName>
<CreateOfficeCode>
</CreateOfficeCode>
</CreateOfficeGroup>
to:
<CreateOfficeGroup>
<CreateOfficeName>John</CreateOfficeName>
</CreateOfficeGroup>
Third question:
<CreateOfficeGroup>
<CreateOfficeName>John</CreateOfficeName>
<CreateOfficeCode>
<OperatorId>ve</OperatorId>
<OfficeCode>1234</OfficeCode>
<CountryCodeLength>0</CountryCodeLength>
<AreaCodeLength>3</AreaCodeLength>
<Attributes></Attributes>
<ChargeArea></ChargeArea>
</CreateOfficeCode>
<CreateOfficeSize>
<Chairs></Chairs>
<Tables></Tables>
</CreateOfficeSize>
</CreateOfficeGroup>
to:
<CreateOfficeGroup>
<CreateOfficeName>John</CreateOfficeName>
<CreateOfficeCode>
<OperatorId>ve</OperatorId>
<OfficeCode>1234</OfficeCode>
<CountryCodeLength>0</CountryCodeLength>
<AreaCodeLength>3</AreaCodeLength>
</CreateOfficeCode>
</CreateOfficeGroup>
Could you answer each question separately? Thank you very much!
XMLStarlet is a command-line XML processor. With it, doing what you want is a one-line operation (until the desired recursive behavior is added), and it will work for all variants of XML syntax describing the same input (for example, <Attributes/> as well as <Attributes></Attributes>):
The simple version:
xmlstarlet ed \
-d '//*[not(./*) and (not(./text()) or normalize-space(./text())="")]' \
input.xml
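If XMLStarlet is not installed, the same single-pass rule can be sketched with Python's standard xml.etree.ElementTree module. ElementTree's limited XPath support does not include not() or normalize-space(), so the test is written by hand; is_blank and strip_empty_once are hypothetical names chosen for illustration. Like the simple version, it makes one pass, so <CreateOfficeSize> in the third example is emptied but not removed. Unlike the XPath version, it keeps a leaf whose tail (the text right after it) is non-blank, because ElementTree's remove() would discard that text.

```python
import xml.etree.ElementTree as ET


def is_blank(text):
    """True if text is None or contains only whitespace."""
    return text is None or text.strip() == ''


def strip_empty_once(root):
    """Single pass: remove each leaf element whose own text is blank.

    Roughly mirrors not(./*) and normalize-space(./text())="" in the XPath.
    A leaf followed by non-blank tail text is kept, because
    Element.remove() would discard that tail text as well.
    Returns the number of elements removed.
    """
    # Decide everything against the unmodified tree first (like a single
    # XPath selection), then remove, so the walk never sees a half-edited tree.
    doomed = [(parent, child)
              for parent in root.iter()
              for child in parent
              if len(child) == 0 and is_blank(child.text) and is_blank(child.tail)]
    for parent, child in doomed:
        parent.remove(child)
    return len(doomed)
```

Usage, assuming the input is in input.xml: root = ET.parse('input.xml').getroot(), then strip_empty_once(root), then print(ET.tostring(root, encoding='unicode')).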
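The fancy version, which repeats the deletion until the output stops changing, so that elements emptied by an earlier pass (such as <CreateOfficeSize>) are removed too: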
strip_recursively() {
local doc last_doc
IFS= read -r -d '' doc
while :; do
last_doc=$doc
doc=$(xmlstarlet ed \
-d '//*[not(./*) and (not(./text()) or normalize-space(./text())="")]' \
/dev/stdin <<<"$last_doc")
if [[ $doc = "$last_doc" ]]; then
printf '%s\n' "$doc"
return
fi
done
}
strip_recursively <input.xml
/dev/stdin is used rather than - (at some cost to platform portability) for better portability across releases of XMLStarlet; adjust to taste.
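The fixed-point loop of the fancy version can also be sketched in Python: repeat a single pass until a pass removes nothing, which plays the role of comparing $doc with $last_doc. This is an illustrative translation, not part of the original answer; it assumes only the standard-library xml.etree.ElementTree, and is_blank, strip_empty_once and strip_recursively are hypothetical names. The root element itself is never removed.

```python
import xml.etree.ElementTree as ET


def is_blank(text):
    """True if text is None or contains only whitespace."""
    return text is None or text.strip() == ''


def strip_empty_once(root):
    """Remove every blank leaf element found in one pass; return how many."""
    # Leaves with non-blank tail text are kept, since remove() would drop it.
    doomed = [(parent, child)
              for parent in root.iter()
              for child in parent
              if len(child) == 0 and is_blank(child.text) and is_blank(child.tail)]
    for parent, child in doomed:
        parent.remove(child)
    return len(doomed)


def strip_recursively(root):
    """Repeat single passes until one removes nothing (a fixed point).

    Each pass can empty a parent (e.g. <CreateOfficeSize>), which the next
    pass then removes. Returns the number of passes that removed something.
    """
    passes = 0
    while strip_empty_once(root):
        passes += 1
    return passes
```

Usage, assuming the input is in input.xml: root = ET.parse('input.xml').getroot(), then strip_recursively(root), then print(ET.tostring(root, encoding='unicode')).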
On a system with only older dependencies installed, the XML parser most likely to be available is the one bundled with Python.
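#!/usr/bin/env python3
# Reads XML on stdin, removes empty leaf elements (repeating until elements
# that became empty are removed too), and writes the result to stdout.
import sys
import xml.etree.ElementTree as etree

doc = etree.parse(sys.stdin)

def prune(parent):
    """Remove empty descendants of parent; return True if anything changed."""
    ever_changed = False
    while True:
        changed = False
        # Iterate over a copy, since children may be removed during the loop.
        for el in list(parent):
            if len(el) == 0:
                # Leaf: remove it if it has no text and no text after it
                # (remove() would also discard that trailing "tail" text).
                if ((el.text is None or el.text.strip() == '') and
                        (el.tail is None or el.tail.strip() == '')):
                    parent.remove(el)
                    changed = True
            else:
                # Not a leaf: prune its subtree (always recurse).
                changed = prune(el) or changed
        ever_changed = changed or ever_changed
        if not changed:
            return ever_changed

prune(doc.getroot())
print(etree.tostring(doc.getroot(), encoding='unicode'))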
A sed-only alternative:
sed '#n
1h;1!H
$ { x
:remtag
s#\(\n* *\)*<\([^>]*>\)\( *\n*\)*</\2##g
t remtag
p
}' YourFile
(This is POSIX sed, so use --posix with GNU sed.)
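Caveat: since this sed approach works on raw text rather than parsed XML, input like
<tag1 prop="<tag2></tag2>"> ...
will have the prop content removed as well, and the same applies to anything else XML allows in which such text can appear, such as comments and CDATA sections. (Strictly speaking, a raw < is not allowed in an XML attribute value, so the comment and CDATA cases are the realistic ones.)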
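The text-substitution idea behind the sed script can be sketched in Python with re.subn, looping until nothing is substituted (the equivalent of t remtag). This is a rough translation, not the answer's own code: EMPTY_PAIR and strip_empty_pairs are hypothetical names, and \s is slightly broader than the sed's spaces and newlines (it also matches tabs). It has the same text-matching limitation as the sed version.

```python
import re

# Mirrors s#\(\n* *\)*<\([^>]*>\)\( *\n*\)*</\2##g: optional leading
# whitespace, a start tag "<Name>", optional whitespace, then the matching
# end tag "</Name>". Group 1 includes the closing ">", so a start tag with
# attributes (<a x="1"></a>) never matches its end tag and is left alone.
EMPTY_PAIR = re.compile(r'\s*<([^>]*>)\s*</\1')


def strip_empty_pairs(text):
    """Delete empty <Name></Name> pairs, repeating until none are left."""
    while True:
        text, count = EMPTY_PAIR.subn('', text)
        if count == 0:  # nothing substituted: like `t remtag` not jumping
            return text
```

For example, strip_empty_pairs('<r><!-- <x></x> --></r>') returns '<r><!-- --></r>': the comment's contents are altered even though it contains no real element.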