I'd like to convert the numeric character references in the output from xmllint into the actual UTF-8 characters.
To reproduce:
$ wget http://il.srgssr.ch/integrationlayer/1.0/ue/rts/video/play/4727630.xml
$ xmllint --xpath "/Video/AssetMetadatas/AssetMetadata/title/text()" 4727630.xml && echo
Le jardin apprivois&#xE9; - Entre pierre et bois
I'd like the output to be:
Le jardin apprivoisé - Entre pierre et bois
I've read the man page and tried different options, but nothing worked.
If possible, I'd like to achieve this with xmllint options; if that is not possible, with another command-line tool that is commonly found in Linux distributions.
Thanks!
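I understand that the question is a little outdated, but I came here from Google and want to share a possible answer for future visitors. You need to change the XPath expression slightly and use the string() function instead of text(). With text() the result is a node set, which xmllint appears to serialize as XML and therefore escapes non-ASCII characters; string() returns a plain string, which is printed as-is: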
$ wget http://il.srgssr.ch/integrationlayer/1.0/ue/rts/video/play/4727630.xml
$ xmllint --xpath "string(/Video/AssetMetadatas/AssetMetadata/title)" 4727630.xml
Le jardin apprivoisé - Entre pierre et bois
I have found another way that I think solves this problem completely. The trick is to use the recode tool provided by GNU to change the output encoding from html to utf8.
$ wget http://il.srgssr.ch/integrationlayer/1.0/ue/rts/video/play/4727630.xml
$ xmllint --xpath "/Video/AssetMetadatas/AssetMetadata/title/text()" 4727630.xml | recode html..utf8
Le jardin apprivoisé - Entre pierre et bois
recode can be installed using apt-get install recode.