
xmllint: how to convert UTF-8 numeric references into characters

Tags:

utf-8

xmllint

I'd like to convert numeric character references in the output from xmllint into the corresponding UTF-8 characters.

To reproduce:

$ wget http://il.srgssr.ch/integrationlayer/1.0/ue/rts/video/play/4727630.xml
$ xmllint --xpath "/Video/AssetMetadatas/AssetMetadata/title/text()" 4727630.xml && echo
Le jardin apprivois&#xE9; - Entre pierre et bois

I'd like the output to be:

Le jardin apprivoisé - Entre pierre et bois

I've read the man page and tried different options, but nothing worked.

If possible, I'd like to achieve this with xmllint options. If that is not possible, another command-line tool that is commonly found in Linux distributions would be fine.

Thanks!

asked Oct 19 '22 17:10 by raphaelh


2 Answers

I know the question is a little old, but I came here from Google and want to share a possible answer for future visitors. You need to change the XPath expression slightly and use the string() function instead of text():

$ wget http://il.srgssr.ch/integrationlayer/1.0/ue/rts/video/play/4727630.xml
$ xmllint --xpath "string(/Video/AssetMetadatas/AssetMetadata/title)" 4727630.xml
Le jardin apprivoisé - Entre pierre et bois
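If you want to try the string() approach without downloading the feed, here is a minimal shell sketch. It writes a hypothetical local stand-in for 4727630.xml that contains only the path used above (the real file has much more content) and wraps the command in a hypothetical xml_title function. The stand-in deliberately spells the accented letter as &#xE9;, to show that string() still prints it as a plain UTF-8 character.

```shell
#!/bin/sh
# Minimal sketch: a local stand-in for 4727630.xml (assumed structure, trimmed
# to the XPath used in the answer) so the command can be tried without wget.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT   # clean up the temporary file when the script exits

cat > "$sample" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<Video>
  <AssetMetadatas>
    <AssetMetadata>
      <title>Le jardin apprivois&#xE9; - Entre pierre et bois</title>
    </AssetMetadata>
  </AssetMetadatas>
</Video>
EOF

# Hypothetical helper: string() turns the title into an XPath string value,
# which xmllint prints as text instead of serializing a text node.
xml_title() {
  xmllint --xpath 'string(/Video/AssetMetadatas/AssetMetadata/title)' "$1"
}

# $(...) strips any trailing newline, so exactly one newline is printed
# whether or not your xmllint version adds one after a string result.
printf '%s\n' "$(xml_title "$sample")"
```

With the real download, call xml_title 4727630.xml instead of using the stand-in file.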
answered Oct 22 '22 21:10 by Lesha


I have found another way which I think solves this problem completely. The trick is to pipe the output through GNU recode, converting from HTML (which decodes the character references) to UTF-8:

$ wget http://il.srgssr.ch/integrationlayer/1.0/ue/rts/video/play/4727630.xml
$ xmllint --xpath "/Video/AssetMetadatas/AssetMetadata/title/text()" 4727630.xml | recode html..utf8
Le jardin apprivoisé - Entre pierre et bois

On Debian/Ubuntu, recode can be installed with apt-get install recode.
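If recode is not available, a similar decoding step can be sketched with Python's standard html.unescape(), assuming python3 is installed (common on Linux distributions, but not guaranteed). The decode_refs name is hypothetical: it reads stdin as UTF-8, decodes numeric references such as &#xE9; and &#233; as well as named ones such as &amp;, and writes UTF-8 to stdout.

```shell
#!/bin/sh
# Minimal sketch of a recode-free fallback (assumption: python3 is installed).
# Hypothetical helper: decode HTML/XML character references read from stdin.
decode_refs() {
  python3 -c 'import html, sys; sys.stdout.buffer.write(html.unescape(sys.stdin.buffer.read().decode("utf-8")).encode("utf-8"))'
}

# Usage with the command from the answer:
# xmllint --xpath "/Video/AssetMetadatas/AssetMetadata/title/text()" 4727630.xml | decode_refs

printf '%s\n' 'Le jardin apprivois&#xE9; - Entre pierre et bois' | decode_refs
```

Like recode html..utf8, this also turns &amp; and &lt; back into & and <, which is fine for display but means the result should not be fed back into an XML parser as-is.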

answered Oct 22 '22 23:10 by eng.mrgh