Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

linux curl save as utf-8

Tags:

linux

curl

Trying to use linux curl to download an xml file from an url.

Pretty sure that the xml is encoded in UTF-8,

suspecting curl -o doesnt save as UTF-8.

Is there anyway to force save to UTF-8 with curl ?


Thanks for the suggestion, what i found out:

Because the xml feed is dynamic, not all the time it contain any utf-8 characters. Sometimes it doesnt have utf-8 character in the whole content at all even though it is set as utf-8 in the xml encoding and header content type: charset=utf-8. When it contain a utf-8 character at least, it will be save as utf-8.

When this happen, curl doesn't download as utf-8, which makes sense as there are no utf-8 chars, why is there a need to store as utf-8.

This is damn tricky, some validator has to valid against utf-8 hence i still need a solution to force it to utf8 because by default all my xml shld be in utf8-encoding.

tried the suggested by using iconv f iso8859-1 utf-8 doesnt work for this case as i am suspecting it is not in iso8859-1 either.

Still need a better solution.

like image 322
flyclassic Avatar asked Apr 16 '12 10:04

flyclassic


People also ask

How do I save my curl output?

Our browser has been opened and it shows the Html page as output, which was mentioned in the “curl” command. Now, we will use the capital “-O” flag in the curl command to save the Html page into a file without creating a new file name.

What encoding does curl use?

Percent-encoding, also known as URL encoding, is technically a mechanism for encoding data so that it can appear in URLs. This encoding is typically used when sending POSTs with the application/x-www-form-urlencoded content type, such as the ones curl sends with --data and --data-binary etc.

What is curl silent mode?

Silent or quiet mode. Don't show progress meter or error messages. Makes Curl mute. It will still output the data you ask for, potentially even to the terminal/stdout unless you redirect it. Try it out: curl http://example.com --output my.file --silent.


1 Answers

Have you tried adding the Accept-Charset header? I had a similar issue downloading a file which was downloading with the wrong encoding. When I set the Accept-Charset header it works:

curl -H "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7" URL | iconv -f iso8859-1 -t utf-8 > output.xml
like image 163
Andrew Khoury Avatar answered Sep 19 '22 20:09

Andrew Khoury