I'm trying to use curl on Linux to download an XML file from a URL.
I'm pretty sure the XML is encoded in UTF-8,
but I suspect curl -o doesn't save it as UTF-8.
Is there any way to force curl to save the file as UTF-8?
Thanks for the suggestions. Here is what I found out:
Because the XML feed is dynamic, it does not always contain non-ASCII (multi-byte UTF-8) characters. Sometimes the whole content has none at all, even though the XML declaration sets the encoding to utf-8 and the Content-Type header says charset=utf-8. When it contains at least one such character, the file is saved as UTF-8.
When there are none, curl doesn't save the file as UTF-8, which makes sense: with no UTF-8 characters in it, there is no need to store it as UTF-8.
This is really tricky. A validator has to validate the file as UTF-8, so I still need a way to force it to UTF-8, because by default all my XML should be UTF-8 encoded.
I tried the suggested iconv -f iso8859-1 -t utf-8, but it doesn't work in this case; I suspect the file is not in ISO-8859-1 either.
I still need a better solution.
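One thing that may help narrow this down: a file containing only ASCII bytes is already valid UTF-8, because ASCII is a subset of UTF-8, so detection tools tend to label it ASCII even though a UTF-8 validator should accept it. Below is a minimal sketch for checking that locally, assuming `iconv` is installed. The helper name `is_valid_utf8` and the sample file names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) if the file decodes cleanly as UTF-8.
# iconv exits non-zero when it meets a byte sequence that is invalid in the
# source encoding, so a UTF-8 -> UTF-8 pass acts as a simple validator.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Sample files (illustrative names only).
tmpdir=$(mktemp -d)
printf '<?xml version="1.0" encoding="UTF-8"?><a>plain</a>\n' > "$tmpdir/ascii.xml"
printf '<a>caf\303\251</a>\n' > "$tmpdir/utf8.xml"    # e-acute as UTF-8 (C3 A9)
printf '<a>caf\351</a>\n' > "$tmpdir/latin1.xml"      # e-acute as ISO-8859-1 (E9)

for f in ascii.xml utf8.xml latin1.xml; do
    if is_valid_utf8 "$tmpdir/$f"; then
        echo "$f: valid UTF-8"
    else
        echo "$f: not valid UTF-8"
    fi
done
```

The ASCII-only and UTF-8 samples should both pass, and only the ISO-8859-1 sample should be rejected. If the downloaded feed passes this check on the days it has no accented characters, the file itself is probably fine and the complaint is coming from how the encoding is being detected.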
The browser opens and shows the HTML page named in the “curl” command. Next, we use the capital “-O” flag so that curl saves the HTML page to a file named after the remote file, without us having to supply a file name.
Percent-encoding, also known as URL encoding, is technically a mechanism for encoding data so that it can appear in URLs. This encoding is typically used when sending POSTs with the application/x-www-form-urlencoded content type, such as the ones curl sends with --data and --data-binary etc.
Silent or quiet mode. Don't show progress meter or error messages. Makes Curl mute. It will still output the data you ask for, potentially even to the terminal/stdout unless you redirect it. Try it out: curl http://example.com --output my.file --silent.
Have you tried adding the Accept-Charset header? I had a similar issue with a file that was downloading with the wrong encoding. Once I set the Accept-Charset header, it worked:
curl -H "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7" URL | iconv -f iso8859-1 -t utf-8 > output.xml
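If the feed's real encoding is uncertain, a more defensive variant of the pipeline above is to save the bytes first and convert only when they are not already valid UTF-8. This is a sketch, not a definitive fix: the function name `ensure_utf8`, the fallback charset, and the file names are assumptions, and it relies on `iconv` being available.

```shell
#!/bin/sh
# Hypothetical helper: leave FILE untouched if it is already valid UTF-8,
# otherwise convert it in place from FROM_CHARSET (an assumed fallback).
ensure_utf8() {
    file_path=$1
    from_charset=${2:-ISO-8859-1}

    # A UTF-8 -> UTF-8 pass fails on invalid byte sequences.
    if iconv -f UTF-8 -t UTF-8 "$file_path" > /dev/null 2>&1; then
        return 0
    fi

    # Convert into a temporary file first so that a failed conversion
    # does not clobber the original download.
    tmp_path="$file_path.tmp.$$"
    if iconv -f "$from_charset" -t UTF-8 "$file_path" > "$tmp_path"; then
        mv "$tmp_path" "$file_path"
    else
        rm -f "$tmp_path"
        return 1
    fi
}

# Usage sketch (URL is a placeholder, as in the command above):
#   curl -sS -o output.xml URL && ensure_utf8 output.xml ISO-8859-1
```

Checking first avoids double-encoding: running an ISO-8859-1 to UTF-8 conversion over bytes that are already UTF-8 would mangle every multi-byte character.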