
C++ URLencode library (Unicode capable)?

Tags: c++, linux, windows

I need a library that can URLencode a string/char array.

I can already hex-encode an ASCII array, as shown here: http://www.codeguru.com/cpp/cpp/cpp_mfc/article.php/c4029

But I need something that works with Unicode. Note: it has to work on Linux AND on Windows!

cURL has quite a nice function for this:

 char *encodedURL = curl_easy_escape(handle,WEBPAGE_URL, strlen(WEBPAGE_URL));

But first, that needs cURL, and second, it does not seem to be Unicode-capable, as one sees from the strlen call.

Stefan Steiger asked Aug 28 '10 07:08


2 Answers

If I read the question correctly and you want to do this yourself, without using curl, I think I have a solution (assuming UTF-8), and I think this is a conformant and portable way of URL-encoding query strings:

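#include <boost/function_output_iterator.hpp>
#include <boost/bind.hpp>
#include <algorithm>
#include <cctype>
#include <sstream>
#include <iostream>
#include <iterator>
#include <iomanip>
#include <string>

namespace {
  // Percent-encode a single byte unless it is an ASCII letter or digit.
  std::string encimpl(std::string::value_type v) {
    // Cast first: passing a negative char (any UTF-8 byte >= 0x80 where char
    // is signed) to isalnum is undefined behaviour.
    if (std::isalnum(static_cast<unsigned char>(v)))
      return std::string()+v;

    std::ostringstream enc;
    enc << '%' << std::setw(2) << std::setfill('0') << std::hex << std::uppercase << int(static_cast<unsigned char>(v));
    return enc.str();
  }
}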

std::string urlencode(const std::string& url) {
  // Find the start of the query string
  const std::string::const_iterator start = std::find(url.begin(), url.end(), '?');

  // If there isn't one there's nothing to do!
  if (start == url.end())
    return url;

  // store the modified query string
  std::string qstr;

  std::transform(start+1, url.end(),
                 // Append the transform result to qstr
                 boost::make_function_output_iterator(boost::bind(static_cast<std::string& (std::string::*)(const std::string&)>(&std::string::append),&qstr,_1)),
                 encimpl);
  return std::string(url.begin(), start+1) + qstr;
}

It has no non-standard dependencies other than Boost, and if you don't like the Boost dependency it's not that hard to remove.
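As a rough illustration of removing the Boost dependency, here is a minimal sketch that replaces the output-iterator and bind machinery with a plain loop. The names encode_byte and urlencode_noboost are made up for this example and are not part of the answer's code. Like the original, it assumes the input is already UTF-8 and only touches the part after the first '?'.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper (name not from the answer): percent-encodes one byte
// unless it is an ASCII letter or digit, mirroring encimpl above.
std::string encode_byte(char c) {
    static const char hex[] = "0123456789ABCDEF";
    const unsigned char u = static_cast<unsigned char>(c);
    // Restrict the test to the ASCII range so the result does not depend on
    // the current locale.
    if (u < 0x80 && std::isalnum(u))
        return std::string(1, c);
    std::string out("%");
    out += hex[u >> 4];      // high nibble
    out += hex[u & 0x0F];    // low nibble
    return out;
}

// Same contract as urlencode above: only the part after the first '?' is
// encoded, and the input is assumed to be UTF-8 already.
std::string urlencode_noboost(const std::string& url) {
    const std::string::size_type q = url.find('?');
    if (q == std::string::npos)
        return url;                        // no query string, nothing to do
    std::string result(url, 0, q + 1);     // keep everything up to and including '?'
    for (std::string::size_type i = q + 1; i < url.size(); ++i)
        result += encode_byte(url[i]);
    return result;
}
```

Building the hex digits from a lookup table also avoids constructing a std::ostringstream for every encoded byte, which may matter for long query strings.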

I tested it using:

int main() {
    // The third literal only yields UTF-8 bytes if this source file is saved as UTF-8.
    const char *testurls[] = {"http://foo.com/bar?abc<>de??90   210fg!\"$%",
                              "http://google.com",
                              "http://www.unicode.com/example?großpösna"};
    std::copy(testurls, &testurls[sizeof(testurls)/sizeof(*testurls)],
              std::ostream_iterator<std::string>(std::cout,"\n"));
    std::cout << "encode as: " << std::endl;
    // Pass the function directly: std::ptr_fun was removed in C++17,
    // and a plain function pointer works in every standard.
    std::transform(testurls, &testurls[sizeof(testurls)/sizeof(*testurls)],
                   std::ostream_iterator<std::string>(std::cout,"\n"),
                   urlencode);
}

Which all seemed to work:

http://foo.com/bar?abc<>de??90   210fg!"$%
http://google.com
http://www.unicode.com/example?großpösna

Becomes:

http://foo.com/bar?abc%3C%3Ede%3F%3F90%20%20%20210fg%21%22%24%25
http://google.com
http://www.unicode.com/example?gro%C3%9Fp%C3%B6sna

Which squares with these examples

Flexo answered Nov 02 '22 03:11


You can consider converting your Unicode URL to UTF-8 first. UTF-8 represents your Unicode data as a sequence of 8-bit bytes (plain ASCII characters keep their single-byte values), so once the URL is in UTF-8 you can percent-encode it byte by byte with the API you prefer.
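A minimal sketch of that two-step approach, assuming C++11 and assuming the Unicode text is available as a std::u32string of valid code points. Both function names (to_utf8, percent_encode) are hypothetical, and percent_encode simply stands in for "the API you prefer". Here it leaves the RFC 3986 unreserved characters (letters, digits, '-', '.', '_', '~') untouched and escapes every other byte.

```cpp
#include <string>

// Hypothetical helper: encodes a sequence of Unicode code points as UTF-8.
// Assumes every value is a valid scalar value (no surrogates, <= 0x10FFFF).
std::string to_utf8(const std::u32string& text) {
    std::string out;
    for (char32_t cp : text) {
        if (cp < 0x80) {                       // 1 byte: plain ASCII
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {               // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {             // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                               // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Hypothetical stand-in for the encoding API: escapes every byte that is not
// an RFC 3986 unreserved character.
std::string percent_encode(const std::string& bytes) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char u : bytes) {
        const bool unreserved =
            (u >= 'A' && u <= 'Z') || (u >= 'a' && u <= 'z') ||
            (u >= '0' && u <= '9') ||
            u == '-' || u == '.' || u == '_' || u == '~';
        if (unreserved) {
            out += static_cast<char>(u);
        } else {
            out += '%';
            out += hex[u >> 4];
            out += hex[u & 0x0F];
        }
    }
    return out;
}
```

Calling percent_encode(to_utf8(U"gro\u00DFp\u00F6sna")) yields gro%C3%9Fp%C3%B6sna, the same bytes as in the first answer's output. On Windows, where wchar_t strings are UTF-16, surrogate pairs would have to be combined into code points before this conversion.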

GJ. answered Nov 02 '22 02:11