
Qt - Getting the source (HTML code) of a web page hosted on the internet

I want to get the source (HTML) of a webpage, for example the homepage of StackOverflow.

This is what I've coded so far:

QNetworkAccessManager manager;
QNetworkReply *response = manager.get(QNetworkRequest(QUrl(url)));

QString html = response->readAll(); // Source should be stored here

But nothing happens! When I read the html string, it is empty ("").

What am I doing wrong? I am using Qt 5.3.1.

Alaa Salah asked Jul 25 '14 23:07



2 Answers

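You need to code it in an asynchronous fashion. get() only starts the request and returns immediately; the data arrives later, once control has returned to the event loop, so calling readAll() right after get() finds nothing to read. Connect to the reply's finished() signal and read the data there. C++11 and Qt come to the rescue: you can connect a lambda. Just remember that the body of the lambda will execute later, from the event loop.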

// https://github.com/KubaO/stackoverflown/tree/master/questions/html-get-24965972
#include <QtNetwork>
#include <functional>

// Fetches url asynchronously and passes the decoded body to fun.
// fun is not called if the request fails or the charset is not UTF-8.
void htmlGet(const QUrl &url, const std::function<void(const QString&)> &fun) {
   QScopedPointer<QNetworkAccessManager> manager(new QNetworkAccessManager);
   QNetworkReply *response = manager->get(QNetworkRequest(url));
   QObject::connect(response, &QNetworkReply::finished, [response, fun]{
      // Runs later, from the event loop, once the reply has finished.
      response->deleteLater();
      response->manager()->deleteLater();
      if (response->error() != QNetworkReply::NoError) return;
      auto const contentType =
            response->header(QNetworkRequest::ContentTypeHeader).toString();
      static QRegularExpression re("charset=([!-~]+)");
      auto const match = re.match(contentType);
      if (!match.hasMatch() || 0 != match.captured(1).compare("utf-8", Qt::CaseInsensitive)) {
         qWarning() << "Content charsets other than utf-8 are not implemented yet:" << contentType;
         return;
      }
      auto const html = QString::fromUtf8(response->readAll());
      fun(html); // do something with the data
   }) && manager.take(); // on a successful connect, release ownership: the lambda deletes the manager
}

int main(int argc, char *argv[])
{
   QCoreApplication app(argc, argv);
   htmlGet({"http://www.google.com"}, [](const QString &body){ qDebug() << body; qApp->quit(); });
   return app.exec();
}

Unless you only call this code once, you should keep a single QNetworkAccessManager instance as a member of your controller class, or in main(), etc., rather than creating a new one for every request.

Kuba hasn't forgotten Monica answered Nov 15 '22 06:11


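You have to run a QEventLoop between sending the request and reading the reply, so that the reply has finished by the time you call readAll().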

QNetworkAccessManager manager;
QNetworkReply *response = manager.get(QNetworkRequest(QUrl(url)));
// Block here in a local event loop until the reply emits finished().
QEventLoop event;
QObject::connect(response, &QNetworkReply::finished, &event, &QEventLoop::quit);
event.exec();
QString html = QString::fromUtf8(response->readAll()); // Source should be stored here
MKAROL answered Nov 15 '22 08:11