I want to get the source (HTML) of a webpage, for example the homepage of StackOverflow.
This is what I've coded so far:
QNetworkAccessManager manager;
QNetworkReply *response = manager.get(QNetworkRequest(QUrl(url)));
QString html = response->readAll(); // Source should be stored here
But nothing happens! When I try to read the value of the html string, it's empty ("").
So, what should I do? I am using Qt 5.3.1.
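QNetworkAccessManager::get() is asynchronous: it returns immediately, before any data has arrived, which is why readAll() gives you an empty string. You need to code it in an asynchronous fashion, and C++11 lambdas together with Qt's functor-based connect() come to the rescue. Just remember that the body of the lambda will execute later, from the event loop.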
// https://github.com/KubaO/stackoverflown/tree/master/questions/html-get-24965972
#include <QtNetwork>
#include <functional>

// Starts an asynchronous GET and calls `fun` with the decoded body once the reply has finished.
// `fun` is only called on success with a UTF-8 body; otherwise nothing is called.
void htmlGet(const QUrl &url, const std::function<void(const QString&)> &fun) {
   // Owns the manager until the connection below has been made.
   QScopedPointer<QNetworkAccessManager> manager(new QNetworkAccessManager);
   QNetworkReply *response = manager->get(QNetworkRequest(url));
   QObject::connect(response, &QNetworkReply::finished, [response, fun]{
      // Runs later, from the event loop, once the whole reply has arrived.
      response->deleteLater();
      response->manager()->deleteLater();
      if (response->error() != QNetworkReply::NoError) return;
      auto const contentType =
            response->header(QNetworkRequest::ContentTypeHeader).toString();
      static QRegularExpression re("charset=([!-~]+)");
      auto const match = re.match(contentType);
      if (!match.hasMatch() || 0 != match.captured(1).compare("utf-8", Qt::CaseInsensitive)) {
         qWarning() << "Content charsets other than utf-8 are not implemented yet:" << contentType;
         return;
      }
      auto const html = QString::fromUtf8(response->readAll());
      fun(html); // do something with the data
   }) && manager.take(); // if connected, release ownership: the lambda deletes the manager
}

int main(int argc, char *argv[])
{
   QCoreApplication app(argc, argv);
   // If the request fails or the charset is not UTF-8, the lambda below is never
   // called, so qApp->quit() never runs and app.exec() keeps running.
   htmlGet({"http://www.google.com"}, [](const QString &body){ qDebug() << body; qApp->quit(); });
   return app.exec();
}
Unless you're only using this code once, you should make the QNetworkAccessManager
instance a member of your controller class, or create it in main(), etc., and reuse it
for every request.
Alternatively, you can wait for the reply synchronously by running a local QEventLoop between the get() call and readAll():
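QNetworkAccessManager manager;
QNetworkReply *response = manager.get(QNetworkRequest(QUrl(url)));
QEventLoop event;
// Quit the local event loop once the reply has finished downloading.
QObject::connect(response, SIGNAL(finished()), &event, SLOT(quit()));
event.exec(); // blocks here, while still processing events, until finished() is emitted
QString html = QString::fromUtf8(response->readAll()); // Source should be stored here
response->deleteLater(); // the reply is owned by the caller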