
How can I get the content of a web page?

Tags: c++, qt, webpage

I'm trying to get a web page's data into a string so that I can then parse it. I didn't find any suitable methods in QWebView, QUrl, or elsewhere. Could you help me? Linux, C++, Qt.

EDIT:

Thanks for the help. The code works, but some pages have a broken charset after downloading. I tried something like this to fix it:

QNetworkRequest *request = new QNetworkRequest(QUrl("http://ru.wiktionary.org/wiki/bovo"));

request->setRawHeader( "User-Agent", "Mozilla/5.0 (X11; U; Linux i686 (x86_64); "
                       "en-US; rv:1.9.0.1) Gecko/2008070206 Firefox/3.0.1" );
request->setRawHeader( "Accept-Charset", "win1251,utf-8;q=0.7,*;q=0.7" );
request->setRawHeader( "charset", "utf-8" );
request->setRawHeader( "Connection", "keep-alive" );

manager->get(*request);

No results =(.

Max Frai, asked Jun 27 '09 16:06


2 Answers

Have you looked at QNetworkAccessManager? Here's a rough and ready sample illustrating usage:

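#include <QByteArray>
#include <QNetworkAccessManager>
#include <QNetworkReply>
#include <QNetworkRequest>
#include <QObject>
#include <QString>
#include <QUrl>

class MyClass : public QObject
{
    Q_OBJECT

public:
    MyClass();
    void fetch();

public slots:
    void replyFinished(QNetworkReply*);

private:
    QNetworkAccessManager* m_manager;
};

MyClass::MyClass()
{
    // The manager is parented to this object, so it is deleted with it.
    m_manager = new QNetworkAccessManager(this);

    // finished() is emitted once per request, when the whole reply has arrived.
    connect(m_manager, SIGNAL(finished(QNetworkReply*)),
            this, SLOT(replyFinished(QNetworkReply*)));
}

void MyClass::fetch()
{
    // Asynchronous: returns immediately; the result arrives in replyFinished().
    m_manager->get(QNetworkRequest(QUrl("http://stackoverflow.com")));
}

void MyClass::replyFinished(QNetworkReply* pReply)
{
    // The receiver owns the reply; schedule deletion to avoid leaking it.
    pReply->deleteLater();

    if (pReply->error() != QNetworkReply::NoError)
        return;

    QByteArray data = pReply->readAll();
    // Assumes the page is UTF-8 encoded; other encodings need a matching decoder.
    QString str = QString::fromUtf8(data);

    // process str any way you like!
}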

In your handler for the finished signal you are passed a QNetworkReply object, which you can read the response from because it inherits from QIODevice. A simple way to do this is to call readAll to get a QByteArray. You can then construct a QString from that QByteArray and do whatever you want with it.
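Regarding the broken charset mentioned in the question's EDIT: one possible cause is that the raw bytes from readAll are decoded with an encoding that differs from the one the server used. The server normally declares the encoding in the Content-Type response header (for example "text/html; charset=UTF-8"), which in Qt should be available through pReply->header(QNetworkRequest::ContentTypeHeader). The following is a minimal sketch in plain standard C++ of pulling the charset name out of such a header value. The function name charsetFromContentType is hypothetical and not part of Qt, and the parsing is deliberately simplified.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper (not part of Qt): extracts the charset parameter from a
// Content-Type header value such as "text/html; charset=UTF-8".
// Returns the charset in lower case, or an empty string if none is declared.
std::string charsetFromContentType(const std::string& contentType)
{
    // Lower-case a copy so the search for "charset=" is case-insensitive.
    std::string lower(contentType);
    std::transform(lower.begin(), lower.end(), lower.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    const std::string key = "charset=";
    const std::string::size_type pos = lower.find(key);
    if (pos == std::string::npos)
        return std::string();

    // The value runs until the next parameter separator or the end of the header.
    const std::string::size_type begin = pos + key.size();
    std::string::size_type end = lower.find(';', begin);
    if (end == std::string::npos)
        end = lower.size();

    const std::string value = lower.substr(begin, end - begin);

    // Trim surrounding whitespace and optional double quotes.
    const char* junk = " \t\"";
    const std::string::size_type first = value.find_first_not_of(junk);
    if (first == std::string::npos)
        return std::string();
    const std::string::size_type last = value.find_last_not_of(junk);
    return value.substr(first, last - first + 1);
}
```

With the charset name in hand, the bytes could then be decoded accordingly, for instance with QString::fromUtf8 for "utf-8", or with QTextCodec::codecForName in Qt 4 and Qt 5 for other encodings such as "windows-1251". If the header declares no charset, the page's own meta tag would have to be inspected instead.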

Paul Dixon, answered Nov 09 '22 16:11


Paul Dixon's answer is probably the best approach, but Jesse's answer does touch on something worth mentioning.

cURL, or more precisely libcurl, is a wonderfully powerful library. There is no need to execute shell scripts and parse their output: libcurl is available for C, C++ and more languages than you can shake a URL at. It might be useful if you are doing some unusual operation (like HTTP POST over SSL?) that Qt doesn't support.

C-o-r-E, answered Nov 09 '22 18:11