
C# headless browser with JavaScript support for a crawler

Could anyone suggest a headless browser for .NET that supports cookies and executes JavaScript automatically?

Bogdan Dudnik asked Mar 06 '13 18:03


1 Answer

Selenium with HtmlUnitDriver or GhostDriver is exactly what you are looking for. Put simply, Selenium is a library for driving a variety of browsers for automation purposes: testing, scraping, task automation.

There are different WebDriver classes, each of which lets you operate an actual browser. HtmlUnitDriver is a headless one. GhostDriver is the WebDriver implementation for PhantomJS, so you write C# while PhantomJS does the heavy lifting.

The code snippet below is from the Selenium docs and uses Firefox, but the code for GhostDriver (PhantomJS) or HtmlUnitDriver is almost identical.

using System; // needed for TimeSpan
using OpenQA.Selenium;
using OpenQA.Selenium.Firefox;
using OpenQA.Selenium.Support.UI;

class GoogleSuggest
{
    static void Main(string[] args)
    {
        // driver initialization varies across different drivers
        // but they all support parameter-less constructors
        IWebDriver driver = new FirefoxDriver();
        driver.Navigate().GoToUrl("http://www.google.com/");


        IWebElement query = driver.FindElement(By.Name("q"));
        query.SendKeys("Cheese");
        query.Submit();

        WebDriverWait wait = new WebDriverWait(driver, TimeSpan.FromSeconds(10));
        wait.Until((d) => { return d.Title.ToLower().StartsWith("cheese"); });

        System.Console.WriteLine("Page title is: " + driver.Title);

        driver.Quit();
    }
}

If you run this on a Windows machine, you can use the actual Firefox or Chrome driver: it will open a real browser window that operates as programmed in your C#. HtmlUnitDriver is the most lightweight and the fastest.

I have successfully run Selenium for C# (FirefoxDriver) on Linux using Mono. I suppose HtmlUnitDriver will work just as well as the others, so if you need speed, I suggest Mono (you can develop, test and compile with Visual Studio on Windows, no problem) plus Selenium's HtmlUnitDriver, running on a Linux host without a desktop.
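For the headless case in the question (cookies plus JavaScript), here is a rough sketch of the same flow with the PhantomJS driver. It assumes a Selenium .NET release that still ships the `OpenQA.Selenium.PhantomJS` namespace (later releases dropped it) and that the `phantomjs` executable can be found on the PATH or next to your program. The class name `HeadlessCrawl` is made up for illustration.

```csharp
using System;
using OpenQA.Selenium;
using OpenQA.Selenium.PhantomJS; // assumption: a Selenium release that still includes PhantomJS support

class HeadlessCrawl // hypothetical name, for illustration only
{
    static void Main(string[] args)
    {
        // Assumption: phantomjs is on the PATH or in the working directory.
        // Disposing the driver quits the underlying PhantomJS process.
        using (IWebDriver driver = new PhantomJSDriver())
        {
            driver.Navigate().GoToUrl("http://www.google.com/");

            // Scripts on the page run on their own; you can also run your own JavaScript.
            object title = ((IJavaScriptExecutor)driver).ExecuteScript("return document.title;");
            Console.WriteLine("Title via JavaScript: " + title);

            // Cookies set by the page are available through the driver.
            foreach (Cookie cookie in driver.Manage().Cookies.AllCookies)
            {
                Console.WriteLine(cookie.Name + "=" + cookie.Value);
            }
        }
    }
}
```

No browser window opens here, so this should also suit a host without a desktop, provided PhantomJS itself runs there.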

shturm answered Oct 12 '22 05:10