I am (was) a Python developer building a GUI web scraping application. Recently it was decided (not by me) to migrate to the .NET Framework and write the same application in C#.
In Python, I used the Mechanize library. However, I can't seem to find anything similar in .NET. What I need is a browser that runs in headless mode and can fill out forms, submit them, etc. A JavaScript parser is not a must, but it would be quite useful.
A headless browser is software that loads web pages without displaying them and can pipe their content to another program. Unlike a normal browser, nothing appears on screen when you start one, because it runs entirely in the background.
HtmlUnitDriver is a headless web browser driver written in Java. As the name suggests, it is a headless driver based on HtmlUnit, and it is a built-in headless browser in Selenium WebDriver. It is generally considered one of the most lightweight and fastest WebDriver implementations.
SlimerJS is useful for functional tests, page automation, network monitoring, screen capture, web scraping, etc. It is similar to PhantomJS, except that it runs on top of Gecko, the browser engine of Mozilla Firefox, instead of WebKit, and it can run headless or not.
There are some options:
WebKit.Net (free)
Awesomium
It is based on Chromium/WebKit and works like a charm. There is a free license available as well as a commercial one, and if need be you can buy the source code :-)
HTML Agility Pack (free) (An HTML Parser library, NOT a headless browser)
This helps with extracting information from HTML etc. and might be useful in your case (possibly in combination with HttpWebRequest).
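To illustrate that last option (an HTML parser plus a plain HTTP request instead of a real headless browser), here is a rough sketch in Python, since that is where the question started, using only the standard library. This is not HTML Agility Pack or HttpWebRequest code: `FormParser`, `build_submission`, the sample HTML and the example.com URL are all made up for illustration. In .NET the parsing half would be HTML Agility Pack's job and the request half HttpWebRequest's. Note that no JavaScript gets executed with this approach.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin
from urllib.request import Request


class FormParser(HTMLParser):
    """Collects action, method and named <input> fields of the first form."""

    def __init__(self):
        super().__init__()
        self.action = ""
        self.method = "get"
        self.fields = {}
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._done:
            self._in_form = True
            self.action = attrs.get("action") or ""
            self.method = (attrs.get("method") or "get").lower()
        elif tag == "input" and self._in_form and attrs.get("name"):
            # Keep default values (e.g. hidden tokens) so they are sent back.
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True


def build_submission(page_url, html, values):
    """Build (but do not send) a request submitting the page's first form."""
    parser = FormParser()
    parser.feed(html)
    data = {**parser.fields, **values}        # caller's values override defaults
    target = urljoin(page_url, parser.action)  # resolve a relative action URL
    body = urlencode(data)
    if parser.method == "post":
        return Request(target, data=body.encode("utf-8"), method="POST")
    separator = "&" if "?" in target else "?"
    return Request(target + separator + body if body else target, method="GET")


# Hypothetical page content; a real scraper would download this first.
SAMPLE_HTML = """
<form action="/login" method="post">
  <input type="hidden" name="token" value="abc123">
  <input type="text" name="user">
  <input type="password" name="password">
</form>
"""

request = build_submission(
    "http://example.com/start",
    SAMPLE_HTML,
    {"user": "alice", "password": "secret"},
)
print(request.get_method(), request.full_url)
```

Passing the resulting `request` to `urllib.request.urlopen` would actually send it. The sketch only handles `<input>` elements; `<select>`, `<textarea>`, cookies and redirects would need extra work, which is exactly what libraries like Mechanize do for you.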