I am developing a project, for which I want to scrape the contents of a website in the background and get some limited content from that scraped website. For example, in my page I have "userid" and "password" fields, by using those I will access my mail and scrape my inbox contents and display it in my page.
I done the above by using javascript alone. But when I click the sign in button the URL of my page (http://localhost/web/Login.html) is changed to the URL (http://mail.in.com/mails/inbox.php?nomail=....) which I am scraped. But I scrap the details without changing my url.
Web scraping is legal if you scrape data publicly available on the internet. But some kinds of data are protected by international regulations, so be careful scraping personal data, intellectual property, or confidential data. Respect your target websites and use empathy to create ethical scrapers.
There are two approaches to scraping a dynamic webpage: Scrape the content directly from the JavaScript. Scrape the website as we view it in our browser — using Python packages capable of executing the JavaScript.
The answer to that question is a resounding YES! Web scraping is easy! Anyone even without any knowledge of coding can scrape data if they are given the right tool. Programming doesn't have to be the reason you are not scraping the data you need.
Definitely go with PHP Simple HTML DOM Parser. It's fast, easy and super flexible. It basically sticks an entire HTML page in an object then you can access any element from that object.
Like the example of the official site, to get all links on the main Google page:
// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');
// Find all images
foreach($html->find('img') as $element)
echo $element->src . '<br>';
// Find all links
foreach($html->find('a') as $element)
echo $element->href . '<br>';
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With