I need to crawl some data that sits behind a login page. To scrape it, I need a tool that can log in and then crawl the pages behind it. Is it possible to do this with import.io?
ParseHub is a free and powerful web scraper that can log in to any site before it starts scraping data. You can then set it up to extract the specific data you want and download it all to an Excel or JSON file. To get started, make sure you download and install ParseHub for free.
Import.io is a platform which facilitates the conversion of semi-structured information in web pages into structured data, which can be used for anything from driving business decisions to integration with apps and other platforms.
Short version: yes, it is.
Longer version: there are at least two ways. Both require you to sign up and download the desktop app (all free):
Extractor version (simpler): Point the browser at the login page, log in normally, then train your API to extract the data you need. The downside of this method is that it only works while you are logged in. If you want import.io to log in for you, you'll need the authenticated version.
Authenticated version: As above, but create an authenticated API. This records the login procedure and replays it for you every time you run the API.
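For anyone curious what "record the login, then replay it before crawling" boils down to outside a GUI tool, here is a minimal sketch using only Python's standard library. It is not import.io's API. It assumes a plain HTML form login that sets a session cookie (no CSRF token, CAPTCHA or JavaScript-driven login); the login URL and the form field names are hypothetical and must match the target site's form.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_logged_in_opener(login_url, credentials):
    """Submit a login form once; return an opener that keeps the session cookies."""
    jar = http.cookiejar.CookieJar()
    # HTTPCookieProcessor stores Set-Cookie headers and sends them back on later requests.
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    # Hypothetical field names: keys must match the name attributes of the site's login form.
    body = urllib.parse.urlencode(credentials).encode("utf-8")
    with opener.open(login_url, data=body) as resp:  # passing data= makes this a POST
        resp.read()
    return opener


def crawl(opener, urls):
    """Fetch each URL with the logged-in opener and return {url: html}."""
    pages = {}
    for url in urls:
        with opener.open(url) as resp:
            pages[url] = resp.read().decode("utf-8")
    return pages


# Example usage (hypothetical URLs and field names):
# opener = make_logged_in_opener("https://example.com/login",
#                                {"username": "me", "password": "secret"})
# pages = crawl(opener, ["https://example.com/members/page1"])
```

Note that many sites answer a failed login with a normal 200 page, so this sketch does not prove the login worked; checking the fetched pages for a logged-in marker is left to the reader.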
Since the accepted answer no longer works :( I recommend Cloudscrape. If you sign up, you get a free trial with 20 hours of crawling and/or scraping. For data behind a login, you will need a scraper.
Tutorial for logging in with a scraper.
Tutorial for pagination.