I've been experimenting with web scraping using Scrapy, and I'm interested in retrieving all text messages from all chats on WhatsApp to use as training data for a machine learning project. I know some websites block web crawlers and scrapers, so I would like to know whether it is possible to use Scrapy to obtain these messages and, if it isn't, what alternatives I can use. I understand that I can click on the "Email chat" option for each chat, but this might not be feasible if I want to obtain a large amount of data, not just from my own chats but also from other people who are willing to let me use their chats for the project.
Export chat history
You can use the export chat feature to export a copy of the chat history from an individual or group chat.
1. Open the individual or group chat.
2. Tap More options > More > Export chat.
3. Choose whether to export with media or without media.
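Once chats have been exported, the text files can be turned into structured records for a training set. The following is only a minimal sketch: it assumes each message starts with a line shaped like `12/31/20, 10:15 PM - Alice: Hello`, which is an assumption about the export layout (it can differ by platform and locale, so the pattern will likely need adjusting). The names `Message` and `parse_export` are hypothetical helpers, not part of any WhatsApp tooling.

```python
import re
from typing import Iterable, List, NamedTuple


class Message(NamedTuple):
    timestamp: str
    sender: str
    text: str


# ASSUMPTION: lines look like "<date>, <time> - <sender>: <text>".
# Adjust STAMP to match the layout of your own export files.
STAMP = r"\d{1,2}/\d{1,2}/\d{2,4}, \d{1,2}:\d{2}(?:\s?[AP]M)?"
MESSAGE_RE = re.compile(rf"^({STAMP}) - ([^:]+): (.*)$")
SYSTEM_RE = re.compile(rf"^{STAMP} - ")


def parse_export(lines: Iterable[str]) -> List[Message]:
    """Parse exported chat lines into Message records."""
    messages: List[Message] = []
    in_message = False  # True while the most recent stamped line was a user message
    for raw in lines:
        line = raw.rstrip("\n")
        match = MESSAGE_RE.match(line)
        if match:
            messages.append(Message(*match.groups()))
            in_message = True
        elif SYSTEM_RE.match(line):
            # Stamped line without "sender: text" (e.g. a join/leave notice): skip it.
            in_message = False
        elif in_message:
            # Unstamped line: treat it as a continuation of a multi-line message.
            last = messages[-1]
            messages[-1] = last._replace(text=last.text + "\n" + line)
    return messages
```

To use it on a file, something like `parse_export(open("chat.txt", encoding="utf-8"))` would work, where `chat.txt` stands in for whatever name the exported file has.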
Scrapers may attempt to search for and save users' information including phone numbers, user profile pictures, and statuses from the WhatsApp platform. Some accounts might be temporarily or permanently banned if suspected of scraping personal information from the WhatsApp app.
Web scraping is legal if you scrape data publicly available on the internet. But some kinds of data are protected by international regulations, so be careful scraping personal data, intellectual property, or confidential data.
I don't think WhatsApp blocks crawlers and scrapers as such, but you only have access to your own session at web.whatsapp.com. What you do with your own messages is up to you. When I wrote code to read and write WhatsApp messages, I used Selenium WebDriver, which can automate any browser action. It worked quite reliably for WhatsApp. It was not fully automated, because of the QR code that has to be scanned to log in. If you press F12 and open the "Network" tab in your browser, you will notice XHR requests with the messages inside. You can see them when you load more messages by scrolling or when you open a conversation. The payload looks like binary data.
Thanks to Mohit Jindal. You are right, there is a way to reuse a browser profile, like this:
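from selenium import webdriver

# Store the Chrome profile (cookies, local storage) in the "selenium/" folder
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('user-data-dir=selenium/')
# Start Chrome with that profile so the login session is reused
driver = webdriver.Chrome(options=chrome_options)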
This creates a Chrome profile in the "selenium/" folder, so you only need to log in with your phone the first time.
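Putting the profile option together with opening the site, a session could be started roughly as follows. This is a sketch under assumptions: Selenium 4 and a local Chrome installation are available, and `build_chrome_arguments` and `open_whatsapp_web` are hypothetical helper names introduced here for illustration. The profile path is resolved to an absolute path so the same profile is found regardless of the working directory, which is a design choice rather than a requirement.

```python
from pathlib import Path
from typing import List


def build_chrome_arguments(profile_dir: str) -> List[str]:
    """Return Chrome arguments that point at a persistent profile folder."""
    absolute = Path(profile_dir).resolve()
    return [f"user-data-dir={absolute}"]


def open_whatsapp_web(profile_dir: str = "selenium/"):
    """Start Chrome with the stored profile and load WhatsApp Web."""
    # Imported lazily so the helper above works without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for argument in build_chrome_arguments(profile_dir):
        options.add_argument(argument)
    driver = webdriver.Chrome(options=options)
    driver.get("https://web.whatsapp.com")
    return driver
```

On the first run, calling `open_whatsapp_web()` should show the QR code to scan with your phone; on later runs with the same folder the saved session is expected to be picked up. Call `driver.quit()` on the returned object when you are done.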