
Is it possible to scrape all text messages from WhatsApp Web with Scrapy?

I've been experimenting with web scraping using Scrapy, and I'd like to retrieve all text messages from all chats on WhatsApp to use as training data for a machine learning project. I know some websites block web crawlers and scrapers, so I would like to know whether it is possible to use Scrapy to obtain these messages, and if it isn't, what alternatives I can use. I understand that I can click the "Email chat" option for each chat, but that might not be feasible for a large amount of data, since I want messages not just from my own chats but also from other people who are willing to let me use their chats for the project.

Romario Timothy Vaz asked Jun 09 '18 15:06



1 Answer

I don't think WhatsApp blocks crawlers and scrapers. You only have access to your own web.whatsapp.com session, and what you do with your own messages is up to you. When I wrote code to read and write WhatsApp messages, I used Selenium WebDriver, which can automate any browser action. It worked quite stably with WhatsApp. It was not fully automated, though, because of the QR code login. If you press F12 and open the "Network" tab in your browser, you will notice XHR requests with messages inside. You can see them when new messages load as you scroll or when you open a contact's chat. The payload looks like byte data.

Thanks to Mohit Jindal. You are right, there is a way to use a browser profile, like this:

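from selenium import webdriver

# Keep Chrome's profile data (including the WhatsApp Web session) in "selenium/".
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('user-data-dir=selenium/')

# Start Chrome with the persistent profile.
driver = webdriver.Chrome(options=chrome_options)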

This creates a Chrome profile in the "selenium/" folder, so you only need to log in with your phone the first time.
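
As a rough sketch of how the persistent profile could fit into a script, something like the following might work. It is not a tested recipe: the helper names `profile_argument` and `open_whatsapp_web` are made up for illustration, and the `#pane-side` selector is only my assumption about which element shows up once you are logged in. WhatsApp Web's markup changes, so check it in the F12 inspector and pass your own selector if needed.

```python
from pathlib import Path


def profile_argument(profile_dir: str) -> str:
    """Build the Chrome switch that points at a persistent profile folder."""
    # Resolve to an absolute path so the profile location does not depend
    # on the directory the script is started from.
    return f"--user-data-dir={Path(profile_dir).resolve()}"


def open_whatsapp_web(profile_dir: str = "selenium/",
                      logged_in_selector: str = "#pane-side",
                      timeout: int = 120):
    """Open WhatsApp Web with a persistent profile and wait for login.

    logged_in_selector is an ASSUMPTION (hypothetical default): a CSS
    selector for an element that is only present after the QR code login.
    """
    # Imported here so profile_argument() works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument(profile_argument(profile_dir))

    driver = webdriver.Chrome(options=options)
    driver.get("https://web.whatsapp.com")

    # On the first run this leaves time to scan the QR code with the phone;
    # on later runs the saved profile should log in without a scan.
    WebDriverWait(driver, timeout).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, logged_in_selector))
    )
    return driver
```

You would call `driver = open_whatsapp_web()` and call `driver.quit()` when done. The wait raises `TimeoutException` if the element never appears, which usually means the QR code was not scanned in time or the selector no longer matches.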

Oleg T. answered Nov 11 '22 06:11