Is there a way to programmatically initiate or schedule a Google Takeout download of all my data (46+ services)?
https://takeout.google.com/
I would like to take regular backups of this data (locally or to Google Drive).
Can Puppeteer be used to automate the user clicks, given that Google does not provide an API for this?
Limitation on the frequency of backups: Takeout's built-in scheduling can export your Google data automatically every two months for one year.
Depending on the number and size of the files you requested, the archive takes from several minutes to several days to create. Google took about three minutes to create a 175 MB archive file. When the archive is complete, Takeout emails you with a link to the archived files.
Gmail does not support a direct import of .mbox files, so you may need a third-party tool to open the .mbox export, for example Thunderbird or Apple Mail. Gmail does support POP3 transfer of mail from other email accounts into your Gmail account.
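If you want to read the .mbox export in a script rather than in a mail client, the format is simple enough to split by hand. The sketch below is not an official Google tool; it assumes the export follows the usual mbox convention where each message starts with a separator line beginning with "From " and body lines that begin with "From " are escaped as ">From ". The function names `splitMbox` and `getHeader` are made up for this example, and folded (multi-line) headers are not handled.

```javascript
// Split the text of an .mbox file into individual raw messages.
// Assumption: every message starts with a separator line beginning with "From ".
function splitMbox(text) {
  const messages = [];
  let current = null; // lines of the message being collected, or null before the first separator
  for (const line of text.split(/\r?\n/)) {
    if (line.startsWith('From ')) {
      // A new separator closes the previous message (if any).
      if (current !== null) messages.push(current.join('\n'));
      current = [];
      continue; // the separator line itself is not part of the message
    }
    if (current !== null) current.push(line);
  }
  if (current !== null) messages.push(current.join('\n'));
  return messages;
}

// Return the value of a single-line header (case-insensitive), or null if absent.
// Simplification: folded headers that continue on the next line are not joined.
function getHeader(message, name) {
  const headerBlock = message.split('\n\n', 1)[0]; // headers end at the first blank line
  const prefix = name.toLowerCase() + ':';
  for (const line of headerBlock.split('\n')) {
    if (line.toLowerCase().startsWith(prefix)) {
      return line.slice(prefix.length).trim();
    }
  }
  return null;
}
```

You would read the file into a string first (for example with Node's `fs.readFileSync(path, 'utf8')`), pass it to `splitMbox`, and then call `getHeader(message, 'Subject')` on each result. For very large exports a streaming reader would be a better fit than loading the whole file.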
Yes, it definitely is possible. I would recommend using a browser automation tool such as Selenium or Puppeteer, which can drive a headless browser. There are a few steps to accomplish this:
1) automate logging in to Google (if need be)
2) automate navigating to Google Takeout and downloading data
3) parse the data
4) write a script so you can automate this whole process on a regular basis.
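For step 4, one simple approach is to let the operating system's scheduler (cron, Task Scheduler, etc.) start the script every day and have the script itself decide whether a backup is due. The sketch below only shows that date check; `isBackupDue` and `nextRunDate` are hypothetical helper names, and where you store the last-run timestamp (a file, a database) is up to you.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Decide whether a new backup should run.
// lastRun: Date of the last successful backup, or null if there has never been one.
// now: the current Date. intervalDays: minimum number of days between backups.
function isBackupDue(lastRun, now, intervalDays) {
  if (lastRun === null) return true; // never backed up, so run now
  return now.getTime() - lastRun.getTime() >= intervalDays * DAY_MS;
}

// Earliest Date at which the next backup would be due.
function nextRunDate(lastRun, intervalDays) {
  return new Date(lastRun.getTime() + intervalDays * DAY_MS);
}
```

The script would call `isBackupDue(lastRun, new Date(), 30)` at startup and exit immediately when it returns false, so the browser is only launched when there is actually work to do.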
There are also a few things to be aware of when writing a web scraper:
When running a browser in headless mode, the HTML served can differ from what a non-headless browser receives. In other words, the attributes on the DOM elements can be named differently, and since you use these attributes to select and click elements, your code will need to change. Inspecting the elements through your regular browser is a good place to start, but you will likely need to adjust the selectors when running in headless mode. For example, the following excerpts show the same Google login written two ways for Puppeteer, one for headless mode and one for non-headless mode:
Headless Mode:
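const puppeteer = require('puppeteer');

(async () => {
  // puppeteer.launch() with no options starts the browser in headless mode
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://takeout.google.com');
  // Enter the account email, then move on to the password step
  await page.waitForSelector('input[type=email]');
  await page.type('input[type=email]', process.env.GOOGLE_USER);
  await page.click('#next');
  // Enter the password once the field is visible
  await page.waitForSelector('#Passwd', { visible: true });
  await page.type('#Passwd', process.env.GOOGLE_PWD);
  await page.waitForSelector('#signIn', { visible: true });
  // Start waiting for the navigation before clicking so it is not missed
  await Promise.all([page.waitForNavigation(), page.click('#signIn')]);
  await browser.close();
})();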
Non-Headless Mode:
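const puppeteer = require('puppeteer');

(async () => {
  // headless: false opens a visible browser window; slowMo delays each action by 50 ms
  const browser = await puppeteer.launch({ headless: false, slowMo: 50 });
  const page = await browser.newPage();
  await page.goto('https://takeout.google.com');
  // Enter the account email, then move on to the password step
  await page.waitForSelector('input[type="email"]');
  await page.type('input[type="email"]', process.env.GOOGLE_USER);
  await page.click('#identifierNext');
  // Enter the password once the field is visible
  await page.waitForSelector('input[type="password"]', { visible: true });
  await page.type('input[type="password"]', process.env.GOOGLE_PWD);
  await page.waitForSelector('#passwordNext', { visible: true });
  // Start waiting for the navigation before clicking so it is not missed
  await Promise.all([page.waitForNavigation(), page.click('#passwordNext')]);
  await browser.close();
})();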
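Since the two versions differ only in their selectors, one way to avoid maintaining two scripts is to keep the selectors in a small table keyed by mode and share the login logic. This is only a sketch: `SELECTORS`, `selectorsFor`, and `login` are names invented here, the selector strings are copied from the excerpts above, and Google can change its login markup at any time, so treat them as placeholders to verify yourself.

```javascript
// Selector sets taken from the two excerpts above (may be out of date).
const SELECTORS = {
  headless: {
    email: 'input[type=email]',
    next: '#next',
    password: '#Passwd',
    submit: '#signIn',
  },
  headful: {
    email: 'input[type="email"]',
    next: '#identifierNext',
    password: 'input[type="password"]',
    submit: '#passwordNext',
  },
};

// Pick the selector set that matches how the browser was launched.
function selectorsFor(headless) {
  return headless ? SELECTORS.headless : SELECTORS.headful;
}

// Shared login flow; `page` is a Puppeteer Page created by the caller.
async function login(page, selectors, user, password) {
  await page.waitForSelector(selectors.email);
  await page.type(selectors.email, user);
  await page.click(selectors.next);
  await page.waitForSelector(selectors.password, { visible: true });
  await page.type(selectors.password, password);
  await page.waitForSelector(selectors.submit, { visible: true });
  // Begin waiting for the navigation before the click triggers it.
  await Promise.all([page.waitForNavigation(), page.click(selectors.submit)]);
}
```

With this in place the calling code would launch the browser with a `headless` flag of its choice and then run `await login(page, selectorsFor(headless), process.env.GOOGLE_USER, process.env.GOOGLE_PWD)`.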