
Automate Google Takeout Download

Is there a way to programmatically initiate or schedule a Google Takeout download of all data (all 46+ services)?

https://takeout.google.com/

I would like to take regular backups of this data (locally or to Google Drive).

Can Puppeteer be used to automate user clicks in the absence of an API from Google?

Prithvi514 asked Jan 22 '19 21:01





1 Answer

Yes, it is definitely possible. I would recommend using a browser automation tool such as Selenium or Puppeteer, either of which can drive a headless browser. There are a few steps to accomplish this:

1) automate logging in to Google (if need be)
2) automate navigating to Google Takeout and downloading data
3) parse the data
4) write a script so you can automate this whole process on a regular basis.
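
The four steps above could be wired together along these lines. This is only a sketch: `runSteps`, `scheduleEvery`, the step names, and the archive path are all hypothetical, and the step bodies are placeholders for the real login, download, and parsing code.

```javascript
// Run a list of async steps in order, sharing one context object between them.
// Stops at the first failure and reports which step failed.
async function runSteps(steps, context = {}) {
  for (const { name, run } of steps) {
    try {
      await run(context);
    } catch (err) {
      throw new Error(`Step "${name}" failed: ${err.message}`);
    }
  }
  return context;
}

// Re-run the steps on a fixed interval (hypothetical helper).
// Returns the timer so the caller can clearInterval() it.
function scheduleEvery(intervalMs, steps) {
  return setInterval(() => {
    runSteps(steps).catch((err) => console.error(err.message));
  }, intervalMs);
}

// Placeholder steps: replace the bodies with real browser automation code.
const steps = [
  { name: 'login', run: async (ctx) => { ctx.loggedIn = true; } },
  // '/tmp/takeout.zip' is an assumed path, used only for illustration.
  { name: 'download', run: async (ctx) => { ctx.archivePath = '/tmp/takeout.zip'; } },
  { name: 'parse', run: async (ctx) => { ctx.parsed = true; } },
];
```

Calling `scheduleEvery(24 * 60 * 60 * 1000, steps)` would repeat the run daily for as long as the Node process stays alive. Note that `setInterval` delays are capped at roughly 24.8 days, so for longer intervals it is probably simpler to run the script once and let the operating system's scheduler (for example cron) start it.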

There are also a few things to be aware of when writing a web scraper:

When running a browser in headless mode, the HTML served can differ from what a non-headless browser receives. In other words, the attributes on the DOM elements can be named differently, and since you use these attributes to automate things like selecting and clicking, your code will need to change. For this reason, inspecting the elements in your browser is a good place to start, but you will likely need to adjust your selectors when running in headless mode. For example, the following code excerpts show the same Google login written two ways for Puppeteer, one headless and one non-headless:

Headless Mode:

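const puppeteer = require('puppeteer');

(async () => {
    // puppeteer.launch() runs headless by default.
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto('https://takeout.google.com');

    // Enter the account email and continue to the password step.
    await page.waitForSelector('input[type=email]');
    await page.type('input[type=email]', process.env.GOOGLE_USER);
    await page.click('#next');

    // Enter the password once the field is visible.
    await page.waitForSelector('#Passwd', { visible: true });
    await page.type('#Passwd', process.env.GOOGLE_PWD);
    await page.waitForSelector('#signIn', { visible: true });

    // Start waiting for the navigation before clicking so it is not missed.
    await Promise.all([
        page.waitForNavigation(),
        page.click('#signIn'),
    ]);
    await browser.close();
})();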

Non-Headless Mode:

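const puppeteer = require('puppeteer');

(async () => {
    // Show the browser window and slow each action down by 50 ms.
    const browser = await puppeteer.launch({ headless: false, slowMo: 50 });
    const page = await browser.newPage();
    await page.goto('https://takeout.google.com');

    // Enter the account email and continue to the password step.
    await page.waitForSelector('input[type="email"]');
    await page.type('input[type="email"]', process.env.GOOGLE_USER);
    await page.click('#identifierNext');

    // Enter the password once the field is visible.
    await page.waitForSelector('input[type="password"]', { visible: true });
    await page.type('input[type="password"]', process.env.GOOGLE_PWD);
    await page.waitForSelector('#passwordNext', { visible: true });

    // Start waiting for the navigation before clicking so it is not missed.
    await Promise.all([
        page.waitForNavigation(),
        page.click('#passwordNext'),
    ]);
    await browser.close();
})();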
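
Because the login selectors differ between the two modes (`#next` vs `#identifierNext`, `#signIn` vs `#passwordNext`), one option is to probe for whichever variant is present instead of maintaining two scripts. Below is a minimal sketch, assuming a Puppeteer `page` object. `firstPresentSelector` is a hypothetical helper name, and the selector lists are simply the ones from the two excerpts above, which Google may change at any time.

```javascript
// Return the first selector in `candidates` that matches an element on the page.
// Polls with page.$(), which resolves to null when nothing matches.
async function firstPresentSelector(page, candidates, { timeoutMs = 10000, pollMs = 250 } = {}) {
  const deadline = Date.now() + timeoutMs;
  do {
    for (const selector of candidates) {
      if (await page.$(selector)) return selector;
    }
    // Nothing matched yet: wait briefly before checking again.
    await new Promise((resolve) => setTimeout(resolve, pollMs));
  } while (Date.now() < deadline);
  throw new Error(`None of these selectors appeared: ${candidates.join(', ')}`);
}

// Selector variants taken from the headless and non-headless excerpts.
const NEXT_BUTTON = ['#identifierNext', '#next'];
const PASSWORD_INPUT = ['input[type="password"]', '#Passwd'];
const SIGN_IN_BUTTON = ['#passwordNext', '#signIn'];

// Example use inside a login flow:
//   const next = await firstPresentSelector(page, NEXT_BUTTON);
//   await page.click(next);
```

The helper only checks that an element exists, not that it is visible, so it may still be worth following it with `page.waitForSelector(selector, { visible: true })` before typing or clicking.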
user2481095 answered Oct 13 '22 01:10
