 

Load Google Datastore Backups from Data Storage to Google BigQuery

Our requirement is to programmatically back up Google Datastore and load the backups into Google BigQuery for further analysis. We have successfully automated the backups using the following approach:

        Queue queue = QueueFactory.getQueue("datastoreBackupQueue");

        /*
         * Create a task which is equivalent to the backup URL mentioned in
         * above cron.xml, using new queue which has Datastore admin enabled
         */
        TaskOptions taskOptions = TaskOptions.Builder.withUrl("/_ah/datastore_admin/backup.create")
                .method(TaskOptions.Method.GET).param("name", "").param("filesystem", "gs")
                .param("gs_bucket_name",
                        "db-backup" + "/" + TimeUtils.parseDateToString(new Date(), "yyyy/MMM/dd"))
                .param("queue", queue.getQueueName());

        /*
         * Get list of dynamic entity kind names from the datastore based on
         * the kinds present in the datastore at the start of backup
         */
        List<String> entityNames = getEntityNamesForBackup();
        for (String entityName : entityNames) {
            taskOptions.param("kind", entityName);
        }

        /* Add this task to above queue */
        queue.add(taskOptions);

I was then able to import these backups into Google BigQuery manually, but how do we automate this step?

I have also gone through most of the docs, but nothing helped: https://cloud.google.com/bigquery/docs/loading-data-cloud-storage#loading_data_from_google_cloud_storage

amithgc asked Feb 21 '26 00:02


1 Answer

I solved this myself. Here is the solution in Java. The following code picks up the backup files from Google Cloud Storage and loads them into Google BigQuery.

        AppIdentityCredential bqCredential = new AppIdentityCredential(
                Collections.singleton(BigqueryScopes.BIGQUERY));

        AppIdentityCredential dsCredential = new AppIdentityCredential(
                Collections.singleton(StorageScopes.CLOUD_PLATFORM));

        Storage storage = new Storage(HTTP_TRANSPORT, JSON_FACTORY, dsCredential);
        Objects list = storage.objects().list(bucket).setPrefix(prefix).setFields("items/name").execute();

        if (list == null) {
            Log.severe(BackupDBController.class, "BackupToBigQueryController",
                    "List from Google Cloud Storage was null", null);
        } else if (list.isEmpty()) {
            Log.severe(BackupDBController.class, "BackupToBigQueryController",
                    "List from Google Cloud Storage was empty", null);
        } else {

            for (String kind : getEntityNamesForBackup()) {
                Job job = new Job();
                JobConfiguration config = new JobConfiguration();
                JobConfigurationLoad loadConfig = new JobConfigurationLoad();

                String url = "";
                for (StorageObject obj : list.getItems()) {
                    String currentUrl = obj.getName();
                    if (currentUrl.contains(kind + ".backup_info")) {
                        url = currentUrl;
                        break;
                    }
                }

                if (StringUtils.isStringEmpty(url)) {
                    continue;
                } else {
                    url = "gs://"+bucket+"/" + url;
                }

                List<String> gsUrls = new ArrayList<>();
                gsUrls.add(url);

                loadConfig.setSourceUris(gsUrls);
                loadConfig.setSourceFormat("DATASTORE_BACKUP");
                // Documented as a CSV option; likely has no effect on Datastore backups
                loadConfig.setAllowQuotedNewlines(true);

                TableReference table = new TableReference();
                table.setProjectId(projectId);
                table.setDatasetId(datasetId);
                table.setTableId(kind);
                loadConfig.setDestinationTable(table);

                config.setLoad(loadConfig);
                job.setConfiguration(config);

                Bigquery bigquery = new Bigquery.Builder(HTTP_TRANSPORT, JSON_FACTORY, bqCredential)
                        .setApplicationName("BigQuery-Service-Accounts/0.1").setHttpRequestInitializer(bqCredential)
                        .build();
                Insert insert = bigquery.jobs().insert(projectId, job);

                // insert() only submits the load job; BigQuery runs it asynchronously
                JobReference jr = insert.execute().getJobReference();
                Log.info(BackupDBController.class, "BackupToBigQueryController",
                        "Submitted BigQuery load job " + jr.getJobId() + " for kind " + kind, null);
            }
        }
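
The snippet above leaves `bucket` and `prefix` undefined, and it matches files with `contains(kind + ".backup_info")`, which would also pick up `AppUser.backup_info` when the kind is `User`. Below is a minimal sketch of one way to derive the prefix and select the file more strictly. It assumes the backup folder follows the `yyyy/MMM/dd` pattern from the question with English month names, and that per-kind files end in `.<kind>.backup_info`. The `BackupLocator` class, its method names, and the sample object names are hypothetical and not part of any Google library.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

public class BackupLocator {

    // Assumed to mirror the folder pattern used when the backup task was created.
    private static final DateTimeFormatter FOLDER =
            DateTimeFormatter.ofPattern("yyyy/MMM/dd", Locale.ENGLISH);

    /** Object-name prefix for the backups written on the given day. */
    public static String prefixFor(LocalDate day) {
        return FOLDER.format(day) + "/";
    }

    /** gs:// URI of the first object whose name ends in ".<kind>.backup_info". */
    public static Optional<String> backupInfoUri(String bucket, List<String> objectNames, String kind) {
        String suffix = "." + kind + ".backup_info";
        return objectNames.stream()
                .filter(name -> name.endsWith(suffix))
                .findFirst()
                .map(name -> "gs://" + bucket + "/" + name);
    }

    public static void main(String[] args) {
        // Made-up object names, for illustration only.
        List<String> names = Arrays.asList(
                "2026/Feb/21/abc123.backup_info",
                "2026/Feb/21/abc123.AppUser.backup_info",
                "2026/Feb/21/abc123.User.backup_info");
        System.out.println(prefixFor(LocalDate.of(2026, 2, 21)));
        System.out.println(backupInfoUri("db-backup", names, "User").orElse("(none)"));
    }
}
```

In the loop above, `obj.getName()` values would be collected into the list passed as `objectNames`. Kind names that themselves contain dots would still need extra care with this suffix check.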

If anyone has a better approach, please let me know.
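
One possible refinement: `bigquery.jobs().insert(...)` only submits the load job, so it may be worth polling the job state before treating the load as finished. The following is a rough sketch of such a polling loop, not a definitive implementation. The `LoadJobWaiter` class is hypothetical, and the `Supplier<String>` is a stand-in for a call such as `bigquery.jobs().get(projectId, jr.getJobId()).execute().getStatus().getState()`, which throws `IOException` and would need to be wrapped. BigQuery reports job states as `PENDING`, `RUNNING`, and `DONE`.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Supplier;

public class LoadJobWaiter {

    /**
     * Polls the supplied job state until it is "DONE" or the attempts run out.
     * Returns true only if DONE was observed; returns false on timeout or interrupt.
     */
    public static boolean waitUntilDone(Supplier<String> stateSupplier, int maxAttempts, long sleepMillis) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if ("DONE".equals(stateSupplier.get())) {
                return true;
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt flag for the caller and stop waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Fake job that reports PENDING, then RUNNING, then DONE.
        Iterator<String> states = Arrays.asList("PENDING", "RUNNING", "DONE").iterator();
        boolean done = waitUntilDone(states::next, 5, 10L);
        System.out.println("done = " + done);
    }
}
```

`DONE` only means the job has stopped running. The job's `getStatus().getErrorResult()` should still be checked for `null` before logging success. On App Engine, a long wait like this probably belongs in a task queue task, since request deadlines may cut it short.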

amithgc answered Feb 23 '26 05:02


