Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Firebase data to Google BigQuery

Firebase offers private backups on Google Cloud Storage. One of the featured use case is "Ingestion into Analytics Products":

Private Backups provides a perfect pipeline into cloud analytics products such as Google’s BigQuery. Cloud Analytics products often prefer to ingest data through Cloud Storage buckets rather than directly from the application.

I have a lot of data in Firebase (more than 1GB when exported to a Cloud Storage bucket) and, as described in Firebase offering, I wanted to put those data in Big Query.

But is it really possible to write a table schema that fits Firebase raw data? Let's take as an example the dinosaur-facts database from Firebase documentation. The JSON looks like this:

{
  "dinosaurs" : {
    "bruhathkayosaurus" : {
      "appeared" : -70000000,
      "height" : 25
    },
    "lambeosaurus" : {
      "appeared" : -76000000,
      "height" : 2.1
    }
  },
  "scores" : {
    "bruhathkayosaurus" : 55,
    "lambeosaurus" : 21
  }
}

To list all dinosaurs, I suppose the only way would be to use a RECORD field in bigQuery schema. But usually RECORDS in BigQuery correspond to an array in the imported JSON. And there's no array here in Firebase, just an object with dinosaur names as the key names.

So a BigQuery table schema like this doesn't work:

[
    {
        "name": "dinosaurs",
        "type": "RECORD",
        "mode": "REQUIRED",
        "fields": [
            {
                "name": "dinosaur",
                "type": "RECORD",
                "mode": "REPEATED",
                "fields": [
                    {
                        "name": "appeared",
                        "type": "INTEGER"
                    },
                    {
                        "name": "height",
                        "type": "INTEGER"
                    },
                    {
                        "name": "length",
                        "type": "INTEGER"
                    },
                    {
                        "name": "order",
                        "type": "STRING"
                    },
                    {
                        "name": "vanished",
                        "type": "INTEGER"
                    },
                    {
                        "name": "weight",
                        "type": "INTEGER"
                    }
                ]
            },
            {
                "name": "scores",
                "type": "RECORD",
                "mode": "REPEATED",
                "fields": [
                    {
                        "name": "dinosaur",
                        "type": "INTEGER"
                    }
                ]
            }
        ]
    }
]

Is it possible to write a table schema that fits Firebase raw data? Or should we first prepare the data to make it compatible with BigQuery?

like image 700
Romain Vialard Avatar asked Feb 09 '23 06:02

Romain Vialard


2 Answers

Since the data above is just JSON, you should be able to get it to work with Firebase. However, I think that it would be much easier to prepare the data after the backup.

You mentioned that there was no arrays in the Firebase data. Firebase does support arrays, but they have to meet a certain criteria.

// we send this
['a', 'b', 'c', 'd', 'e']
// Firebase stores this
{0: 'a', 1: 'b', 2: 'c', 3: 'd', 4: 'e'}
// since the keys are numeric and sequential,
// if we query the data, we get this
['a', 'b', 'c', 'd', 'e']

Even though it may look like an object in the Firebase database, it will come back as an array when queried.

So it is feasible to create your schema in your Firebase database, but it would likely create a lot of overhead for your application.

like image 170
David East Avatar answered Feb 12 '23 11:02

David East


When writing this 03/2017, I can confirm that there's no real integration between Firebase Realtime database and BigQuery. Only Firebase Analytics can be imported easily into BigQuery. All this is not clearly explained on Firebase either...

We ended up writing our own solution, but you can check out this Github repo that has some 400+ stars, so I am assuming a few people found it useful...

like image 44
Luca Fracassi Avatar answered Feb 12 '23 09:02

Luca Fracassi