Firebase offers private backups on Google Cloud Storage. One of the featured use case is "Ingestion into Analytics Products":
Private Backups provides a perfect pipeline into cloud analytics products such as Google’s BigQuery. Cloud Analytics products often prefer to ingest data through Cloud Storage buckets rather than directly from the application.
I have a lot of data in Firebase (more than 1GB when exported to a Cloud Storage bucket) and, as described in Firebase offering, I wanted to put those data in Big Query.
But is it really possible to write a table schema that fits Firebase raw data? Let's take as an example the dinosaur-facts database from Firebase documentation. The JSON looks like this:
{
"dinosaurs" : {
"bruhathkayosaurus" : {
"appeared" : -70000000,
"height" : 25
},
"lambeosaurus" : {
"appeared" : -76000000,
"height" : 2.1
}
},
"scores" : {
"bruhathkayosaurus" : 55,
"lambeosaurus" : 21
}
}
To list all dinosaurs, I suppose the only way would be to use a RECORD field in bigQuery schema. But usually RECORDS in BigQuery correspond to an array in the imported JSON. And there's no array here in Firebase, just an object with dinosaur names as the key names.
So a BigQuery table schema like this doesn't work:
[
{
"name": "dinosaurs",
"type": "RECORD",
"mode": "REQUIRED",
"fields": [
{
"name": "dinosaur",
"type": "RECORD",
"mode": "REPEATED",
"fields": [
{
"name": "appeared",
"type": "INTEGER"
},
{
"name": "height",
"type": "INTEGER"
},
{
"name": "length",
"type": "INTEGER"
},
{
"name": "order",
"type": "STRING"
},
{
"name": "vanished",
"type": "INTEGER"
},
{
"name": "weight",
"type": "INTEGER"
}
]
},
{
"name": "scores",
"type": "RECORD",
"mode": "REPEATED",
"fields": [
{
"name": "dinosaur",
"type": "INTEGER"
}
]
}
]
}
]
Is it possible to write a table schema that fits Firebase raw data? Or should we first prepare the data to make it compatible with BigQuery?
Since the data above is just JSON, you should be able to get it to work with Firebase. However, I think that it would be much easier to prepare the data after the backup.
You mentioned that there was no arrays in the Firebase data. Firebase does support arrays, but they have to meet a certain criteria.
// we send this
['a', 'b', 'c', 'd', 'e']
// Firebase stores this
{0: 'a', 1: 'b', 2: 'c', 3: 'd', 4: 'e'}
// since the keys are numeric and sequential,
// if we query the data, we get this
['a', 'b', 'c', 'd', 'e']
Even though it may look like an object in the Firebase database, it will come back as an array when queried.
So it is feasible to create your schema in your Firebase database, but it would likely create a lot of overhead for your application.
When writing this 03/2017, I can confirm that there's no real integration between Firebase Realtime database and BigQuery. Only Firebase Analytics can be imported easily into BigQuery. All this is not clearly explained on Firebase either...
We ended up writing our own solution, but you can check out this Github repo that has some 400+ stars, so I am assuming a few people found it useful...
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With