Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Dealing with evolving schemas

We are a gaming company that stores events (Up to 1 giga events per day) to bigquery. Events are sharded over month and application in order to lower query costs.

Now to our problem.

Our current solution supports adding new type of events which leads to new versions of the table schema. This versions has also been added to the tables.

I.e. events_app1_v2_201308 and events_app1_v2_201308

If we add events with new column types in september we will also get events_app1_v3_201309

We have written code that finds out involved tables (for a date range) and makes a union of them a'la bigquery's comma separeted FROM clause.

But I just realised that this will NOT work when we make unions over different versions of the event tables.

Anyone that has a smart solution of how to deal with this!?

Right now we are investigating if JSON structures could help us. The current solution is just flat columns. [timestamp, eventId, value, value, value, ...]

From https://developers.google.com/bigquery/query-reference#from

Note: Unlike many other SQL-based systems, BigQuery uses the comma syntax to indicate table unions, not joins. This means you can run a query over several tables with compatible !? schemas as follows:

like image 647
Gunnar Eketrapp Avatar asked Sep 04 '13 08:09

Gunnar Eketrapp


People also ask

How do you handle schema evolution?

The industry solution to handling schema evolution is to include schema information with the data. So, when someone is writing data, they write schema and data both. And when someone wants to read that data, they first read schema and then read data based on the schema.

How does schema evolve?

Schema evolution is a feature that allows users to easily change a table's current schema to accommodate data that is changing over time. Most commonly, it's used when performing an append or overwrite operation, to automatically adapt the schema to include one or more new columns.

What are the 3 types of schemas?

Schema is of three types: Logical Schema, Physical Schema and view Schema. Logical Schema – It describes the database designed at logical level. Physical Schema – It describes the database designed at physical level. View Schema – It defines the design of the database at the view level.

How do you change a schema?

To change the schema of a table by using SQL Server Management Studio, in Object Explorer, right-click on the table and then click Design. Press F4 to open the Properties window. In the Schema box, select a new schema. ALTER SCHEMA uses a schema level lock.


1 Answers

You should be able to modify the table schema of the old tables to add columns, then the union should match. Note that you can only add columns, not remove them. You can use the tables.patch() method to do this, or bq update --schema

Moreover, as long as the new fields aren't marked REQUIRED, they should be considered compatible. If this is not the case, however, it would be a bug -- let us know if that is what you're experiencing.

like image 189
Jordan Tigani Avatar answered Nov 11 '22 01:11

Jordan Tigani