
Is there any approach for updating a row in BigQuery other than overwriting the table?

I have package data with fields such as the following:

packageid-->string
status-->string
status_type-->string
scans-->record(repeated)
     scanid-->string
     status-->string
scannedby-->string

Each day I receive data for 100 000 packages. The package data comes to roughly 100 MB per day, or about 3 GB per month. Each package can receive 3-4 updates. Do I have to overwrite the package table every time a package update (e.g. just a change in the status field) arrives?

Suppose the table holds data for 3 packages and an update for the 2nd package arrives. Do I have to overwrite the whole table (deleting and re-adding all of the data takes 2 transactions per package update)? For 100 000 packages, total transactions will be 10^5 * 10^5 * 2/2.

Is there any other approach for atomic updates without overwriting the table? If the table contains 1 million entries and a single package update arrives, overwriting the whole table is a lot of overhead.

hmims asked Jan 25 '16 12:01


1 Answer

Currently there is no way to update individual rows. We do see this use case somewhat often, and we recommend something similar to what Mikhail suggested. Basically, if you have a unique ID for each logical row and a timestamp recording when the row data was updated, you can simply add every update as a new row and apply a view over the table that returns only the newest row for each ID.

Your view would look something like this:

SELECT *
FROM (
  SELECT
      *,
      MAX(<timestamp_column>)
          OVER (PARTITION BY <id_column>)
          AS max_timestamp,
  FROM <table>
)
WHERE <timestamp_column> = max_timestamp

(cribbed from the question "Return only the newest rows from a BigQuery table with a duplicate items")
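To make the pattern concrete, here is a minimal sketch that runs the same view shape against a few package rows. It uses Python's built-in sqlite3 purely as a local stand-in for BigQuery (it needs an SQLite build with window functions, 3.25 or later), and the names package_raw, package and updated_at are made up for the illustration, not taken from the question's schema.

```python
import sqlite3

# SQLite is only a local stand-in for BigQuery here.
# package_raw, package and updated_at are hypothetical names.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE package_raw (packageid TEXT, status TEXT, updated_at INTEGER)"
)

# Every update is appended as a new row; nothing is overwritten.
conn.executemany(
    "INSERT INTO package_raw VALUES (?, ?, ?)",
    [
        ("p1", "created", 1),
        ("p2", "created", 2),
        ("p3", "created", 3),
        ("p2", "in_transit", 4),  # first update for p2
        ("p2", "delivered", 5),   # second update for p2
    ],
)

# Same shape as the view above: keep the row whose timestamp equals
# the newest timestamp seen for its id.
conn.execute("""
    CREATE VIEW package AS
    SELECT packageid, status, updated_at
    FROM (
        SELECT *, MAX(updated_at) OVER (PARTITION BY packageid) AS max_timestamp
        FROM package_raw
    )
    WHERE updated_at = max_timestamp
""")

latest = conn.execute(
    "SELECT packageid, status FROM package ORDER BY packageid"
).fetchall()
print(latest)  # [('p1', 'created'), ('p2', 'delivered'), ('p3', 'created')]
```

Note that p2 appears three times in the raw table but only once in the view, with its latest status. If two updates for the same ID could share a timestamp, the view would return both rows, so the timestamp needs to be fine-grained enough to be unique per ID.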

If your table is partitioned into daily tables (or becomes static after some period), you can then replace the view with the result of the view query after the table stabilizes, and improve your query efficiency.

e.g.

  • Add data to TABLE_RAW.
  • Create view TABLE that performs the above query over TABLE_RAW.
  • At some point after TABLE_RAW is stable, query TABLE with a destination table of TABLE, with write disposition WRITE_TRUNCATE.
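The three steps above can be sketched locally as follows. Again this is only an illustration with sqlite3 standing in for BigQuery: package_raw plays the role of TABLE_RAW, package plays TABLE, and package_snapshot is a hypothetical scratch name. In BigQuery itself the last step would be a query job with a destination table and the WRITE_TRUNCATE write disposition, as described in the list, rather than the create/drop/rename used here.

```python
import sqlite3

# Hypothetical names: package_raw plays TABLE_RAW, package plays TABLE.
LATEST_ROWS = """
    SELECT packageid, status, updated_at
    FROM (
        SELECT *, MAX(updated_at) OVER (PARTITION BY packageid) AS max_timestamp
        FROM package_raw
    )
    WHERE updated_at = max_timestamp
"""

conn = sqlite3.connect(":memory:")

# Step 1: add data to the raw table, one row per update.
conn.execute(
    "CREATE TABLE package_raw (packageid TEXT, status TEXT, updated_at INTEGER)"
)
conn.executemany(
    "INSERT INTO package_raw VALUES (?, ?, ?)",
    [("p1", "created", 1), ("p2", "created", 2), ("p2", "delivered", 3)],
)

# Step 2: create the view that exposes only the newest row per id.
conn.execute("CREATE VIEW package AS" + LATEST_ROWS)

# Step 3: once the raw table is stable, replace the view with its own result.
# SQLite cannot overwrite a view in place, so go through a scratch table.
conn.execute("CREATE TABLE package_snapshot AS SELECT * FROM package")
conn.execute("DROP VIEW package")
conn.execute("ALTER TABLE package_snapshot RENAME TO package")
conn.commit()

kind = conn.execute(
    "SELECT type FROM sqlite_master WHERE name = 'package'"
).fetchone()[0]
rows = conn.execute(
    "SELECT packageid, status FROM package ORDER BY packageid"
).fetchall()
print(kind, rows)  # table [('p1', 'created'), ('p2', 'delivered')]
```

After the swap, readers of package no longer pay for the window function on every query, which is the efficiency gain mentioned above.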

Unfortunately, this does add a bit of overhead. That said, for your use case you may be able to just leave the view in place indefinitely, which would simplify things a bit.

Sean Chen answered Jan 03 '23 10:01