Is there a difference in `BigQueryIO` when you use `from` vs `fromQuery("SELECT * ...")` in Dataflow?

When you need to read all the data from one or more BigQuery tables in a Dataflow job, I would say there are two approaches. The first is to use `BigQueryIO` with `from`, which reads the table in question. The second is to use `fromQuery`, where you specify a query that reads all the data from the same table. So my question is:

  • Is there any cost or performance benefit to using one over the other?

I haven't found anything in the docs about this, but I would really like to know. I imagine that `from` may be faster, since it doesn't need to run a query that scans the data, which would make it more similar to the preview functionality in the BigQuery UI. If that is true, it might also be much cheaper, but it would also make sense if both cost the same.

So in short, what is the difference between:

BigQueryIO.read(...).from(tableName)

And

BigQueryIO.read(...).fromQuery("SELECT * FROM " + tableName)
asked Jan 24 '26 10:01 by Tomas Jansson


1 Answer

`from` is both cheaper and faster than `fromQuery("SELECT * FROM ...")`.

  • `from` exports the table directly, and exporting data from BigQuery is free.
  • `fromQuery("SELECT * FROM ...")` first runs a query that scans the entire table (billed at $5 per TB scanned) and then exports the query result.
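To put the pricing claim above in numbers, here is a minimal Java sketch that estimates the on-demand cost of each read mode for a given table size. It assumes the $5/TB price quoted in the answer (current BigQuery pricing may differ), binary terabytes (2^40 bytes), that the export job behind `from` is free, and that `SELECT *` is billed for the full table size. It ignores the free tier and flat-rate pricing. The class and method names are hypothetical, for illustration only.

```java
// Hypothetical helper: rough on-demand cost of the two BigQueryIO read modes,
// under the assumptions stated in the answer (free export, $5/TB scanned).
final class BigQueryReadCost {
    // Assumed on-demand query price in USD per TB scanned (from the answer; may be outdated).
    static final double USD_PER_TB_SCANNED = 5.0;
    // Assumed binary terabyte: 2^40 bytes.
    static final double BYTES_PER_TB = (double) (1L << 40);

    // from(table): served by an export job, assumed free, so the size does not matter.
    static double fromTableCostUsd(long tableBytes) {
        return 0.0;
    }

    // fromQuery("SELECT * FROM ..."): every column of every row is scanned,
    // so the billed bytes are assumed to equal the full table size.
    static double fromQueryCostUsd(long tableBytes) {
        return tableBytes / BYTES_PER_TB * USD_PER_TB_SCANNED;
    }

    public static void main(String[] args) {
        long tableBytes = 2L << 40; // hypothetical 2 TB table
        System.out.printf("from:      $%.2f%n", fromTableCostUsd(tableBytes));
        System.out.printf("fromQuery: $%.2f%n", fromQueryCostUsd(tableBytes));
    }
}
```

Because `SELECT *` reads every column, its billed bytes grow with the whole table, so under these assumptions the query path gets more expensive as the table grows while the export path stays at zero.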
answered Jan 26 '26 15:01 by Jiayuan Ma