
BigQuery with BigTable connection, cannot execute any query

I want to generate some reports based on the data in Bigtable. For that I wanted to create a query that fetches the latest data from Bigtable and passes it on to the Data Studio report. The problem is that when I create a Bigtable connection in BigQuery, I cannot execute any query, not even on an empty table. I create the external table in BigQuery in the following way:

bq mk \
--external_table_definition=gs://somebucket/big-table-definition.json \
datareportingdataset.datareportingtable

and the command executes successfully. My big-table-definition.json looks like the following:

{
    "sourceFormat": "BIGTABLE",
    "sourceUris": [
        "https://googleapis.com/bigtable/projects/playground-2/instances/data-reporting/tables/data-reporting-table"
    ],
    "bigtableOptions": {
        "readRowkeyAsString": "true",
        "columnFamilies" : [
            {
                "familyId": "cf1",
                "onlyReadLatest": "true",
                "columns": [
                    {
                        "qualifierString": "temp",
                        "type": "STRING"
                    },
                    // ...the rest of the columns
                ]
            }
        ]
    }
}
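
To rule out a plain JSON syntax error (the // comment above is only shorthand for this post, since JSON itself does not allow comments), the file can be run through a parser, for example:

# Pretty-prints the file if it parses, or reports the exact syntax error.
python -m json.tool big-table-definition.json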

The error when doing a simple select * query looks like the following:

Error while reading table: datareportingdataset.datareportingtable, error message: Error detected while parsing row starting at position: 2. Error: Data between close double quote (") and field separator.

At first I suspected some data in Bigtable, but the error still occurs even after I deleted everything from the table. I have found that it must be something with the JSON file itself: when I move "sourceFormat" down a few lines, the reported error position changes. What am I doing wrong here?

asked Oct 02 '19 by Kris

2 Answers

I just reproduced your case and found the exact same error. It looks to me like when you run the bq mk command, it isn't extracting any data at all.
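
One way to inspect what was actually created (my guess, not verified, being that the gs:// path gets treated as the data source itself rather than as a definition file) is something like:

# Dump the table's stored metadata; the externalDataConfiguration section
# shows which source URIs and source format BigQuery actually recorded.
bq show --format=prettyjson datareportingdataset.datareportingtable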

As a workaround I would suggest running a Dataflow job to export your data to Cloud Storage as .avro files and then loading them into a dataset in BigQuery.
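
A rough sketch of that workaround, assuming the Google-provided Bigtable-to-Avro Dataflow template and reusing the project/instance/table names from your question (region, bucket and filename prefix are placeholders):

# Export the Bigtable table to Avro files in Cloud Storage via the
# Google-provided Dataflow template.
gcloud dataflow jobs run bigtable-avro-export \
    --gcs-location gs://dataflow-templates/latest/Cloud_Bigtable_to_GCS_Avro \
    --region us-central1 \
    --parameters bigtableProjectId=playground-2,bigtableInstanceId=data-reporting,bigtableTableId=data-reporting-table,outputDirectory=gs://somebucket/avro-export,filenamePrefix=data-reporting-

# Load the exported Avro files into a native BigQuery table.
bq load --source_format=AVRO \
    datareportingdataset.datareportingtable \
    "gs://somebucket/avro-export/data-reporting-*.avro"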

answered Oct 22 '22 by Chris32


I think I found the problem, as I was able to reproduce it. The error message is confusing but, as documented here:

You must create the JSON schema file manually, and it must be on your local machine. Referencing a JSON schema file stored in Cloud Storage or in Google Drive is not supported.

I did some tests with the Bigtable quickstart and it worked well for me:

bq query "SELECT * FROM DATASET.datareportingtable"
Waiting on JOB_ID ... (3s) Current status: DONE   
+--------+-----------------+---------------------------+-----------------------+
| rowkey | cf1_column_name | cf1_column_cell_timestamp | cf1_column_cell_value |
+--------+-----------------+---------------------------+-----------------------+
| r1     | c1              |       2019-10-15 18:15:04 | test-value            |
+--------+-----------------+---------------------------+-----------------------+

The only thing I did differently was using a local path, as in:

--external_table_definition=big-table-definition.json

Changing this back to:

--external_table_definition=gs://$BUCKET/big-table-definition.json

And I got the same error:

bq query "SELECT * FROM DATASET.datareportingtable2"
Waiting on JOB_ID ... (0s) Current status: DONE   
BigQuery error in query operation: Error processing job 'PROJECT:JOB_ID': Error while reading table: DATASET.datareportingtable2, error message: Error detected while parsing row starting at
position: 2. Error: Data between close double quote (") and field separator.
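
Putting it together, a minimal fix for the setup from the question (paths taken from there; the bq rm step assumes you want to replace the broken table) would be roughly:

# Copy the definition file from Cloud Storage to the local machine.
gsutil cp gs://somebucket/big-table-definition.json .

# Remove the broken table and recreate it from the local definition file.
bq rm -f datareportingdataset.datareportingtable
bq mk \
    --external_table_definition=./big-table-definition.json \
    datareportingdataset.datareportingtable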
answered Oct 22 '22 by Guillem Xercavins