Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Bigquery failed to parse input string as TIMESTAMP

I'm trying to load a csv from Google Cloud Storage into Bigquery using schema autodetect.

However I'm getting stumped by a parsing error on one of my columns. I'm perplexed why bigquery can't parse the field. In the documentation, it should be able to parse fields that look like YYYY-MM-DD HH:MM:SS.SSSSSS (which is exactly what my BQInsertTimeUTC column is).

Here's my code:

from google.cloud import bigquery
from google.oauth2 import service_account

project_id = "<my_project_id>"
table_name = "<my_table_name>"
gs_link = "gs://<my_bucket_id>/my_file.csv"

creds = service_account.Credentials.from_service_account_info(gcs_creds)
bq_client = bigquery.Client(project=project_id, credentials=creds)
dataset_ref = bq_client.dataset(<my_dataset_id>)


# create job_config object
job_config = bigquery.LoadJobConfig(
    autodetect=True,
    skip_leading_rows=1,
    source_format="CSV",
    write_disposition="WRITE_TRUNCATE",
)

# prepare the load_job
load_job = bq_client.load_table_from_uri(
    gs_link,
    dataset_ref.table(table_name),
    job_config=job_config,
)

# execute the load_job
result = load_job.result()

Error Message:

Could not parse '2021-07-07 23:10:47.989155' as TIMESTAMP for field BQInsertTimeUTC (position 4) starting at location 64  with message 'Failed to parse input string "2021-07-07 23:10:47.989155"'

And here's the csv file that is living in GCS:

first_name, last_name, date, number_col, BQInsertTimeUTC, ModifiedBy
lisa, simpson, 1/2/2020T12:00:00, 2, 2021-07-07 23:10:47.989155, tim
bart, simpson, 1/2/2020T12:00:00, 3, 2021-07-07 23:10:47.989155, tim
maggie, simpson, 1/2/2020T12:00:00, 4, 2021-07-07 23:10:47.989155, tim
marge, simpson, 1/2/2020T12:00:00, 5, 2021-07-07 23:10:47.989155, tim
homer, simpson, 1/3/2020T12:00:00, 6, 2021-07-07 23:10:47.989155, tim

enter image description here

like image 537
Tim Estes Avatar asked Apr 23 '26 21:04

Tim Estes


1 Answers

Loading CSV files to BigQuery assumes that all the timestamp fields are going to follow the same format. In your CSV file, since the first timestamp value is "1/2/2020T12:00:00" so it is going to consider the timestamp format that the CSV file uses is [M]M-[D]D-YYYYT[H]H:[M]M:[S]S[.F]][time zone].

Therefore, it complains that the value "2021-07-07 23:10:47.989155" could not be parsed. If you change "2021-07-07 23:10:47.989155" to "7/7/2021T23:10:47.989155", it will work.

To fix this, you can either

  1. Create a table with date column's type and BQInsertTimeUTC column's type as STRING. Load the CSV into it. And then expose a view which will have the expected TIMESTAMP column types for date and BQInsertTimeUTC, using SQL to transform the data from the base table.
  2. Open the CSV file and transform either the "date" values or "BQInsertTimeUTC" values to make their formats consistent.

By the way, the CSV sample you pasted here has extra space after the delimiter ",".

Working version:

first_name,last_name,date,number_col,BQInsertTimeUTC,ModifiedBy
lisa,simpson,1/2/2020T12:00:00,7/7/2021T23:10:47.989155,tim
bart,simpson,1/2/2020T12:00:00,3,7/7/2021T23:10:47.989155,tim
maggie,simpson,1/2/2020T12:00:00,4,7/7/2021T23:10:47.989155,tim
marge,simpson,1/2/2020T12:00:00,5,7/7/2021T23:10:47.989155,tim
homer,simpson,1/3/2020T12:00:00,6,7/7/2021T23:10:47.989155,tim
like image 128
Ryan Yuan Avatar answered Apr 25 '26 11:04

Ryan Yuan



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!