I'm trying to load a csv from Google Cloud Storage into Bigquery using schema autodetect.
However I'm getting stumped by a parsing error on one of my columns. I'm perplexed why bigquery can't parse the field. In the documentation, it should be able to parse fields that look like YYYY-MM-DD HH:MM:SS.SSSSSS (which is exactly what my BQInsertTimeUTC column is).
Here's my code:
from google.cloud import bigquery
from google.oauth2 import service_account
project_id = "<my_project_id>"
table_name = "<my_table_name>"
gs_link = "gs://<my_bucket_id>/my_file.csv"
creds = service_account.Credentials.from_service_account_info(gcs_creds)
bq_client = bigquery.Client(project=project_id, credentials=creds)
dataset_ref = bq_client.dataset(<my_dataset_id>)
# create job_config object
job_config = bigquery.LoadJobConfig(
autodetect=True,
skip_leading_rows=1,
source_format="CSV",
write_disposition="WRITE_TRUNCATE",
)
# prepare the load_job
load_job = bq_client.load_table_from_uri(
gs_link,
dataset_ref.table(table_name),
job_config=job_config,
)
# execute the load_job
result = load_job.result()
Error Message:
Could not parse '2021-07-07 23:10:47.989155' as TIMESTAMP for field BQInsertTimeUTC (position 4) starting at location 64 with message 'Failed to parse input string "2021-07-07 23:10:47.989155"'
And here's the csv file that is living in GCS:
first_name, last_name, date, number_col, BQInsertTimeUTC, ModifiedBy
lisa, simpson, 1/2/2020T12:00:00, 2, 2021-07-07 23:10:47.989155, tim
bart, simpson, 1/2/2020T12:00:00, 3, 2021-07-07 23:10:47.989155, tim
maggie, simpson, 1/2/2020T12:00:00, 4, 2021-07-07 23:10:47.989155, tim
marge, simpson, 1/2/2020T12:00:00, 5, 2021-07-07 23:10:47.989155, tim
homer, simpson, 1/3/2020T12:00:00, 6, 2021-07-07 23:10:47.989155, tim

Loading CSV files to BigQuery assumes that all the timestamp fields are going to follow the same format. In your CSV file, since the first timestamp value is "1/2/2020T12:00:00" so it is going to consider the timestamp format that the CSV file uses is [M]M-[D]D-YYYYT[H]H:[M]M:[S]S[.F]][time zone].
Therefore, it complains that the value "2021-07-07 23:10:47.989155" could not be parsed. If you change "2021-07-07 23:10:47.989155" to "7/7/2021T23:10:47.989155", it will work.
To fix this, you can either
By the way, the CSV sample you pasted here has extra space after the delimiter ",".
Working version:
first_name,last_name,date,number_col,BQInsertTimeUTC,ModifiedBy
lisa,simpson,1/2/2020T12:00:00,7/7/2021T23:10:47.989155,tim
bart,simpson,1/2/2020T12:00:00,3,7/7/2021T23:10:47.989155,tim
maggie,simpson,1/2/2020T12:00:00,4,7/7/2021T23:10:47.989155,tim
marge,simpson,1/2/2020T12:00:00,5,7/7/2021T23:10:47.989155,tim
homer,simpson,1/3/2020T12:00:00,6,7/7/2021T23:10:47.989155,tim
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With