Is there a way to load a JSON file from the local file system to BigQuery using the Google BigQuery Client API?
All the options I found are:
1. Streaming the records one by one.
2. Loading JSON data from GCS.
3. Using raw POST requests to load the JSON (i.e., not through the Google Client API).
I'm assuming from the python tag that you want to do this from Python. There is a load example here that loads data from a local file (it uses CSV, but it is easy to adapt to JSON; there is another JSON example in the same directory).
The basic flow is:
import time

import google.auth
from googleapiclient.discovery import build
from googleapiclient.http import MediaFileUpload

# Build an authorized BigQuery service object. This assumes Application
# Default Credentials are available; adapt the auth to your setup.
credentials, _ = google.auth.default()
service = build('bigquery', 'v2', credentials=credentials)
jobs = service.jobs()

# Load configuration with the destination specified.
load_config = {
    'destinationTable': {
        'projectId': PROJECT_ID,   # Fill in your own project, dataset,
        'datasetId': DATASET_ID,   # and table IDs here.
        'tableId': TABLE_ID
    }
}
load_config['schema'] = {
    'fields': [
        {'name': 'string_f', 'type': 'STRING'},
        {'name': 'boolean_f', 'type': 'BOOLEAN'},
        {'name': 'integer_f', 'type': 'INTEGER'},
        {'name': 'float_f', 'type': 'FLOAT'},
        {'name': 'timestamp_f', 'type': 'TIMESTAMP'}
    ]
}
load_config['sourceFormat'] = 'NEWLINE_DELIMITED_JSON'

# This tells it to perform a resumable upload of a local file
# called 'foo.json'.
upload = MediaFileUpload('foo.json',
                         mimetype='application/octet-stream',
                         # This enables resumable uploads.
                         resumable=True)
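# For reference, 'foo.json' must contain one JSON object per line
# (newline-delimited), matching the schema above. For example, a line
# such as the following (values here are just made up for illustration):
#   {"string_f": "a", "boolean_f": true, "integer_f": 1,
#    "float_f": 1.5, "timestamp_f": "2014-08-01 00:00:00"}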
start = time.time()
job_id = 'job_%d' % start

# Create the load job, attaching the file contents as the media body.
result = jobs.insert(
    projectId=PROJECT_ID,
    body={
        'jobReference': {
            'jobId': job_id
        },
        'configuration': {
            'load': load_config
        }
    },
    media_body=upload).execute()

# Then you'd also want to wait for the result and check the status
# (check out the example at the link for more info).
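For completeness, a minimal polling loop could look like the sketch below. It reuses the jobs collection, the PROJECT_ID placeholder, and the job_id from above; the one-second sleep interval is just an example.

while True:
    status = jobs.get(projectId=PROJECT_ID, jobId=job_id).execute()
    if status['status']['state'] == 'DONE':
        # 'errorResult' is only present when the job failed.
        if 'errorResult' in status['status']:
            raise RuntimeError(status['status']['errorResult'])
        break
    time.sleep(1)  # Poll roughly once per second while the job runs.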