We are trying to upload files to Google Cloud Storage before loading them into BigQuery, but some uploads fail with '500 Internal Server Error' or '410 Gone' (raw messages below).
We are using the official SDK and have added retries with exponential backoff, but the errors still occur. Do you have any advice?
Here is how we upload (Scala):
val credential = new GoogleCredential().setAccessToken(accessToken)
val requestInitializer = new HttpRequestInitializer() {
  def initialize(request: HttpRequest): Unit = {
    credential.initialize(request)
    // to avoid read timed out exception
    request.setConnectTimeout(200000)
    request.setReadTimeout(200000)
    request.setIOExceptionHandler(
      new HttpBackOffIOExceptionHandler(new ExponentialBackOff()))
    request.setUnsuccessfulResponseHandler(
      new HttpBackOffUnsuccessfulResponseHandler(new ExponentialBackOff()))
  }
}
val storage = new Storage.Builder(
  new NetHttpTransport,
  JacksonFactory.getDefaultInstance,
  requestInitializer
).setApplicationName("MyAppHere").build
val objectMetadata = new StorageObject()
  .setBucket(bucketName)
  .setName(distantFileName)
val isc = new InputStreamContent("binary/octet-stream", fis)
val length = isc.getLength
val insertObject = storage.objects().insert(bucketName, objectMetadata, isc)
// For small files, you may wish to call setDirectUploadEnabled(true), to
// reduce the number of HTTP requests made to the server.
if (length > 0 && length <= 2 * 1000 * 1000 /* 2MB */ ) {
  insertObject.getMediaHttpUploader.setDirectUploadEnabled(true)
}
insertObject.execute()
Our Scala dependencies:
"com.google.api-client" % "google-api-client" % "1.18.0-rc",
"com.google.api-client" % "google-api-client-jackson2" % "1.18.0-rc",
"com.google.apis" % "google-api-services-bigquery" % "v2-rev142-1.18.0-rc",
"com.google.apis" % "google-api-services-storage" % "v1-rev1-1.18.0-rc",
"com.google.http-client" % "google-http-client" % "1.18.0-rc",
"com.google.oauth-client" % "google-oauth-client" % "1.18.0-rc"
Raw SDK error responses:
500 Internal Server Error
{
"code" : 500,
"errors" : [ {
"domain" : "global",
"message" : "Backend Error",
"reason" : "backendError"
} ],
"message" : "Backend Error"
}
410 Gone
{
"code" : 500,
"errors" : [ {
"domain" : "global",
"message" : "Backend Error",
"reason" : "backendError"
} ],
"message" : "Backend Error"
}
Every backend error should be handled by retrying with exponential backoff, since it may be caused by a transient service problem.
If the error still persists after, say, 10 hours, you should contact support so they can give you one-on-one help with your problem.
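The retry advice above can be sketched in plain Scala as follows. This is only a minimal illustration and not the SDK's own mechanism: the names `RetrySketch`, `backoffDelayMs`, and `retry`, as well as the base delay, cap, and jitter values, are assumptions chosen for the example.

```scala
import scala.annotation.tailrec
import scala.util.{Failure, Random, Success, Try}

// Hypothetical helper, not part of the Google client libraries.
object RetrySketch {

  // Delay before the retry that follows attempt number `attempt` (0-based):
  // baseMs * 2^attempt, capped at maxDelayMs. All values are assumptions.
  def backoffDelayMs(attempt: Int, baseMs: Long = 500L, maxDelayMs: Long = 60000L): Long =
    math.min(maxDelayMs, baseMs * (1L << math.min(attempt, 20)))

  // Runs `op` and retries it while `isRetryable` accepts the thrown exception,
  // making at most `maxAttempts` attempts in total. `sleep` is injectable so
  // the waiting can be replaced in tests.
  def retry[A](maxAttempts: Int, sleep: Long => Unit = (ms: Long) => Thread.sleep(ms))(
      isRetryable: Throwable => Boolean)(op: => A): A = {
    @tailrec
    def loop(attempt: Int): A =
      Try(op) match {
        case Success(value) => value
        case Failure(e) if attempt + 1 < maxAttempts && isRetryable(e) =>
          // Random jitter keeps many clients from retrying in lockstep.
          sleep(backoffDelayMs(attempt) + Random.nextInt(250))
          loop(attempt + 1)
        case Failure(e) => throw e
      }
    loop(0)
  }
}
```

In the question's setting, the operation passed to `retry` would presumably have to reopen the input stream and rebuild the insert request on every attempt, because a stream that has already been consumed cannot be sent again. The `isRetryable` predicate is where one would decide which failures (for example, backend errors) are worth retrying.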