tl;dr A Drive API call returns a failure status (403) even though the request was successfully processed.
I insert 100 files in a loop. For this test I have DISABLED backoff and retry, so if an insert fails with a 403, I ignore it and proceed with the next file. Out of 100 files, I get 63 403 rate limit exceptions.
However, on checking Drive, of those 63 failures, 3 actually succeeded, i.e. the file was created on Drive. Had I done the usual backoff and retry, I would have ended up with duplicated inserts. This confirms the behaviour I was seeing with backoff-retry enabled, i.e. in my 100-file test, I consistently see 3-4 duplicate insertions.
It smells like there is an asynchronous connection between the API endpoint server and the Drive storage servers which is causing non-deterministic results, especially on high volume writes.
Since this means I can't rely on "403 rate limit" to throttle my inserts, I need to know what is a safe insert rate so as not to trigger these timing bugs.
Running the code below gives ...
Summary...
File insert attempts (a) = 100
rate limit errors (b) = 31
expected number of files (a-b) = 69
Actual number of files = 73
code...
package com.cnw.test.servlets;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import com.google.api.client.auth.oauth2.Credential;
import com.google.api.client.googleapis.json.GoogleJsonError;
import com.google.api.client.googleapis.json.GoogleJsonResponseException;
import com.google.api.client.http.javanet.NetHttpTransport;
import com.google.api.client.json.jackson.JacksonFactory;
import com.google.api.services.drive.Drive;
import com.google.api.services.drive.model.ChildList;
import com.google.api.services.drive.model.File;
import com.google.api.services.drive.model.File.Labels;
import com.google.api.services.drive.model.ParentReference;
import couk.cleverthinking.cnw.oauth.CredentialMediatorB;
import couk.cleverthinking.cnw.oauth.CredentialMediatorB.InvalidClientSecretsException;
/**
*
* AppEngine servlet to demonstrate that Drive IS performing an insert despite throwing a 403 rate limit exception.
*
* All it does is create a folder, then loop to create x files. Any 403 rate limit exceptions are counted.
* At the end, compare the expected number of files (attempted - 403) vs. the actual.
* In a run of 100 files, I consistently see between 1 and 3 more files than expected, i.e. despite throwing a 403 rate limit,
* Drive *sometimes* creates the file anyway.
*
* To run this, you will need to ...
* 1) enter an APPNAME below
* 2) enter a google user id below
* 3) Have a valid stored credential for that user
*
* (2) and (3) can be replaced by a manually constructed Credential
*
* Your test must generate rate limit errors, so if you have a very slow connection, you might need to run 2 or 3 in parallel.
* I run the test on a medium speed connection and I see 403 rate limits after 30 or so inserts.
* Creating 100 files consistently exposes the problem.
*
*/
@SuppressWarnings("serial")
public class Hack extends HttpServlet {
private final String APPNAME = "MyApp"; // ENTER YOUR APP NAME
private final String GOOGLE_USER_ID_TO_FETCH_CREDENTIAL = "11222222222222222222222"; //ENTER YOUR GOOGLE USER ID
@Override
public void doGet(HttpServletRequest request, HttpServletResponse response) throws IOException {
/*
* set up the counters
*/
// I run this as a servlet, so I get the number of files from the request URL
int numFiles = Integer.parseInt(request.getParameter("numfiles"));
int fileCount = 0;
int ratelimitCount = 0;
/*
* Load the Credential
*/
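CredentialMediatorB cmb;
try {
cmb = new CredentialMediatorB(request);
} catch (InvalidClientSecretsException e) {
// without a mediator there is no credential, so abort here instead of hitting a NullPointerException below
throw new IOException("Could not load client secrets", e);
}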
// this fetches a stored credential, you might choose to construct one manually
Credential credential = cmb.getStoredCredential(GOOGLE_USER_ID_TO_FETCH_CREDENTIAL);
/*
* Use the credential to create a drive service
*/
Drive driveService = new Drive.Builder(new NetHttpTransport(), new JacksonFactory(), credential).setApplicationName(APPNAME).build();
/*
* make a parent folder to make it easier to count the files and delete them after the test
*/
File folderParent = new File();
folderParent.setTitle("403parentfolder-" + numFiles);
folderParent.setMimeType("application/vnd.google-apps.folder");
folderParent.setParents(Arrays.asList(new ParentReference().setId("root")));
folderParent.setLabels(new Labels().setHidden(false));
driveService.files().list().execute();
folderParent = driveService.files().insert(folderParent).execute();
System.out.println("folder made with id = " + folderParent.getId());
/*
* store the parent folder id in a parent array for use by each child file
*/
List<ParentReference> parents = new ArrayList<ParentReference>();
parents.add(new ParentReference().setId(folderParent.getId()));
/*
* loop for each file
*/
for (fileCount = 0; fileCount < numFiles; fileCount++) {
/*
* make a File object for the insert
*/
File file = new File();
file.setTitle("testfile-" + (fileCount+1));
file.setParents(parents);
file.setDescription("description");
file.setMimeType("text/html");
try {
System.out.println("making file "+fileCount + " of "+numFiles);
// call the drive service insert execute method
driveService.files().insert(file).setConvert(false).execute();
} catch (GoogleJsonResponseException e) {
GoogleJsonError error = e.getDetails();
// look for rate errors and count them. Normally one would expo-backoff here, but this is to demonstrate that despite
// the 403, the file DID get created
if (error.getCode() == 403 && error.getMessage().toLowerCase().contains("rate limit")) {
System.out.println("rate limit exception on file " + fileCount + " of "+numFiles);
// increment a count of rate limit errors
ratelimitCount++;
} else {
// just in case there is a different exception thrown
System.out.println("[DbSA465] Error message: " + error.getCode() + " " + error.getMessage());
}
}
}
/*
* all done. get the children of the folder to see how many files were actually created
*/
ChildList children = driveService.children().list(folderParent.getId()).execute();
/*
* and the winner is ...
*/
System.out.println("\nSummary...");
System.out.println("File insert attempts (a) = " + numFiles);
System.out.println("rate limit errors (b) = " + ratelimitCount);
System.out.println("expected number of files (a-b) = " + (numFiles - ratelimitCount));
System.out.println("Actual number of files = " + children.getItems().size() + " NB. There is a limit of 100 children in a single page, so if you're expecting more than 100, need to follow nextPageToken");
}
}
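The summary line above warns that a single page holds at most 100 children, so a test with more files needs to follow nextPageToken. A minimal sketch of that loop, assuming a hypothetical fetchPage callback and Page holder (neither is part of the Drive client); in the servlet the callback would presumably wrap driveService.children().list(folderParent.getId()).setPageToken(token).execute() and read getNextPageToken() from the result:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

final class ChildCounter {
    /** Hypothetical holder for one page: the items plus the token of the next page (null on the last page). */
    static final class Page {
        final List<String> items;
        final String nextPageToken;

        Page(List<String> items, String nextPageToken) {
            this.items = items;
            this.nextPageToken = nextPageToken;
        }
    }

    /** Counts every item by requesting pages until no next-page token is returned. */
    static int countAll(Function<String, Page> fetchPage) {
        int total = 0;
        String token = null; // null asks for the first page
        do {
            Page page = fetchPage.apply(token);
            total += page.items.size();
            token = page.nextPageToken;
        } while (token != null && !token.isEmpty());
        return total;
    }

    public static void main(String[] args) {
        // Fake listing for illustration: a full page of 100 ids, then a final page of 5.
        Function<String, Page> fake = token -> token == null
                ? new Page(Collections.nCopies(100, "id"), "page2")
                : new Page(Arrays.asList("a", "b", "c", "d", "e"), null);
        System.out.println("Actual number of files = " + countAll(fake)); // prints 105
    }
}
```

With this in place the "Actual number of files" figure stays correct for runs of more than 100 files.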
I'm assuming you're trying to do Parallel downloads...
This may not be an answer you're looking for, but this is what I've experienced in my interactions with google drive api. I use C#, so it's a bit different, but maybe it'll help.
I had to cap the number of threads running at one time. If I let my program run all 100 entries at once as separate threads, I run into the rate limit error as well.
I don't know Java well at all, but in my C# program, I run 3 threads (definable by the user, 3 is the default)
opts = new ParallelOptions { MaxDegreeOfParallelism = 3 };
var checkforfinished =
Parallel.ForEach(lstBackupUsers.Items.Cast<ListViewItem>(), opts, name =>
{
// my logic code here
});
I did a quick search and found that Java 8 (not sure if that's what you're using) supports parallel streams, e.g. list.parallelStream().forEach(...), maybe that'd help you. The resource I found for this is at: http://radar.oreilly.com/2015/02/java-8-streams-api-and-parallelism.html
Hope this helps, taking my turns trying to help others on SO as people have helped me!
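For Java readers, the closest equivalent of MaxDegreeOfParallelism = 3 is probably a fixed-size thread pool rather than a parallel stream, since a pool makes the cap explicit. A minimal sketch under that assumption; BoundedUploads and runBounded are hypothetical names, and each Runnable stands in for one Drive insert:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

final class BoundedUploads {
    /**
     * Runs every task on at most maxThreads worker threads and returns the
     * highest number of tasks observed running at the same time.
     */
    static int runBounded(List<Runnable> tasks, int maxThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        List<Future<?>> futures = new ArrayList<>();
        try {
            for (Runnable task : tasks) {
                futures.add(pool.submit(() -> {
                    // Track concurrency so the cap can be checked afterwards.
                    peak.accumulateAndGet(running.incrementAndGet(), Math::max);
                    try {
                        task.run();
                    } finally {
                        running.decrementAndGet();
                    }
                }));
            }
            for (Future<?> future : futures) {
                try {
                    future.get(); // wait for completion and surface any task failure
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException(e.getCause());
                }
            }
        } finally {
            pool.shutdown();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        List<Runnable> inserts = new ArrayList<>();
        for (int i = 1; i <= 10; i++) {
            final int n = i;
            // Placeholder for driveService.files().insert(file).execute()
            inserts.add(() -> System.out.println("inserting testfile-" + n));
        }
        System.out.println("peak concurrency = " + runBounded(inserts, 3));
    }
}
```

Capping the threads lowers the request rate, but it does not address inserts that succeed despite returning a 403.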
There is no answer to this problem since it's a confirmed Drive bug.
For anybody experiencing the problem (which is anybody doing bulk inserts), the workaround is the following pseudo code...
As part of the File JSON for the insert, include a synthetic ID as a custom property.
E.g. file.setAppProperties({"myID": filename + count++}) // NB store the file object in a map/array; the property type must match the one queried below
If an insert receives a 403, check whether the insert actually succeeded with a query on the synthetic ID.
E.g. service.files().list().setQ("appProperties has { key='myID' and value='filenamecount' }") // where filenamecount is from the stored file object
If files.list returns a hit, the insert succeeded and no further action is required.
If there are zero results, the 403 was accurate and the insert needs to be requeued.
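The decision logic of the pseudo code above could be sketched in Java roughly as follows. This is only an illustration: FileStore, RateLimitException, Outcome and FakeStore are hypothetical stand-ins, not Drive client types. In real code insert would wrap the Drive insert call and existsWithId would wrap the files.list query on the synthetic ID.

```java
import java.util.HashSet;
import java.util.Set;

final class VerifiedInsert {
    /** Hypothetical stand-in for a 403 "rate limit" response. */
    static final class RateLimitException extends Exception {
        private static final long serialVersionUID = 1L;
    }

    /** Hypothetical minimal view of the two Drive calls the workaround needs. */
    interface FileStore {
        /** Inserts a file tagged with syntheticId; may throw even though the file was created. */
        void insert(String syntheticId) throws RateLimitException;

        /** Stands in for a files.list query on the synthetic ID property. */
        boolean existsWithId(String syntheticId);
    }

    enum Outcome { CREATED, CREATED_DESPITE_403, REQUEUE }

    /** Attempts one insert and, on a 403, checks whether the file exists before asking for a retry. */
    static Outcome insertOnce(FileStore store, String syntheticId) {
        try {
            store.insert(syntheticId);
            return Outcome.CREATED;
        } catch (RateLimitException e) {
            // The 403 may be a false negative, so look the file up before requeueing.
            return store.existsWithId(syntheticId) ? Outcome.CREATED_DESPITE_403 : Outcome.REQUEUE;
        }
    }

    /** In-memory fake: ids ending in "7" get a 403, and "testfile-7" is created anyway. */
    static final class FakeStore implements FileStore {
        final Set<String> files = new HashSet<>();

        @Override
        public void insert(String id) throws RateLimitException {
            if (id.endsWith("7")) {
                if (id.equals("testfile-7")) {
                    files.add(id); // simulated insert that succeeds despite the 403
                }
                throw new RateLimitException();
            }
            files.add(id);
        }

        @Override
        public boolean existsWithId(String id) {
            return files.contains(id);
        }
    }

    public static void main(String[] args) {
        FakeStore store = new FakeStore();
        for (int i = 1; i <= 20; i++) {
            String id = "testfile-" + i;
            System.out.println(id + " -> " + insertOnce(store, id));
        }
    }
}
```

Only REQUEUE results go back on the queue, so an insert that succeeded despite the 403 is never retried and no duplicate is created.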
Note that the ONLY safe way to do bulk inserts is via a queue
which you throttle in response to receiving 403 errors.
Do not implement a simplistic exponential backoff.
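One possible reading of "a queue which you throttle" is sketched below. It assumes a single worker that spaces out all inserts, widens the gap for the whole queue when a 403 is confirmed, and narrows it again slowly after successes. The names, the delay constants and the tryInsert callback are all hypothetical. The callback is expected to return false only when a 403 was received and the file was confirmed not to exist.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;
import java.util.function.LongConsumer;
import java.util.function.Predicate;

final class ThrottledInsertQueue {
    static final long MIN_DELAY_MS = 100;    // assumed starting gap between inserts
    static final long MAX_DELAY_MS = 10_000; // assumed ceiling for the gap

    /**
     * Drains the queue one item at a time and returns the number of attempts made.
     * The loop only ends once every item has been inserted, so a callback that
     * always returns false would keep it running forever.
     */
    static int drain(List<String> ids, Predicate<String> tryInsert, LongConsumer pause) {
        Deque<String> queue = new ArrayDeque<>(ids);
        long delayMs = MIN_DELAY_MS;
        int attempts = 0;
        while (!queue.isEmpty()) {
            String id = queue.poll();
            attempts++;
            if (tryInsert.test(id)) {
                // Success: ease the throttle off gradually.
                delayMs = Math.max(MIN_DELAY_MS, delayMs - MIN_DELAY_MS);
            } else {
                // Confirmed rate limit: slow the whole queue down and put the item back.
                delayMs = Math.min(MAX_DELAY_MS, delayMs * 2);
                queue.addLast(id);
            }
            pause.accept(delayMs); // real code would sleep here; injected so it can be faked
        }
        return attempts;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Fake insert for illustration: every third attempt is rate limited.
        Predicate<String> fake = id -> ++calls[0] % 3 != 0;
        int attempts = drain(Arrays.asList("a", "b", "c", "d"), fake,
                ms -> System.out.println("wait " + ms + " ms"));
        System.out.println("attempts = " + attempts);
    }
}
```

The difference from a per-request backoff is that the delay is shared by the whole queue, so one 403 slows every subsequent insert, not just the retry of the failed one.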